Song Data #

The songs directory contains audio recordings, along with metadata for each recording. The dataset is split into four zip files:

songs
├── song-files.zip.part1
├── song-files.zip.part2
├── song-files.zip.part3
└── song-files.zip.part4

These files can be stitched together before unzipping using the command cat song-files.zip.part* > sf.zip && zip -FF sf.zip --out song-files.zip, or your preferred method if you are not on a Unix-like system.

When you unzip the file, you will see the following structure:

song-files
├── JSON
│   ├── file.JSON
│   └── ...
└── WAV
    ├── 1_20190501_000000_0.wav
    └── ...

Dataset Size #

There are 109,963 files in each of the WAV and JSON folders. The total size of the dataset is 11.4 GB.

File Format #

The dataset is provided in ZIP format, with two folders: WAV and JSON. The WAV folder contains the audio recordings in WAV format, and the JSON folder contains the metadata for each recording in JSON format. The filenames match between the two folders-only the extension changes.

The WAV files are mono 16-bit PCM audio files with a sample rate of 22,050 Hz. The files are named according to the following convention: <ID>_<YYYYMMDD_HHMMSS>_<start frame>.wav Where ID is the unique identifier for the recording, YYYYMMDD_HHMMSS is the datetime of the recording, and start frame is the start frame of the recording in the original audio file.

The JSON files contain metadata for each recording. The files are named according to the same convention, <ID>_<YYYYMMDD_HHMMSS>_<start frame>.JSON, and each file contains the following fields:

Key Description
sample_rate The sample rate of the audio
bit_rate The bit rate of the audio
length_s The duration of the audio segment
ID The unique identifier for the audio segment
label The label associated with the audio segment
start The start position of the audio segment
end The end position of the audio segment
lower_freq The lower frequency bound of the audio segment
upper_freq The upper frequency bound of the audio segment
max_amplitude The maximum amplitude of the audio segment
min_amplitude The minimum amplitude of the audio segment
source_wav The file path to the original source WAV file
annotation_file The file path to the XML annotation file
wav_file The file path to the segmented WAV file
class_id The class identifier for the audio segment
datetime The date and time of the recording
onsets A list of onset times in seconds
offsets A list of offset times in seconds
silence_durations A list of silence durations in seconds
unit_durations A list of unit durations in seconds