Preparing long recordings#

‘Long recording segmentation’ here refers to the extraction of regions of interest from long, noisy raw recordings, along with any relevant metadata. Pykanto is agnostic as to how you find those segments; they will usually contain entire songs or calls that you want to analyse in more detail.

For this guide I used Sonic Visualiser, a free application, to manually draw boxes around individual regions of interest and store their time and frequency bounds in .xml files. To read these, I provide pykanto with a custom parser, parse_sonic_visualiser_xml.

This kind of manual annotation can be time-consuming. You can use pykanto to, for example, create a training dataset for a deep learning model, and then use the segmentation predicted by that model to build a larger dataset in a more automated way.

If you have annotation files that are formatted differently, you can either transform them into the format used here, or write your own parser—it just needs to return a SegmentAnnotation object. You can find examples of the .xml file format in the /data folder installed with the package.
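If it helps to see what such a parser does, here is a minimal, self-contained sketch that reads Sonic Visualiser-style annotation XML with the standard library. The element and attribute names (point, frame, duration, value, extent, label) follow Sonic Visualiser's annotation-layer export; note that a real parser for pykanto must return a SegmentAnnotation object, whereas this returns plain dicts purely for illustration:

```python
import xml.etree.ElementTree as ET

def parse_boxes(xml_string: str, sample_rate: int) -> list[dict]:
    """Parse Sonic Visualiser-style <point> annotations into plain dicts.
    'frame' and 'duration' are in samples; 'value' and 'extent' are the
    lower frequency and the frequency range of each box, in Hz."""
    root = ET.fromstring(xml_string)
    segments = []
    for point in root.iter("point"):
        onset = int(point.get("frame")) / sample_rate
        duration = int(point.get("duration")) / sample_rate
        low_freq = float(point.get("value"))
        segments.append({
            "label": point.get("label"),
            "onset": onset,
            "offset": onset + duration,
            "lower_freq": low_freq,
            "upper_freq": low_freq + float(point.get("extent")),
        })
    return segments

# A toy annotation file: two boxes in a 48 kHz recording
xml_string = """
<sv><data><dataset>
  <point frame="48000" duration="24000" value="2000" extent="1500" label="A"/>
  <point frame="96000" duration="12000" value="2500" extent="1000" label="NOISE"/>
</dataset></data></sv>"""

segments = parse_boxes(xml_string, sample_rate=48000)
```

Here the first box starts one second into the file (48000 samples at 48 kHz) and spans 2000–3500 Hz.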

Segmenting files using existing .xml metadata files#

This requires folder(s) of audio files containing .xml files with onset, offset and frequency information for each segment of interest.

from pathlib import Path
import pkg_resources
from pykanto.signal.segment import segment_files_parallel
from pykanto.utils.custom import parse_sonic_visualiser_xml
from pykanto.utils.paths import ProjDirs, get_file_paths, get_wavs_w_annotation
# Change the below to your own data directory and dataset name
dataset_name = "BENGALESE_FINCH"
data_dir = Path(pkg_resources.resource_filename("pykanto", "data")) 
project_root = Path(data_dir).parent
raw_data = data_dir / "raw" / dataset_name

DIRS = ProjDirs(project_root, raw_data, dataset_name, mkdir=True)
# Find files and their metadata (assumed to be in the same directory)
wav_filepaths, xml_filepaths = [
    get_file_paths(DIRS.RAW_DATA, [ext]) for ext in [".wav", ".xml"]
]
files_to_segment = get_wavs_w_annotation(wav_filepaths, xml_filepaths)
# Segment all files, ignoring "NOISE" labels and segments shorter than 0.5
# seconds or with a frequency range smaller than 200 Hz
segment_files_parallel(
    files_to_segment,
    DIRS,
    resample=None,
    parser_func=parse_sonic_visualiser_xml,
    min_duration=0.5,
    min_freqrange=200,
    labels_to_ignore=["NOISE"],
)

And you are ready to start analysing your data!

Segmenting files with custom metadata fields#

Let’s say you are using AudioMoth recorders and want to retrieve some non-standard metadata from their audio files: (1) the device ID, and (2) the date and time at which an audio segment was recorded.

Here’s how you can do it:

import datetime as dt
import re
from typing import Any, Dict
import attr
from attr import validators
from dateutil.parser import parse
from pykanto.signal.segment import ReadWav, SegmentMetadata, segment_files
from pykanto.utils.custom import parse_sonic_visualiser_xml
from pykanto.utils.paths import (get_file_paths, get_wavs_w_annotation,
                                 pykanto_data)
from pykanto.utils.types import Annotation
from pykanto.utils.io import makedir

First, to make it easier to see what fields are available you can create a ReadWav object from a file and print its metadata, like so:

# Loads a sample AudioMoth file, included with pykanto
DIRS = pykanto_data(dataset="AM")

wav_dirs = get_file_paths(DIRS.RAW_DATA, extensions=['.WAV'])
meta = ReadWav(wav_dirs[0]).all_metadata
print(meta)
<WAVE({
    'filepath': '/home/nilomr/projects/pykanto/pykanto/data/raw/AM/20210502_040000.WAV',
    'filesize': '92.23 KiB',
    'pictures': [],
    'streaminfo': <WAVEStreamInfo({
        'audio_format': <WAVEAudioFormat.PCM>,
        'bit_depth': 16,
        'bitrate': '768 Kbps',
        'channels': 1,
        'duration': '00:01',
        'sample_rate': '48.0 KHz',
    })>,
    'tags': <RIFFTags({
        'ISFT': ['Lavf57.83.100'],
        'artist': ['AudioMoth 247AA5075E06337D'],
        'comment': [
            'Recorded at 04:00:00 02/05/2021 (UTC) by AudioMoth 247AA5075E06337D at gain setting 2 while battery state was 4.2V.',
        ],
    })>,
})>
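Both fields we want live in the comment tag above, so before patching anything you can sanity-check the extraction on the comment string alone. Here is a minimal, self-contained sketch using only the standard library (the patch further below uses dateutil's more forgiving parse instead of an explicit strptime format):

```python
import re
from datetime import datetime

comment = ("Recorded at 04:00:00 02/05/2021 (UTC) by AudioMoth "
           "247AA5075E06337D at gain setting 2 while battery state was 4.2V.")

# Device ID: the token following 'AudioMoth', up to ' at gain'
rec_unit = re.search(r"AudioMoth (.*?) at gain", comment).group(1)

# Timestamp: everything between the first 'at ' and ' (UTC)'
timestamp = re.search(r"at (.*?) \(UTC\)", comment).group(1)
recorded_at = datetime.strptime(timestamp, "%H:%M:%S %d/%m/%Y")

print(rec_unit)      # 247AA5075E06337D
print(recorded_at)   # 2021-05-02 04:00:00
```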

Now let’s access the metadata of interest and tell pykanto to add it to the .JSON files and, later, to our database.

First, add any new attributes, along with their data type annotations and any validators, to the Annotation class. This ensures that your new attributes, or fields, are properly parsed.

@attr.s
class CustomAnnotation(Annotation):
    rec_unit: str = attr.ib(validator=validators.instance_of(str))
    # This is intended as a short example, but in reality you could make sure that
    # this string can be parsed as a datetime object.
    datetime: str = attr.ib(validator=validators.instance_of(str))

Annotation.__init__ = CustomAnnotation.__init__

Then, monkey-patch the get_metadata methods of the ReadWav and SegmentMetadata classes to add any extra fields your project requires. This saves you from having to redefine the full classes and their methods from scratch. Some would call this ugly, and I’d tend to agree, but it is the most concise approach I could find that still preserves full flexibility.

def ReadWav_patch(self) -> Dict[str, Any]:
    comment = self.all_metadata['tags'].comment[0]
    add_to_dict = {
        'rec_unit': str(re.search(r"AudioMoth.(.*?) at gain", comment).group(1)),
        'datetime': str(parse(re.search(r"at.(.*?) \(UTC\)", comment).group(1)))
    }
    return {**self.metadata.__dict__, **add_to_dict}


def SegmentMetadata_patch(self) -> Dict[str, Any]:
    start = self.all_metadata.start_times[self.index] / self.all_metadata.sample_rate
    datetime = parse(self.all_metadata.datetime) + dt.timedelta(seconds=start)
    add_to_dict = {
        'rec_unit': self.all_metadata.rec_unit,
        'datetime': str(datetime),
    }
    return {**self.metadata.__dict__, **add_to_dict}


ReadWav.get_metadata = ReadWav_patch
SegmentMetadata.get_metadata = SegmentMetadata_patch
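The datetime arithmetic inside SegmentMetadata_patch is easy to verify in isolation: a segment that starts n samples into the file begins n / sample_rate seconds after the recording's start time. A standalone sketch with hypothetical values:

```python
from datetime import datetime, timedelta

file_start = datetime(2021, 5, 2, 4, 0, 0)  # from the WAV metadata
sample_rate = 48000
segment_onset_samples = 120_000             # hypothetical segment onset

offset_s = segment_onset_samples / sample_rate       # 2.5 seconds
segment_datetime = file_start + timedelta(seconds=offset_s)
print(segment_datetime)  # 2021-05-02 04:00:02.500000
```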

Now you can segment your annotated files as you normally would; their metadata will contain your custom fields.

wav_filepaths, xml_filepaths = [get_file_paths(
    DIRS.RAW_DATA, [ext]) for ext in ['.WAV', '.xml']]
files_to_segment = get_wavs_w_annotation(wav_filepaths, xml_filepaths)

wav_outdir, json_outdir = [makedir(DIRS.SEGMENTED / ext)
                           for ext in ["WAV", "JSON"]]

segment_files(
    files_to_segment,
    wav_outdir,
    json_outdir,
    parser_func=parse_sonic_visualiser_xml
)

Note: if you want to run this in parallel with ray (as in segment_files_parallel), monkey-patching will not work: for now, you will have to properly extend ReadWav and SegmentMetadata.