pykanto.signal.segment#

Segment audio files and find vocalisation units in spectrograms.

Functions

drop_zero_len_units(dataset, onsets, offsets)

Removes onset/offset pairs which (under this dataset's spectrogram parameter combination) would result in a unit of length zero.

find_units(dataset, spectrogram)

Segment a given spectrogram array into its units.

get_segment_info(RAW_DATA_DIR, min_duration, ...)

Get a summary of all segments present in a directory. Works for .xml files output by Sonic Visualiser.

onsets_offsets(signal)

Labels features in an array as onsets and offsets.

save_segments(metadata, wavfile, wav_outdir, ...)

Save segments present in a single wav file to new separate files along with their metadata.

segment_file(wav_dir, metadata_dir, ...[, ...])

Segments and saves audio segments and their metadata from a single audio file, based on annotations provided in a separate 'metadata' file.

segment_files(datapaths, wav_outdir, json_outdir)

Finds and saves audio segments and their metadata.

segment_files_parallel(datapaths, dirs[, ...])

Finds and saves audio segments and their metadata. Parallel version of segment_files().

segment_is_valid(metadata, max_amplitude, i)

Checks whether the segment at index i within an Annotation object is valid.

segment_song_into_units(dataset, key)

Find amplitude-differentiable units in a given vocalisation after applying a series of morphological transformations to reduce noise.

segment_song_into_units_parallel(dataset, ...)

Parallel version of segment_song_into_units().

Classes

ReadWav(wav_dir)

Reads a wav file and its metadata.

SegmentMetadata(metadata, audio_section, i, ...)

Consolidates segment metadata in a single Metadata object, which can then be saved as a standard .JSON file.

class pykanto.signal.segment.ReadWav(wav_dir: pathlib.Path)[source]#

Reads a wav file and its metadata.

Note

You can extend this class to read in metadata from the wav file that is specific to your research, e.g. the recorder device ID or time information.

Examples


__init__(wav_dir: pathlib.Path) → None[source]#
wav_dir#

Location of wav file.

get_wav() → soundfile.SoundFile[source]#

Returns the wavfile.

Returns

Seekable wavfile.

Return type

sf.SoundFile

get_metadata() → pykanto.utils.types.AudioAnnotation[source]#

Returns metadata attached to wavfile as an AudioAnnotation object.

Returns

Wavfile metadata.

Return type

AudioAnnotation

as_dict() → Dict[str, Any][source]#

Returns metadata attached to wavfile as a dictionary.

Returns

Wavfile metadata.

Return type

Dict[str, Any]

class pykanto.signal.segment.SegmentMetadata(metadata: pykanto.utils.types.Annotation, audio_section: numpy.ndarray, i: int, sr: int, wav_out: pathlib.Path)[source]#

Consolidates segment metadata in a single Metadata object, which can then be saved as a standard .JSON file.

You can extend this class to incorporate other metadata fields specific to your research (see the docs).
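
Examples

A minimal usage sketch, assuming annotation is an existing Annotation object and audio is a NumPy array holding the audio of its first segment (both names are hypothetical):

>>> from pathlib import Path
>>> from pykanto.signal.segment import SegmentMetadata
>>> meta = SegmentMetadata(
...     annotation, audio, i=0, sr=22050, wav_out=Path("segment_0.wav")
... )
>>> meta.as_dict()  # ready to save as a standard .JSON file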

__init__(metadata: pykanto.utils.types.Annotation, audio_section: numpy.ndarray, i: int, sr: int, wav_out: pathlib.Path) → None[source]#

Consolidates segment metadata in a single Metadata object, which can then be saved as a standard .JSON file.

Parameters
  • metadata (Annotation) – An object containing relevant metadata.

  • audio_section (np.ndarray) – Array containing segment audio data (to extract min/max amplitude).

  • i (int) – Segment index.

  • sr (int) – Sample rate.

  • wav_out (Path) – Path to segment wav file.

Returns: None

all_metadata#

Attribute containing all available metadata.

index: int#

Index of the ‘focal’ segment.

get_metadata() → pykanto.utils.types.Metadata[source]#

Get Metadata object.

Returns

Single-segment metadata.

Return type

Metadata

as_dict() → Dict[str, Any][source]#

Returns Metadata object as a dictionary.

Returns

Wavfile metadata.

Return type

Dict[str, Any]

pykanto.signal.segment.segment_file(wav_dir: Path, metadata_dir: Path, wav_outdir: Path, json_outdir: Path, resample: int | None = 22050, parser_func: Callable[[Path], SegmentAnnotation] = <function parse_sonic_visualiser_xml>, **kwargs)[source]#

Segments and saves audio segments and their metadata from a single audio file, based on annotations provided in a separate ‘metadata’ file.

Parameters
  • wav_dir (Path) – Where is the wav file to be segmented?

  • metadata_dir (Path) – Where is the file containing its segmentation metadata?

  • wav_outdir (Path) – Where to save the resulting wav segments.

  • json_outdir (Path) – Where to save the resulting json metadata files.

  • resample (int | None, optional) – Whether to resample the audio, and to what sample rate. Defaults to 22050.

  • parser_func (Callable[[Path], SegmentAnnotation], optional) – Function to parse your metadata format. Defaults to parse_sonic_visualiser_xml.

  • **kwargs – Keyword arguments passed to segment_is_valid().
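
Examples

A sketch with hypothetical paths, assuming annotations in Sonic Visualiser .xml format:

>>> from pathlib import Path
>>> from pykanto.signal.segment import segment_file
>>> segment_file(
...     Path("data/raw/recording.wav"),
...     Path("data/raw/recording.xml"),
...     Path("data/segmented/WAV"),
...     Path("data/segmented/JSON"),
...     resample=22050,
...     min_duration=0.5,  # forwarded to segment_is_valid()
... )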

pykanto.signal.segment.save_segments(metadata: Annotation, wavfile: sf.SoundFile, wav_outdir: Path, json_outdir: Path, resample: int | None = 22050, **kwargs) → None[source]#

Save segments present in a single wav file to new separate files along with their metadata.

Parameters
  • metadata (Annotation) – Annotation and file metadata for this wav file.

  • wavfile (SoundFile) – Seekable wav file.

  • wav_outdir (Path) – Where to save the resulting segmented wav files.

  • json_outdir (Path) – Where to save the resulting json metadata files.

  • resample (int | None, optional) – Whether to resample the audio, and to what sample rate. Defaults to 22050.

  • **kwargs – Keyword arguments passed to segment_is_valid().

pykanto.signal.segment.segment_is_valid(metadata: pykanto.utils.types.Annotation, max_amplitude: float, i: int, integer_format: str = 'PCM_16', min_duration: float = 0.01, min_freqrange: int = 10, min_amplitude: int = 0, labels_to_ignore: List[str] = ['NO', 'NOISE']) → bool[source]#

Checks whether the segment at index i within an Annotation object is valid.

Parameters
  • metadata (Annotation) – Annotation object for a wav file.

  • max_amplitude (float) – Maximum amplitude of the segment's audio data.

  • i (int) – Segment index.

  • integer_format (str, optional) – Integer subtype of the source wav file. Defaults to 'PCM_16'.

  • min_duration (float, optional) – Minimum duration of segment to consider valid, in seconds. Defaults to 0.01.

  • min_freqrange (int, optional) – Minimum frequency range of segment to consider valid, in hertz. Defaults to 10.

  • min_amplitude (int, optional) – Minimum amplitude of segment to consider valid. Defaults to 0.

  • labels_to_ignore (List[str], optional) – Exclude any segments with these labels. Defaults to [“NO”, “NOISE”].

Returns

Is this a valid segment?

Return type

bool
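
Examples

These thresholds are usually set indirectly, via the **kwargs of segment_files() or segment_files_parallel(); a direct call might look like this (annotation and the threshold values are hypothetical):

>>> from pykanto.signal.segment import segment_is_valid
>>> segment_is_valid(
...     annotation, max_amplitude=5000.0, i=0,
...     min_duration=0.5, min_freqrange=200,
... )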

pykanto.signal.segment.segment_files(datapaths: List[Tuple[Path, Path]], wav_outdir: Path, json_outdir: Path, resample: int | None = 22050, parser_func: Callable[[Path], SegmentAnnotation] = <function parse_sonic_visualiser_xml>, pbar: bool = True, **kwargs) → None[source]#

Finds and saves audio segments and their metadata. Parallel version in segment_files_parallel(). Works well with large files (only reads one chunk at a time).

Parameters
  • datapaths (List[Tuple[Path, Path]]) – List of tuples with pairs of paths to raw data files and their annotation metadata files.

  • wav_outdir (Path) – Location where to save generated wav files.

  • json_outdir (Path) – Location where to save generated json metadata files.

  • resample (int | None, optional) – Whether to resample the audio, and to what sample rate. Defaults to 22050.

  • parser_func (Callable[[Path], SegmentAnnotation], optional) – Function to parse your metadata format. Defaults to parse_sonic_visualiser_xml.

  • pbar (bool, optional) – Whether to print a progress bar. Defaults to True.

  • **kwargs – Keyword arguments passed to segment_is_valid().
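
Examples

A sketch that pairs wav files with same-named .xml annotation files (directory layout hypothetical):

>>> from pathlib import Path
>>> from pykanto.signal.segment import segment_files
>>> wavs = sorted(Path("data/raw").glob("*.wav"))
>>> xmls = sorted(Path("data/raw").glob("*.xml"))
>>> segment_files(
...     list(zip(wavs, xmls)),
...     Path("data/segmented/WAV"),
...     Path("data/segmented/JSON"),
...     min_duration=0.5,  # forwarded to segment_is_valid()
... )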

pykanto.signal.segment.segment_files_parallel(datapaths: List[Tuple[Path, Path]], dirs: ProjDirs, resample: int | None = 22050, parser_func: Callable[[Path], SegmentAnnotation] = <function parse_sonic_visualiser_xml>, num_cpus: float | None = None, verbose: bool = True, **kwargs) → None[source]#

Finds and saves audio segments and their metadata. Parallel version of segment_files(). Works well with large files (only reads one chunk at a time).

Note

Creates [“WAV”, “JSON”] output subfolders in data/segmented/dataset.

Parameters
  • datapaths (List[Tuple[Path, Path]]) – List of tuples with pairs of paths to raw data files and their annotation metadata files.

  • dirs (ProjDirs) – Project directory structure.

  • resample (int | None, optional) – Whether to resample the audio, and to what sample rate. Defaults to 22050.

  • parser_func (Callable[[Path], SegmentAnnotation], optional) – Function to parse your metadata format. Defaults to parse_sonic_visualiser_xml.

  • num_cpus (float | None, optional) – Number of cpus to use for parallel computing. Defaults to None (all available).

  • verbose (bool, optional) – Whether to print progress information. Defaults to True.

  • **kwargs – Keyword arguments passed to segment_is_valid().
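
Examples

A sketch assuming datapaths was built as in segment_files() and DIRS is a pykanto.utils.paths.ProjDirs instance for the project:

>>> from pykanto.signal.segment import segment_files_parallel
>>> segment_files_parallel(
...     datapaths,
...     DIRS,
...     resample=22050,
...     min_duration=0.5,  # forwarded to segment_is_valid()
... )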

pykanto.signal.segment.get_segment_info(RAW_DATA_DIR: pathlib.Path, min_duration: float, min_freqrange: int, ignore_labels: List[str] = ['FIRST', 'first']) → Dict[str, List[float]][source]#

Get a summary of all segments present in a directory. Works for .xml files output by Sonic Visualiser.

Parameters
  • RAW_DATA_DIR (Path) – Folder to check, normally DATA_DIR / “raw” / YEAR.

  • min_duration (float) – Minimum duration for a segment to be considered, in seconds.

  • min_freqrange (int) – Minimum frequency range for a segment to be considered, in hertz.

  • ignore_labels (List[str], optional) – Ignore segments with these labels. Defaults to [“FIRST”, “first”].

Returns

Lists of segment durations, in seconds

Return type

Dict[str, List[float]]
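
Examples

A sketch with a hypothetical raw data folder:

>>> from pathlib import Path
>>> from pykanto.signal.segment import get_segment_info
>>> durations = get_segment_info(
...     Path("data/raw/2021"), min_duration=0.5, min_freqrange=200
... )
>>> {k: len(v) for k, v in durations.items()}  # number of segments per key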

pykanto.signal.segment.find_units(dataset: KantoData, spectrogram: np.ndarray) → Tuple[np.ndarray, np.ndarray] | tuple[None, None][source]#

Segment a given spectrogram array into its units. For convenience, parameters are defined in a KantoData class instance (class Parameters). Based on Tim Sainburg’s vocalseg code.

Returns

A tuple of (onsets, offsets) arrays, or (None, None) if no units matching the given criteria were found.

Return type

Tuple[np.ndarray, np.ndarray] | tuple[None, None]

pykanto.signal.segment.onsets_offsets(signal: numpy.ndarray) → numpy.ndarray[source]#

Labels features in an array as onsets and offsets. Based on Tim Sainburg’s vocalseg.

Parameters

signal (np.ndarray) – Thresholded 1D array in which each contiguous run of nonzero values is treated as a unit.

Returns

Array containing the onset and offset indices of each detected unit.

Return type

np.ndarray
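
Examples

A self-contained sketch on a thresholded 1D signal; each contiguous run of True values is treated as one unit:

>>> import numpy as np
>>> from pykanto.signal.segment import onsets_offsets
>>> signal = np.array([0, 1, 1, 0, 0, 1, 1, 1, 0], dtype=bool)
>>> onsets, offsets = onsets_offsets(signal)  # index pairs, one per unit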

pykanto.signal.segment.segment_song_into_units(dataset: KantoData, key: str) → Tuple[str, np.ndarray, np.ndarray] | None[source]#

Find amplitude-differentiable units in a given vocalisation after applying a series of morphological transformations to reduce noise.

Parameters
  • dataset (KantoData) – Dataset to use.

  • key (str) – Key of the vocalisation to segment.

Returns

The vocalisation key together with arrays of unit onsets and offsets, or None if no units were found.

Return type

Tuple[str, np.ndarray, np.ndarray] | None
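
Examples

A usage sketch, assuming dataset is an existing KantoData instance and the key refers to one of its vocalisations:

>>> from pykanto.signal.segment import segment_song_into_units
>>> result = segment_song_into_units(dataset, "recording_01.wav")
>>> if result is not None:
...     key, onsets, offsets = result  # onsets/offsets in seconds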

pykanto.signal.segment.segment_song_into_units_parallel(dataset: KantoData, keys: Iterable[str], **kwargs) → List[Tuple[str, np.ndarray, np.ndarray]][source]#

Parallel version of segment_song_into_units().

pykanto.signal.segment.drop_zero_len_units(dataset: KantoData, onsets: np.ndarray, offsets: np.ndarray) → Tuple[np.ndarray, np.ndarray][source]#

Removes onset/offset pairs which (under this dataset’s spectrogram parameter combination) would result in a unit of length zero.

Parameters
  • dataset (KantoData) – KantoData instance containing parameters.

  • onsets (np.ndarray) – Unit onsets, in seconds.

  • offsets (np.ndarray) – Unit offsets, in seconds.

Returns

Remaining onsets and offsets

Return type

Tuple[np.ndarray, np.ndarray]
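
Examples

A sketch, assuming dataset is a KantoData instance and onsets/offsets come from an earlier step such as segment_song_into_units():

>>> from pykanto.signal.segment import drop_zero_len_units
>>> onsets, offsets = drop_zero_len_units(dataset, onsets, offsets)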