pykanto.signal.segment#
Segment audio files and find vocalisation units in spectrograms.
Functions

- drop_zero_len_units – Removes onset/offset pairs which (under this dataset's spectrogram parameter combination) would result in a unit of length zero.
- find_units – Segment a given spectrogram array into its units.
- get_segment_info – Get a summary of all segments present in a directory. Works for .xml files output by Sonic Visualiser.
- onsets_offsets – Labels features in array as onsets and offsets.
- save_segments – Save segments present in a single wav file to new separate files along with their metadata.
- segment_file – Segments and saves audio segments and their metadata from a single audio file, based on annotations provided in a separate 'metadata' file.
- segment_files – Finds and saves audio segments and their metadata.
- segment_files_parallel – Finds and saves audio segments and their metadata; parallel version of segment_files.
- segment_is_valid – Checks whether a segment of index i within a dictionary is a valid segment.
- segment_song_into_units – Find amplitude-differentiable units in a given vocalisation after applying a series of morphological transformations to reduce noise.
- segment_song_into_units_parallel – See save_melspectrogram.

Classes

- ReadWav – Reads a wav file and its metadata.
- SegmentMetadata – Consolidates segment metadata in a single Metadata object, which can then be saved as a standard .JSON file.
- class pykanto.signal.segment.ReadWav(wav_dir: pathlib.Path)[source]#
Reads a wav file and its metadata.
Note
You can extend this class to read in metadata from the wav file that is specific to your research, e.g. the recorder device ID, or time information.
Examples
TODO
- wav_dir#
Location of wav file.
- get_wav() soundfile.SoundFile [source]#
Returns the wavfile.
- Returns
Seekable wavfile.
- Return type
sf.SoundFile
- get_metadata() pykanto.utils.types.AudioAnnotation [source]#
Returns metadata attached to wavfile as an AudioAnnotation object.
- Returns
Wavfile metadata.
- Return type
AudioAnnotation
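For a sense of the metadata ReadWav gathers, here is a self-contained sketch using only the Python stdlib wave module rather than soundfile (illustrative only, not pykanto's actual implementation; the file written here is a hypothetical example):

```python
import struct
import tempfile
import wave
from pathlib import Path

# Write a tiny mono wav file so the example is self-contained
tmp = Path(tempfile.mkdtemp()) / "example.wav"
with wave.open(str(tmp), "wb") as f:
    f.setnchannels(1)      # mono
    f.setsampwidth(2)      # 16-bit PCM
    f.setframerate(22050)  # pykanto's default sample rate
    f.writeframes(struct.pack("<4h", 0, 1000, -1000, 0))

# Read it back, gathering the kind of basic audio metadata
# that ReadWav.get_metadata() returns as an AudioAnnotation
with wave.open(str(tmp), "rb") as f:
    sr = f.getframerate()
    n_frames = f.getnframes()
    duration = n_frames / sr

print(sr, n_frames)  # 22050 4
```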
- class pykanto.signal.segment.SegmentMetadata(metadata: pykanto.utils.types.Annotation, audio_section: numpy.ndarray, i: int, sr: int, wav_out: pathlib.Path)[source]#
Consolidates segment metadata in a single Metadata object, which can then be saved as a standard .JSON file.
You can extend this class to incorporate other metadata fields specific to your research (see the docs).
- __init__(metadata: pykanto.utils.types.Annotation, audio_section: numpy.ndarray, i: int, sr: int, wav_out: pathlib.Path) None [source]#
Consolidates segment metadata in a single Metadata object, which can then be saved as a standard .JSON file.
- Parameters
metadata (Annotation) – An object containing relevant metadata.
audio_section (np.ndarray) – Array containing segment audio data (to extract min/max amplitude).
i (int) – Segment index.
sr (int) – Sample rate.
wav_out (Path) – Path to segment wav file.
Returns: None
- all_metadata#
Attribute containing all available metadata.
- index: int#
Index of the ‘focal’ segment.
- get_metadata() pykanto.utils.types.Metadata [source]#
Get Metadata object.
- Returns
Single-segment metadata.
- Return type
Metadata
- pykanto.signal.segment.segment_file(wav_dir: Path, metadata_dir: Path, wav_outdir: Path, json_outdir: Path, resample: int | None = 22050, parser_func: Callable[[Path], SegmentAnnotation] = <function parse_sonic_visualiser_xml>, **kwargs)[source]#
Segments and saves audio segments and their metadata from a single audio file, based on annotations provided in a separate ‘metadata’ file.
- Parameters
wav_dir (Path) – Where is the wav file to be segmented?
metadata_dir (Path) – Where is the file containing its segmentation metadata?
wav_outdir (Path) – Where to save the resulting wav segments.
json_outdir (Path) – Where to save the resulting json metadata files.
resample (int | None, optional) – Whether to resample the audio, and to what sample rate. Defaults to 22050.
parser_func (Callable[[Path], SegmentAnnotation], optional) – Function to parse your metadata format. Defaults to parse_sonic_visualiser_xml.
**kwargs – Keyword arguments passed to segment_is_valid().
- pykanto.signal.segment.save_segments(metadata: Annotation, wavfile: sf.SoundFile, wav_outdir: Path, json_outdir: Path, resample: int | None = 22050, **kwargs) None [source]#
Save segments present in a single wav file to new separate files along with their metadata.
- Parameters
metadata (Annotation) – Annotation and file metadata for this wav file.
wavfile (SoundFile) – Seekable wav file.
wav_outdir (Path) – Where to save the resulting segmented wav files.
json_outdir (Path) – Where to save the resulting json metadata files.
resample (int | None, optional) – Whether to resample the audio, and to what sample rate. Defaults to 22050.
**kwargs – Keyword arguments passed to segment_is_valid().
- pykanto.signal.segment.segment_is_valid(metadata: pykanto.utils.types.Annotation, max_amplitude: float, i: int, integer_format: str = 'PCM_16', min_duration: float = 0.01, min_freqrange: int = 10, min_amplitude: int = 0, labels_to_ignore: List[str] = ['NO', 'NOISE']) bool [source]#
Checks whether a segment of index i within a dictionary is a valid segment.
- Parameters
metadata (Annotation) – Annotation object for a wav file.
i (int) – Segment index.
min_duration (float, optional) – Minimum duration of segment to consider valid, in seconds. Defaults to 0.01.
min_freqrange (int, optional) – Minimum frequency range of segment to consider valid, in hertz. Defaults to 10.
labels_to_ignore (List[str], optional) – Exclude any segments with these labels. Defaults to [“NO”, “NOISE”].
- Returns
Is this a valid segment?
- Return type
bool
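The duration, frequency-range, and label checks can be sketched as a standalone function (a hypothetical reimplementation of the checks described above, not pykanto's actual code; the amplitude and integer-format checks are omitted):

```python
from typing import List


def is_valid_segment(
    start: float, end: float,        # segment onset/offset, in seconds
    low_freq: int, high_freq: int,   # frequency bounds, in Hz
    label: str,
    min_duration: float = 0.01,
    min_freqrange: int = 10,
    labels_to_ignore: List[str] = ["NO", "NOISE"],
) -> bool:
    """Apply the duration, frequency-range, and label filters."""
    if end - start < min_duration:
        return False
    if high_freq - low_freq < min_freqrange:
        return False
    if label in labels_to_ignore:
        return False
    return True


# A 0.5 s segment spanning 2-8 kHz with an ordinary label passes:
print(is_valid_segment(1.0, 1.5, 2000, 8000, "GRETI"))  # True
# A segment labelled "NOISE" is excluded regardless of duration:
print(is_valid_segment(1.0, 1.5, 2000, 8000, "NOISE"))  # False
```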
- pykanto.signal.segment.segment_files(datapaths: List[Tuple[Path, Path]], wav_outdir: Path, json_outdir: Path, resample: int | None = 22050, parser_func: Callable[[Path], SegmentAnnotation] = <function parse_sonic_visualiser_xml>, pbar: bool = True, **kwargs) None [source]#
Finds and saves audio segments and their metadata. Parallel version in segment_files_parallel(). Works well with large files (only reads one chunk at a time).
- Parameters
datapaths (List[Tuple[Path, Path]]) – List of tuples with pairs of paths to raw data files and their annotation metadata files.
wav_outdir (Path) – Location where to save generated wav files.
json_outdir (Path) – Location where to save generated json metadata files.
resample (int | None, optional) – Whether to resample audio. Defaults to 22050.
parser_func (Callable[[Path], SegmentAnnotation], optional) – Function to parse your metadata format. Defaults to parse_sonic_visualiser_xml.
pbar (bool, optional) – Whether to print a progress bar. Defaults to True.
**kwargs – Keyword arguments passed to segment_is_valid().
- pykanto.signal.segment.segment_files_parallel(datapaths: List[Tuple[Path, Path]], dirs: ProjDirs, resample: int | None = 22050, parser_func: Callable[[Path], SegmentAnnotation] = <function parse_sonic_visualiser_xml>, num_cpus: float | None = None, verbose: bool = True, **kwargs) None [source]#
Finds and saves audio segments and their metadata. Parallel version of segment_files(). Works well with large files (only reads one chunk at a time).
Note
Creates [“WAV”, “JSON”] output subfolders in data/segmented/dataset.
- Parameters
datapaths (List[Tuple[Path, Path]]) – List of tuples with pairs of paths to raw data files and their annotation metadata files.
dirs (ProjDirs) – Project directory structure.
resample (int | None, optional) – Whether to resample audio. Defaults to 22050.
parser_func (Callable[[Path], SegmentAnnotation], optional) – Function to parse your metadata format. Defaults to parse_sonic_visualiser_xml.
num_cpus (float | None, optional) – Number of cpus to use for parallel computing. Defaults to None (all available).
verbose (bool, optional) – Defaults to True.
**kwargs – Keyword arguments passed to segment_is_valid().
- pykanto.signal.segment.get_segment_info(RAW_DATA_DIR: pathlib.Path, min_duration: float, min_freqrange: int, ignore_labels: List[str] = ['FIRST', 'first']) Dict[str, List[float]] [source]#
Get a summary of all segments present in a directory. Works for .xml files output by Sonic Visualiser.
- Parameters
RAW_DATA_DIR (Path) – Folder to check, normally DATA_DIR / “raw” / YEAR
min_duration (float) – Minimum duration for a segment to be considered, in seconds.
min_freqrange (int) – Minimum frequency range for a segment to be considered, in hertz.
ignore_labels (List[str], optional) – Ignore segments with these labels. Defaults to [“FIRST”, “first”].
- Returns
Lists of segment durations, in seconds
- Return type
Dict[str, List[float]]
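The kind of summary this produces can be sketched over already-parsed annotations (the segment tuples, labels, and helper name below are hypothetical; the real function parses Sonic Visualiser .xml files from disk):

```python
from typing import Dict, List

# Hypothetical pre-parsed segments: (label, start_s, end_s, low_hz, high_hz)
segments = [
    ("GRETI", 0.0, 1.2, 2000, 8000),
    ("GRETI", 2.0, 2.8, 2500, 7500),
    ("first", 3.0, 3.5, 2000, 8000),    # ignored label
    ("BLUTI", 4.0, 4.004, 2000, 8000),  # shorter than min_duration
]


def summarise_segments(
    segments,
    min_duration: float,
    min_freqrange: int,
    ignore_labels: List[str] = ["FIRST", "first"],
) -> Dict[str, List[float]]:
    """Collect segment durations (seconds) per label, applying the
    duration, frequency-range, and label filters described above."""
    out: Dict[str, List[float]] = {}
    for label, start, end, low, high in segments:
        if label in ignore_labels:
            continue
        if end - start < min_duration or high - low < min_freqrange:
            continue
        out.setdefault(label, []).append(round(end - start, 6))
    return out


info = summarise_segments(segments, min_duration=0.01, min_freqrange=10)
print(info)  # {'GRETI': [1.2, 0.8]}
```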
- pykanto.signal.segment.find_units(dataset: KantoData, spectrogram: np.ndarray) Tuple[np.ndarray, np.ndarray] | tuple[None, None] [source]#
Segment a given spectrogram array into its units. For convenience, parameters are defined in a KantoData class instance (class Parameters). Based on Tim Sainburg’s vocalseg code.
- Returns
A tuple of (onsets, offsets) arrays, or (None, None) if no units matching the given criteria were found.
- Return type
Tuple[np.ndarray, np.ndarray] | tuple[None, None]
- pykanto.signal.segment.onsets_offsets(signal: numpy.ndarray) numpy.ndarray [source]#
Labels features in array as onsets and offsets. Based on Tim Sainburg’s vocalseg.
- Parameters
signal (np.ndarray) – Signal in which to label feature onsets and offsets.
- Returns
Array of labelled onsets and offsets.
- Return type
np.ndarray
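The underlying idea, turning a thresholded (boolean) envelope into onset and offset indices, can be sketched with plain NumPy (a minimal illustration of run labelling, not vocalseg's actual implementation):

```python
import numpy as np


def onsets_offsets_sketch(signal: np.ndarray) -> np.ndarray:
    """Return start (inclusive) and end (exclusive) indices of
    contiguous True runs in a boolean vector."""
    signal = np.asarray(signal, dtype=bool).astype(int)
    # Pad with zeros so edges at the array boundaries are detected,
    # then mark +1 at each rising edge and -1 at each falling edge.
    edges = np.diff(np.concatenate(([0], signal, [0])))
    onsets = np.where(edges == 1)[0]
    offsets = np.where(edges == -1)[0]
    return np.array([onsets, offsets])


# Two above-threshold runs: frames 2-4 and frame 6
env = np.array([0, 0, 1, 1, 1, 0, 1, 0, 0], dtype=bool)
print(onsets_offsets_sketch(env))
# [[2 6]
#  [5 7]]
```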
- pykanto.signal.segment.segment_song_into_units(dataset: KantoData, key: str) Tuple[str, np.ndarray, np.ndarray] | None [source]#
Find amplitude-differentiable units in a given vocalisation after applying a series of morphological transformations to reduce noise.
- Parameters
dataset (KantoData) – Dataset to use.
key (str) – Key of the vocalisation to segment.
- Returns
The vocalisation key and its unit onsets and offsets, or None if no units were found.
- Return type
Tuple[str, np.ndarray, np.ndarray] | None
- pykanto.signal.segment.segment_song_into_units_parallel(dataset: KantoData, keys: Iterable[str], **kwargs) List[Tuple[str, np.ndarray, np.ndarray]] [source]#
See save_melspectrogram
- pykanto.signal.segment.drop_zero_len_units(dataset: KantoData, onsets: np.ndarray, offsets: np.ndarray) Tuple[np.ndarray, np.ndarray] [source]#
Removes onset/offset pairs which (under this dataset’s spectrogram parameter combination) would result in a unit of length zero.
- Parameters
dataset (KantoData) – KantoData instance containing parameters.
onsets (np.ndarray) – Unit onsets, in seconds.
offsets (np.ndarray) – Unit offsets, in seconds.
- Returns
Remaining onsets and offsets.
- Return type
Tuple[np.ndarray, np.ndarray]
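The check this performs can be sketched as follows (the sample rate and hop-length parameter names here are assumptions for illustration; the real function reads the spectrogram parameters from the KantoData instance):

```python
import numpy as np


def drop_zero_len_units_sketch(
    onsets: np.ndarray,
    offsets: np.ndarray,
    sr: int = 22050,
    hop_length: int = 128,
) -> tuple:
    """Drop onset/offset pairs (in seconds) that would span zero
    whole spectrogram frames at the given frame rate."""
    frames_per_second = sr / hop_length
    lengths = (offsets - onsets) * frames_per_second
    keep = lengths.astype(int) > 0  # at least one whole frame
    return onsets[keep], offsets[keep]


onsets = np.array([0.0, 1.0, 2.0])
offsets = np.array([0.5, 1.001, 2.3])  # middle unit spans < 1 frame
on, off = drop_zero_len_units_sketch(onsets, offsets)
print(on, off)  # [0. 2.] [0.5 2.3]
```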