pykanto.signal.segment#
Segment audio files and find vocalisation units in spectrograms.
Functions

- drop_zero_len_units – Removes onset/offset pairs which (under this dataset's spectrogram parameter combination) would result in a unit of length zero.
- find_units – Segment a given spectrogram array into its units.
- get_segment_info – Get a summary of all segments present in a directory. Works for .xml files output by Sonic Visualiser.
- onsets_offsets – Labels features in array as onsets and offsets.
- save_segments – Save segments present in a single wav file to new separate files along with their metadata.
- segment_file – Segments and saves audio segments and their metadata from a single audio file, based on annotations provided in a separate 'metadata' file.
- segment_files – Finds and saves audio segments and their metadata.
- segment_files_parallel – Finds and saves audio segments and their metadata; parallel version of segment_files.
- segment_is_valid – Checks whether a segment of index i within a dictionary is a valid segment.
- segment_song_into_units – Find amplitude-differentiable units in a given vocalisation after applying a series of morphological transformations to reduce noise.
- segment_song_into_units_parallel – See save_melspectrogram.

Classes

- ReadWav – Reads a wav file and its metadata.
- SegmentMetadata – Consolidates segment metadata in a single Metadata object, which can then be saved as a standard .JSON file.
- class pykanto.signal.segment.ReadWav(wav_dir: pathlib.Path)[source]#
Reads a wav file and its metadata.
Note
You can extend this class to read in metadata from the wav file that is specific to your research, e.g. the recorder device ID, or time information.
Examples
TODO
- wav_dir#
Location of wav file.
- get_wav() soundfile.SoundFile [source]#
Returns the wavfile.
- Returns
Seekable wavfile.
- Return type
sf.SoundFile
- get_metadata() pykanto.utils.types.AudioAnnotation [source]#
Returns metadata attached to wavfile as an AudioAnnotation object.
- Returns
Wavfile metadata.
- Return type
AudioAnnotation
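For a sense of the metadata ReadWav gathers, here is a self-contained sketch using only the Python stdlib wave module rather than soundfile (illustrative only, not pykanto's actual implementation; the file written here is a hypothetical example):

```python
import struct
import tempfile
import wave
from pathlib import Path

# Write a tiny mono wav file so the example is self-contained
tmp = Path(tempfile.mkdtemp()) / "example.wav"
with wave.open(str(tmp), "wb") as f:
    f.setnchannels(1)      # mono
    f.setsampwidth(2)      # 16-bit PCM
    f.setframerate(22050)  # pykanto's default sample rate
    f.writeframes(struct.pack("<4h", 0, 1000, -1000, 0))

# Read it back, gathering the kind of basic audio metadata
# that ReadWav.get_metadata() returns as an AudioAnnotation
with wave.open(str(tmp), "rb") as f:
    sr = f.getframerate()
    n_frames = f.getnframes()
    duration = n_frames / sr

print(sr, n_frames)  # 22050 4
```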
- class pykanto.signal.segment.SegmentMetadata(metadata: pykanto.utils.types.Annotation, audio_section: numpy.ndarray, i: int, sr: int, wav_out: pathlib.Path)[source]#
Consolidates segment metadata in a single Metadata object, which can then be saved as a standard .JSON file.
You can extend this class to incorporate other metadata fields specific to your research (see the docs).
- __init__(metadata: pykanto.utils.types.Annotation, audio_section: numpy.ndarray, i: int, sr: int, wav_out: pathlib.Path) None [source]#
Consolidates segment metadata in a single Metadata object, which can then be saved as a standard .JSON file.
- Parameters
metadata (Annotation) – An object containing relevant metadata.
audio_section (np.ndarray) – Array containing segment audio data (to extract min/max amplitude).
i (int) – Segment index.
sr (int) – Sample rate.
wav_out (Path) – Path to segment wav file.
Returns: None
- all_metadata#
Attribute containing all available metadata.
- index: int#
Index of the ‘focal’ segment.
- get_metadata() pykanto.utils.types.Metadata [source]#
Get Metadata object.
- Returns
Single-segment metadata.
- Return type
Metadata
- pykanto.signal.segment.segment_file(wav_dir: Path, metadata_dir: Path, wav_outdir: Path, json_outdir: Path, resample: int | None = 22050, parser_func: Callable[[Path], SegmentAnnotation] = <function parse_sonic_visualiser_xml>, **kwargs)[source]#
Segments and saves audio segments and their metadata from a single audio file, based on annotations provided in a separate ‘metadata’ file.
- Parameters
wav_dir (Path) – Where is the wav file to be segmented?
metadata_dir (Path) – Where is the file containing its segmentation metadata?
wav_outdir (Path) – Where to save the resulting wav segments.
json_outdir (Path) – Where to save the resulting json metadata files.
resample (int | None, optional) – Whether to resample the audio, and to what sample rate. Defaults to 22050.
parser_func (Callable[[Path], SegmentAnnotation], optional) – Function to parse your metadata format. Defaults to parse_sonic_visualiser_xml.
**kwargs – Keyword arguments passed to segment_is_valid().
- pykanto.signal.segment.save_segments(metadata: Annotation, wavfile: sf.SoundFile, wav_outdir: Path, json_outdir: Path, resample: int | None = 22050, **kwargs) None [source]#
Save segments present in a single wav file to new separate files along with their metadata.
- Parameters
metadata (Annotation) – Annotation and file metadata for this wav file.
wavfile (SoundFile) – Seekable wav file.
wav_outdir (Path) – Where to save the resulting segmented wav files.
json_outdir (Path) – Where to save the resulting json metadata files.
resample (int | None, optional) – Whether to resample the audio, and to what sample rate. Defaults to 22050.
**kwargs – Keyword arguments passed to segment_is_valid().
- pykanto.signal.segment.segment_is_valid(metadata: pykanto.utils.types.Annotation, max_amplitude: float, i: int, integer_format: str = 'PCM_16', min_duration: float = 0.01, min_freqrange: int = 10, min_amplitude: int = 0, labels_to_ignore: List[str] = ['NO', 'NOISE']) bool [source]#
Checks whether a segment of index i within a dictionary is a valid segment.
- Parameters
metadata (Annotation) – Annotation object for a wav file.
i (int) – Segment index.
min_duration (float, optional) – Minimum duration of segment to consider valid, in seconds. Defaults to 0.01.
min_freqrange (int, optional) – Minimum frequency range of segment to consider valid, in hertz. Defaults to 10.
labels_to_ignore (List[str], optional) – Exclude any segments with these labels. Defaults to [“NO”, “NOISE”].
- Returns
Is this a valid segment?
- Return type
bool
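The duration, frequency-range, and label checks can be sketched as a standalone function (a hypothetical reimplementation of the checks described above, not pykanto's actual code; the amplitude and integer-format checks are omitted):

```python
from typing import List


def is_valid_segment(
    start: float, end: float,        # segment onset/offset, in seconds
    low_freq: int, high_freq: int,   # frequency bounds, in Hz
    label: str,
    min_duration: float = 0.01,
    min_freqrange: int = 10,
    labels_to_ignore: List[str] = ["NO", "NOISE"],
) -> bool:
    """Apply the duration, frequency-range, and label filters."""
    if end - start < min_duration:
        return False
    if high_freq - low_freq < min_freqrange:
        return False
    if label in labels_to_ignore:
        return False
    return True


# A 0.5 s segment spanning 2-8 kHz with an ordinary label passes:
print(is_valid_segment(1.0, 1.5, 2000, 8000, "GRETI"))  # True
# A segment labelled "NOISE" is excluded regardless of duration:
print(is_valid_segment(1.0, 1.5, 2000, 8000, "NOISE"))  # False
```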
- pykanto.signal.segment.segment_files(datapaths: List[Tuple[Path, Path]], wav_outdir: Path, json_outdir: Path, resample: int | None = 22050, parser_func: Callable[[Path], SegmentAnnotation] = <function parse_sonic_visualiser_xml>, pbar: bool = True, **kwargs) None [source]#
Finds and saves audio segments and their metadata. Parallel version in segment_files_parallel(). Works well with large files (only reads one chunk at a time).
- Parameters
datapaths (List[Tuple[Path, Path]]) – List of tuples with pairs of paths to raw data files and their annotation metadata files.
wav_outdir (Path) – Location where to save generated wav files.
json_outdir (Path) – Location where to save generated json metadata files.
resample (int | None, optional) – Whether to resample audio. Defaults to 22050.
parser_func (Callable[[Path], SegmentAnnotation], optional) – Function to parse your metadata format. Defaults to parse_sonic_visualiser_xml.
pbar (bool, optional) – Whether to print a progress bar. Defaults to True.
**kwargs – Keyword arguments passed to segment_is_valid().
- pykanto.signal.segment.segment_files_parallel(datapaths: List[Tuple[Path, Path]], dirs: ProjDirs, resample: int | None = 22050, parser_func: Callable[[Path], SegmentAnnotation] = <function parse_sonic_visualiser_xml>, num_cpus: float | None = None, verbose: bool = True, **kwargs) None [source]#
Finds and saves audio segments and their metadata. Parallel version of segment_files(). Works well with large files (only reads one chunk at a time).
Note
Creates [“WAV”, “JSON”] output subfolders in data/segmented/dataset.
- Parameters
datapaths (List[Tuple[Path, Path]]) – List of tuples with pairs of paths to raw data files and their annotation metadata files.
dirs (ProjDirs) – Project directory structure.
resample (int | None, optional) – Whether to resample audio. Defaults to 22050.
parser_func (Callable[[Path], SegmentAnnotation], optional) – Function to parse your metadata format. Defaults to parse_sonic_visualiser_xml.
num_cpus (float | None, optional) – Number of cpus to use for parallel computing. Defaults to None (all available).
verbose (bool, optional) – Defaults to True.
**kwargs – Keyword arguments passed to segment_is_valid().
- pykanto.signal.segment.get_segment_info(RAW_DATA_DIR: pathlib.Path, min_duration: float, min_freqrange: int, ignore_labels: List[str] = ['FIRST', 'first']) Dict[str, List[float]] [source]#
Get a summary of all segments present in a directory. Works for .xml files output by Sonic Visualiser.
- Parameters
RAW_DATA_DIR (Path) – Folder to check, normally DATA_DIR / “raw” / YEAR
min_duration (float) – Minimum duration for a segment to be considered, in seconds.
min_freqrange (int) – Minimum frequency range for a segment to be considered, in hertz.
ignore_labels (List[str], optional) – Ignore segments with these labels. Defaults to [“FIRST”, “first”].
- Returns
Lists of segment durations, in seconds
- Return type
Dict[str, List[float]]
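The kind of summary this produces can be sketched over already-parsed annotations (the segment tuples, labels, and helper name below are hypothetical; the real function parses Sonic Visualiser .xml files from disk):

```python
from typing import Dict, List

# Hypothetical pre-parsed segments: (label, start_s, end_s, low_hz, high_hz)
segments = [
    ("GRETI", 0.0, 1.2, 2000, 8000),
    ("GRETI", 2.0, 2.8, 2500, 7500),
    ("first", 3.0, 3.5, 2000, 8000),    # ignored label
    ("BLUTI", 4.0, 4.004, 2000, 8000),  # shorter than min_duration
]


def summarise_segments(
    segments,
    min_duration: float,
    min_freqrange: int,
    ignore_labels: List[str] = ["FIRST", "first"],
) -> Dict[str, List[float]]:
    """Collect segment durations (seconds) per label, applying the
    duration, frequency-range, and label filters described above."""
    out: Dict[str, List[float]] = {}
    for label, start, end, low, high in segments:
        if label in ignore_labels:
            continue
        if end - start < min_duration or high - low < min_freqrange:
            continue
        out.setdefault(label, []).append(round(end - start, 6))
    return out


info = summarise_segments(segments, min_duration=0.01, min_freqrange=10)
print(info)  # {'GRETI': [1.2, 0.8]}
```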
- pykanto.signal.segment.find_units(dataset: KantoData, spectrogram: np.ndarray) Tuple[np.ndarray, np.ndarray] | tuple[None, None] [source]#
Segment a given spectrogram array into its units. For convenience, parameters are defined in a KantoData class instance (class Parameters). Based on Tim Sainburg’s vocalseg code.
- Returns
A tuple of (onsets, offsets) arrays, or (None, None) if no units matching the given criteria were found.
- Return type
Tuple[np.ndarray, np.ndarray] | tuple[None, None]
- pykanto.signal.segment.onsets_offsets(signal: numpy.ndarray) numpy.ndarray [source]#
Labels features in array as onsets and offsets. Based on Tim Sainburg’s vocalseg.
- Parameters
signal (np.ndarray) – Signal in which to label feature onsets and offsets.
- Returns
Array of labelled onsets and offsets.
- Return type
np.ndarray
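The underlying idea, turning a thresholded (boolean) envelope into onset and offset indices, can be sketched with plain NumPy (a minimal illustration of run labelling, not vocalseg's actual implementation):

```python
import numpy as np


def onsets_offsets_sketch(signal: np.ndarray) -> np.ndarray:
    """Return start (inclusive) and end (exclusive) indices of
    contiguous True runs in a boolean vector."""
    signal = np.asarray(signal, dtype=bool).astype(int)
    # Pad with zeros so edges at the array boundaries are detected,
    # then mark +1 at each rising edge and -1 at each falling edge.
    edges = np.diff(np.concatenate(([0], signal, [0])))
    onsets = np.where(edges == 1)[0]
    offsets = np.where(edges == -1)[0]
    return np.array([onsets, offsets])


# Two above-threshold runs: frames 2-4 and frame 6
env = np.array([0, 0, 1, 1, 1, 0, 1, 0, 0], dtype=bool)
print(onsets_offsets_sketch(env))
# [[2 6]
#  [5 7]]
```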
- pykanto.signal.segment.segment_song_into_units(dataset: KantoData, key: str) Tuple[str, np.ndarray, np.ndarray] | None [source]#
Find amplitude-differentiable units in a given vocalisation after applying a series of morphological transformations to reduce noise.
- Parameters
dataset (KantoData) – Dataset to use.
key (str) – Key of the vocalisation to segment.
- Returns
The vocalisation key and its unit onsets and offsets, or None if no units were found.
- Return type
Tuple[str, np.ndarray, np.ndarray] | None
- pykanto.signal.segment.segment_song_into_units_parallel(dataset: KantoData, keys: Iterable[str], **kwargs) List[Tuple[str, np.ndarray, np.ndarray]] [source]#
See save_melspectrogram
- pykanto.signal.segment.drop_zero_len_units(dataset: KantoData, onsets: np.ndarray, offsets: np.ndarray) Tuple[np.ndarray, np.ndarray] [source]#
Removes onset/offset pairs which (under this dataset’s spectrogram parameter combination) would result in a unit of length zero.
- Parameters
dataset (KantoData) – KantoData instance containing parameters.
onsets (np.ndarray) – Unit onsets, in seconds.
offsets (np.ndarray) – Unit offsets, in seconds.
- Returns
Remaining onsets and offsets.
- Return type
Tuple[np.ndarray, np.ndarray]
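The check this performs can be sketched as follows (the sample rate and hop-length parameter names here are assumptions for illustration; the real function reads the spectrogram parameters from the KantoData instance):

```python
import numpy as np


def drop_zero_len_units_sketch(
    onsets: np.ndarray,
    offsets: np.ndarray,
    sr: int = 22050,
    hop_length: int = 128,
) -> tuple:
    """Drop onset/offset pairs (in seconds) that would span zero
    whole spectrogram frames at the given frame rate."""
    frames_per_second = sr / hop_length
    lengths = (offsets - onsets) * frames_per_second
    keep = lengths.astype(int) > 0  # at least one whole frame
    return onsets[keep], offsets[keep]


onsets = np.array([0.0, 1.0, 2.0])
offsets = np.array([0.5, 1.001, 2.3])  # middle unit spans < 1 frame
on, off = drop_zero_len_units_sketch(onsets, offsets)
print(on, off)  # [0. 2.] [0.5 2.3]
```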