pykanto.signal.analysis

Basic audio feature calculations (spectral centroids, peak frequencies, etc.)

Functions

approximate_minmax_frequency(dataset[, key, ...])

Calculate approximate minimum and maximum frequencies from a mel spectrogram.

get_mean_sd_mfcc(S, n_mfcc)

Extract the mean and SD of n Mel-frequency cepstral coefficients (MFCCs) calculated from a log-power Mel spectrogram.

get_peak_freqs(dataset, spectrograms[, ...])

Return the peak frequencies of each spectrogram in an array of spectrograms.

spec_centroid_bandwidth(dataset[, key, ...])

Calculate a vocalisation's spectral centroid and bandwidth from a mel spectrogram.

pykanto.signal.analysis.get_peak_freqs(dataset: KantoData, spectrograms: np.ndarray, melscale: bool = True, threshold: float = 0.3) → np.ndarray

Return the peak frequencies of each spectrogram in an array of spectrograms.

Parameters
  • dataset (KantoData) – Vocalisation dataset.

  • spectrograms (np.ndarray) – Array of np.ndarray spectrograms.

  • melscale (bool, optional) – Whether the spectrograms are on the mel scale. Defaults to True.

  • threshold (float, optional) – Threshold for peak detection. Defaults to 0.3.

Returns

Array with peak frequencies.

Return type

np.ndarray
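As an illustration of what a per-spectrogram peak frequency can mean, here is a minimal NumPy sketch. It is not pykanto's implementation: the helper name `peak_frequencies` and the `freqs` argument (the centre frequency of each spectrogram row) are hypothetical, and the `melscale` and `threshold` options documented above are deliberately left out. It assumes each spectrogram has shape (frequency bins, time frames) and simply picks the bin with the highest mean energy.

```python
import numpy as np


def peak_frequencies(spectrograms, freqs):
    """Return one peak frequency per spectrogram (hypothetical helper).

    spectrograms: iterable of 2-D arrays shaped (n_freq_bins, n_frames).
    freqs: 1-D array with the centre frequency of each frequency bin.
    """
    freqs = np.asarray(freqs, dtype=float)
    peaks = []
    for spec in spectrograms:
        # Average the energy of each frequency bin across all time frames.
        mean_energy = np.asarray(spec, dtype=float).mean(axis=1)
        # The peak frequency is the bin with the largest average energy.
        peaks.append(freqs[np.argmax(mean_energy)])
    return np.array(peaks)


# Two toy spectrograms with different numbers of frames.
specs = [
    np.array([[0, 1], [5, 5], [1, 0]]),
    np.array([[0, 0, 9], [1, 1, 1], [2, 2, 2]]),
]
print(peak_frequencies(specs, freqs=[100.0, 200.0, 300.0]))
```

Iterating over the spectrograms rather than stacking them lets each vocalisation have a different duration, which matches the "array of np.ndarray spectrograms" input described above.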

pykanto.signal.analysis.spec_centroid_bandwidth(dataset: KantoData, key: None | str = None, spec: None | np.ndarray = None, plot: bool = False) → Tuple[np.ndarray, np.ndarray]

Calculate a vocalisation’s spectral centroid and bandwidth from a mel spectrogram. You can either provide a key string for a vocalisation or its mel spectrogram directly.

Parameters
  • dataset (KantoData) – Dataset object with your data.

  • key (None | str, optional) – Key of a vocalisation. Defaults to None.

  • spec (None | np.ndarray, optional) – Mel spectrogram. Defaults to None.

  • plot (bool, optional) – Whether to show the result. Defaults to False.

Returns

A tuple with the centroids and bandwidths.

Return type

Tuple[np.ndarray, np.ndarray]
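For readers unfamiliar with these two features, the sketch below computes them per time frame with plain NumPy, using the common definitions: the centroid is the energy-weighted mean frequency, and the bandwidth is the energy-weighted standard deviation around that centroid. This is an assumption about the quantities involved, not a copy of pykanto's code; the function name `centroid_and_bandwidth` and the `freqs` argument are hypothetical.

```python
import numpy as np


def centroid_and_bandwidth(spec, freqs):
    """Per-frame spectral centroid and bandwidth (hypothetical helper).

    spec: 2-D array of non-negative energies, shaped (n_freq_bins, n_frames).
    freqs: 1-D array with the centre frequency of each frequency bin.
    """
    spec = np.asarray(spec, dtype=float)
    freqs = np.asarray(freqs, dtype=float)[:, np.newaxis]  # shape (n_bins, 1)

    # Normalise each frame so its energies sum to 1; silent frames are left as zeros.
    total = spec.sum(axis=0, keepdims=True)
    weights = spec / np.where(total == 0, 1.0, total)

    # Centroid: energy-weighted mean frequency of each frame.
    centroid = (freqs * weights).sum(axis=0)
    # Bandwidth: energy-weighted standard deviation around the centroid.
    bandwidth = np.sqrt((weights * (freqs - centroid) ** 2).sum(axis=0))
    return centroid, bandwidth


# First frame: energy split between 100 Hz and 300 Hz; second frame: all energy at 200 Hz.
spec = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
centroids, bandwidths = centroid_and_bandwidth(spec, freqs=[100.0, 200.0, 300.0])
print(centroids, bandwidths)
```

Both toy frames share a centroid of 200 Hz, but only the first has a non-zero bandwidth, which shows why the two values are returned together.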

pykanto.signal.analysis.get_mean_sd_mfcc(S: numpy.ndarray, n_mfcc: int) → numpy.ndarray

Extract the mean and SD of n Mel-frequency cepstral coefficients (MFCCs) calculated from a log-power Mel spectrogram.

Parameters
  • S (np.ndarray) – A log-power Mel spectrogram.

  • n_mfcc (int) – Number of coefficients to return.

Returns

Array containing mean and std of each coefficient (len = n_mfcc*2).

Return type

np.ndarray
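The shape of the returned array (length n_mfcc*2) can be illustrated with a small NumPy sketch. It assumes that the MFCCs are obtained by applying an orthonormal type-II DCT along the frequency axis of the log-power Mel spectrogram, which is the usual definition, and that the means come before the standard deviations in the output. Both points are assumptions rather than confirmed details of pykanto, and the names `dct2_basis` and `mean_sd_mfcc` are hypothetical.

```python
import numpy as np


def dct2_basis(n_coeffs, n_bins):
    """First n_coeffs rows of an orthonormal type-II DCT matrix."""
    n = np.arange(n_bins)
    k = np.arange(n_coeffs)[:, np.newaxis]
    basis = np.sqrt(2.0 / n_bins) * np.cos(np.pi * (n + 0.5) * k / n_bins)
    basis[0] /= np.sqrt(2.0)  # orthonormal scaling of the constant row
    return basis


def mean_sd_mfcc(log_mel_spec, n_mfcc):
    """Mean and SD over time of the first n_mfcc coefficients (hypothetical helper).

    log_mel_spec: 2-D array shaped (n_mels, n_frames).
    Returns a 1-D array of length 2 * n_mfcc: all means, then all SDs.
    """
    log_mel_spec = np.asarray(log_mel_spec, dtype=float)
    # MFCCs: DCT of each frame's log-mel spectrum, shape (n_mfcc, n_frames).
    mfcc = dct2_basis(n_mfcc, log_mel_spec.shape[0]) @ log_mel_spec
    # Summarise each coefficient over time (population SD, ddof=0).
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])


# Random stand-in for a log-power Mel spectrogram with 64 mel bands and 40 frames.
rng = np.random.default_rng(0)
features = mean_sd_mfcc(rng.normal(size=(64, 40)), n_mfcc=20)
print(features.shape)  # (40,)
```

Summarising each coefficient over time gives a fixed-length vector regardless of the duration of the vocalisation, which is convenient when comparing vocalisations of different lengths.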

pykanto.signal.analysis.approximate_minmax_frequency(dataset: KantoData, key: None | str = None, spec: None | np.ndarray = None, roll_percents: list[float] = [0.95, 0.1], plot: bool = False) → Tuple[np.ndarray, np.ndarray]

Calculate approximate minimum and maximum frequencies from a mel spectrogram. You can either provide a key string for a vocalisation or its mel spectrogram directly.

Parameters
  • dataset (KantoData) – Dataset object with your data.

  • key (None | str, optional) – Key of a vocalisation. Defaults to None.

  • spec (None | np.ndarray, optional) – Mel spectrogram. Defaults to None.

  • roll_percents (list[float], optional) – Fractions of energy contained in bin, given as two values between 0 and 1. Defaults to [0.95, 0.1].

  • plot (bool, optional) – Whether to show the result. Defaults to False.

Returns

A tuple with the approximate minimum and maximum frequencies, in this order.

Return type

Tuple[np.ndarray, np.ndarray]
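The roll_percents parameter suggests a spectral roll-off approach, in which the frequency below which a given fraction of a frame's energy lies is used as a frequency estimate. The sketch below shows that idea with plain NumPy. It assumes, without confirmation from the text above, that the first value (0.95) gives the maximum and the second (0.1) the minimum; the names `rolloff_frequency`, `approx_minmax` and `freqs` are hypothetical.

```python
import numpy as np


def rolloff_frequency(spec, freqs, roll_percent):
    """Lowest frequency below which roll_percent of each frame's energy lies."""
    spec = np.asarray(spec, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    # Running total of energy from the lowest to the highest frequency bin.
    cumulative = np.cumsum(spec, axis=0)
    threshold = roll_percent * cumulative[-1]
    # argmax returns the first bin whose cumulative energy reaches the threshold.
    idx = np.argmax(cumulative >= threshold, axis=0)
    return freqs[idx]


def approx_minmax(spec, freqs, roll_percents=(0.95, 0.1)):
    """Per-frame (minimum, maximum) frequency estimates (hypothetical helper)."""
    maxfreq = rolloff_frequency(spec, freqs, roll_percents[0])
    minfreq = rolloff_frequency(spec, freqs, roll_percents[1])
    return minfreq, maxfreq


# First frame: energy spread evenly over four bins; second frame: all energy at 300 Hz.
spec = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 0.0]])
minfreq, maxfreq = approx_minmax(spec, freqs=[100.0, 200.0, 300.0, 400.0])
print(minfreq, maxfreq)
```

Using energy fractions instead of a hard amplitude cut-off makes the estimates less sensitive to isolated noisy bins at the edges of the spectrogram.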