pykanto.signal.analysis
Basic audio feature calculations (spectral centroids, peak frequencies, etc.)
Functions
approximate_minmax_frequency – Calculate approximate minimum and maximum frequencies from a mel spectrogram.
get_mean_sd_mfcc – Extract the mean and SD of n Mel-frequency cepstral coefficients (MFCCs) calculated from a log-power Mel spectrogram.
get_peak_freqs – Return the peak frequencies of each spectrogram in an array of spectrograms.
spec_centroid_bandwidth – Calculate a vocalisation's spectral centroid and bandwidth from a mel spectrogram.
- pykanto.signal.analysis.get_peak_freqs(dataset: KantoData, spectrograms: np.ndarray, melscale: bool = True, threshold: float = 0.3) -> np.ndarray
Return the peak frequencies of each spectrogram in an array of spectrograms.
- Parameters
dataset (KantoData) – Vocalisation dataset.
spectrograms (np.ndarray) – Array of np.ndarray spectrograms.
melscale (bool, optional) – Whether the spectrograms are in the mel scale. Defaults to True.
threshold (float, optional) – Threshold for peak detection. Defaults to 0.3.
- Returns
Array with peak frequencies.
- Return type
np.ndarray
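The idea of a per-spectrogram peak frequency can be illustrated with plain NumPy. This is a minimal sketch, not pykanto's implementation: the helper `peak_frequencies` and its `freqs` argument (the centre frequency of each bin, in Hz) are hypothetical, and the sketch ignores the `threshold` and `melscale` parameters documented above.

```python
import numpy as np


def peak_frequencies(spectrograms, freqs):
    """Return, for each spectrogram, the frequency of its strongest bin.

    Hypothetical helper for illustration only.
    spectrograms: iterable of 2-D arrays shaped (n_freq_bins, n_frames).
    freqs: 1-D array with the centre frequency (Hz) of each bin.
    """
    freqs = np.asarray(freqs, dtype=float)
    peaks = []
    for spec in spectrograms:
        # Strongest value reached by each frequency bin across all frames.
        max_per_bin = np.max(np.asarray(spec), axis=1)
        # The bin with the largest maximum gives the peak frequency.
        peaks.append(freqs[np.argmax(max_per_bin)])
    return np.array(peaks)


# Two toy spectrograms with 4 frequency bins and 3 frames each.
freqs = np.array([1000.0, 2000.0, 3000.0, 4000.0])
spec_a = np.zeros((4, 3))
spec_a[1, 2] = 5.0  # energy in the 2000 Hz bin
spec_b = np.zeros((4, 3))
spec_b[3, 0] = 2.0  # energy in the 4000 Hz bin
peaks = peak_frequencies([spec_a, spec_b], freqs)
```

Iterating over the spectrograms one at a time lets the sketch accept spectrograms with different numbers of frames.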
- pykanto.signal.analysis.spec_centroid_bandwidth(dataset: KantoData, key: None | str = None, spec: None | np.ndarray = None, plot: bool = False) -> Tuple[np.ndarray, np.ndarray]
Calculate a vocalisation’s spectral centroid and bandwidth from a mel spectrogram. You can either provide a key string for a vocalisation or its mel spectrogram directly.
- Parameters
dataset (KantoData) – Dataset object with your data.
key (None | str, optional) – Key of a vocalisation. Defaults to None.
spec (None | np.ndarray, optional) – Mel spectrogram. Defaults to None.
plot (bool, optional) – Whether to show the result. Defaults to False.
- Returns
A tuple with the centroids and bandwidths.
- Return type
Tuple[np.ndarray, np.ndarray]
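For readers unfamiliar with these two descriptors, the sketch below computes a per-frame spectral centroid (the energy-weighted mean frequency) and bandwidth (the energy-weighted standard deviation around that centroid) with NumPy. It is an illustration under stated assumptions, not the code pykanto runs: `centroid_and_bandwidth` and `freqs` are hypothetical names, and the input is assumed to hold non-negative magnitudes rather than decibel values.

```python
import numpy as np


def centroid_and_bandwidth(spec, freqs):
    """Per-frame spectral centroid and bandwidth (hypothetical helper).

    spec: 2-D array of non-negative magnitudes, shape (n_freq_bins, n_frames).
    freqs: 1-D array with the centre frequency (Hz) of each bin.
    """
    spec = np.asarray(spec, dtype=float)
    freqs = np.asarray(freqs, dtype=float)[:, np.newaxis]
    # Normalise each frame so that its magnitudes sum to 1
    # (silent frames are left as all zeros).
    total = spec.sum(axis=0, keepdims=True)
    weights = spec / np.where(total == 0, 1.0, total)
    # Centroid: weighted mean frequency of each frame.
    centroid = (freqs * weights).sum(axis=0)
    # Bandwidth: weighted standard deviation around the centroid.
    bandwidth = np.sqrt((weights * (freqs - centroid) ** 2).sum(axis=0))
    return centroid, bandwidth


freqs = np.array([100.0, 200.0, 300.0])
spec = np.array([[1.0, 0.0],
                 [0.0, 0.0],
                 [1.0, 1.0]])
centroids, bandwidths = centroid_and_bandwidth(spec, freqs)
```

In the toy input, the first frame splits its energy evenly between 100 Hz and 300 Hz, and the second frame has energy only at 300 Hz.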
- pykanto.signal.analysis.get_mean_sd_mfcc(S: np.ndarray, n_mfcc: int) -> np.ndarray
Extract the mean and SD of n Mel-frequency cepstral coefficients (MFCCs) calculated from a log-power Mel spectrogram.
- Parameters
S (np.ndarray) – A log-power Mel spectrogram.
n_mfcc (int) – Number of coefficients to return.
- Returns
Array containing mean and std of each coefficient (len = n_mfcc*2).
- Return type
np.ndarray
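The shape of the returned array (n_mfcc*2) can be made concrete with a NumPy-only sketch. It assumes that the MFCCs are obtained by applying an orthonormal DCT-II along the mel axis of the log-power spectrogram, and that the means come before the SDs in the output. Both are assumptions made for illustration, and `mean_sd_mfcc` is a hypothetical name rather than the library function.

```python
import numpy as np


def mean_sd_mfcc(log_mel_spec, n_mfcc):
    """Mean and SD over time of the first n_mfcc cepstral coefficients.

    Hypothetical helper for illustration only.
    log_mel_spec: 2-D array shaped (n_mels, n_frames); n_mfcc <= n_mels.
    """
    S = np.asarray(log_mel_spec, dtype=float)
    n_mels = S.shape[0]
    # Orthonormal DCT-II basis, shape (n_mfcc, n_mels).
    k = np.arange(n_mfcc)[:, np.newaxis]
    n = np.arange(n_mels)[np.newaxis, :]
    basis = np.sqrt(2.0 / n_mels) * np.cos(np.pi / n_mels * (n + 0.5) * k)
    basis[0] /= np.sqrt(2.0)
    # One row per coefficient, one column per frame.
    mfcc = basis @ S
    # Assumed layout: all means first, then all SDs.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])


rng = np.random.default_rng(0)
S_example = rng.normal(loc=-40.0, scale=10.0, size=(40, 25))
features = mean_sd_mfcc(S_example, n_mfcc=13)

# A constant spectrogram has energy only in the zeroth coefficient.
flat = mean_sd_mfcc(np.full((8, 5), -20.0), n_mfcc=4)
```

Summarising each coefficient by its mean and SD gives a fixed-length feature vector regardless of how many frames the vocalisation has.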
- pykanto.signal.analysis.approximate_minmax_frequency(dataset: KantoData, key: None | str = None, spec: None | np.ndarray = None, roll_percents: list[float] = [0.95, 0.1], plot: bool = False) -> Tuple[np.ndarray, np.ndarray]
Calculate approximate minimum and maximum frequencies from a mel spectrogram. You can either provide a key string for a vocalisation or its mel spectrogram directly.
- Parameters
dataset (KantoData) – Dataset object with your data.
key (None | str, optional) – Key of a vocalisation. Defaults to None.
spec (None | np.ndarray, optional) – Mel spectrogram. Defaults to None.
roll_percents (list[float], optional) – Percentage of energy contained in bin, given as fractions between 0 and 1. Defaults to [0.95, 0.1].
plot (bool, optional) – Whether to show the result. Defaults to False.
- Returns
A tuple with the approximate minimum and maximum frequencies, in this order.
- Return type
Tuple[np.ndarray, np.ndarray]
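One plausible reading of `roll_percents` is that its two values are spectral roll-off fractions: the first marks the upper frequency limit and the second the lower one. The NumPy sketch below illustrates that reading only. It is an assumption, not a description of pykanto's internals, and `rolloff_frequency`, `approx_minmax`, and `freqs` are hypothetical names.

```python
import numpy as np


def rolloff_frequency(spec, freqs, roll_percent):
    """Frequency below which `roll_percent` of each frame's energy lies.

    Hypothetical helper for illustration only.
    spec: 2-D array of non-negative magnitudes, shape (n_freq_bins, n_frames).
    freqs: 1-D array with the centre frequency (Hz) of each bin.
    """
    spec = np.asarray(spec, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    # Running total of energy from the lowest bin upwards, per frame.
    cumulative = np.cumsum(spec, axis=0)
    threshold = roll_percent * cumulative[-1]
    # First bin at which the running total reaches the threshold.
    idx = np.argmax(cumulative >= threshold, axis=0)
    return freqs[idx]


def approx_minmax(spec, freqs, roll_percents=(0.95, 0.1)):
    """Return (minimum, maximum) per-frame frequency estimates."""
    upper = rolloff_frequency(spec, freqs, roll_percents[0])
    lower = rolloff_frequency(spec, freqs, roll_percents[1])
    return lower, upper


freqs = np.arange(100.0, 1100.0, 100.0)  # 10 bins: 100 Hz to 1000 Hz
spec = np.zeros((10, 2))
spec[2:8, 0] = 1.0  # frame 0: energy spread from 300 Hz to 800 Hz
spec[4, 1] = 1.0    # frame 1: energy only at 500 Hz
min_freqs, max_freqs = approx_minmax(spec, freqs)
```

Using roll-off fractions rather than the strict lowest and highest non-zero bins makes the estimates less sensitive to faint background noise at the edges of the spectrum.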