{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Basic workflow" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "We are going to load one of the very small datasets that are packaged with\n", "`pykanto`—this will be enough to check that everything is working as it should\n", "and to familiarise yourself with the package. See [project\n", "setup](./project-setup.md) to learn how to load your own data.\n", "\n", "```{admonition} Note:\n", ":class: note\n", "\n", "Creating a `KantoData` dataset requires that you have already set up your project directories (see [project setup](./project-setup.md)). Before either step, long files need to have been segmented into smaller chunks of interest (e.g., songs, song bouts). See [segmenting files](./segmenting-files.ipynb) for more information.\n", "Of the datasets packaged with `pykanto`, only the `GREAT_TIT` dataset has already been segmented. If you want to use another dataset, you will need to segment it first, as demonstrated in [segmenting files](./segmenting-files.ipynb) and [feature extraction](./feature-extraction.ipynb).\n", "```\n", "\n", "The `GREAT_TIT` dataset consists of a few songs from two male great tits (_Parus major_) in [my study population](http://wythamtits.com/), Wytham Woods, Oxfordshire, UK. 
Let's load the paths pointing to it and create a `KantoData` object:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [] }, "outputs": [], "source": [ "from pykanto.utils.paths import pykanto_data\n", "from pykanto.dataset import KantoData\n", "from pykanto.parameters import Parameters" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [ "hide-output" ] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "30248cdda5044587a0beccb94c5ff2e1", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Loading JSON files: 0%| | 0/20 [00:00\n", "
| | species | ID | label | recorder | recordist | source_datetime | datetime | date | time | timezone | sample_rate | length_s | lower_freq | upper_freq | max_amplitude | min_amplitude | bit_depth | tech_comment | noise |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2021-B32-0415_05-11 | Great tit | B32 | | 24F319055FDF2205 | Nilo Merino Recalde | 2021-04-15 05:00:00 | 2021-04-15 05:07:22.866667 | 2021-04-15 | 05:07:22.866667 | UTC | 48000 | 1.139250 | 2506 | 5922 | 0.673711 | -0.666701 | 16 | Recorded at 05:00:00 15/04/2021 (UTC) by Audio... | False |
| 2021-B32-0415_05-15 | Great tit | B32 | | 24F319055FDF2205 | Nilo Merino Recalde | 2021-04-15 05:00:00 | 2021-04-15 05:08:16.520000 | 2021-04-15 | 05:08:16.520000 | UTC | 48000 | 1.194375 | 2392 | 5694 | 0.356706 | -0.351275 | 16 | Recorded at 05:00:00 15/04/2021 (UTC) by Audio... | False |
| 2021-B32-0415_05-21 | Great tit | B32 | | 24F319055FDF2205 | Nilo Merino Recalde | 2021-04-15 05:00:00 | 2021-04-15 05:09:27.600000 | 2021-04-15 | 05:09:27.600000 | UTC | 48000 | 1.188250 | 2392 | 5739 | 0.189776 | -0.188388 | 16 | Recorded at 05:00:00 15/04/2021 (UTC) by Audio... | False |
\n", "" ], "text/plain": [ " species ID label recorder \\\n", "2021-B32-0415_05-11 Great tit B32 24F319055FDF2205 \n", "2021-B32-0415_05-15 Great tit B32 24F319055FDF2205 \n", "2021-B32-0415_05-21 Great tit B32 24F319055FDF2205 \n", "\n", " recordist source_datetime \\\n", "2021-B32-0415_05-11 Nilo Merino Recalde 2021-04-15 05:00:00 \n", "2021-B32-0415_05-15 Nilo Merino Recalde 2021-04-15 05:00:00 \n", "2021-B32-0415_05-21 Nilo Merino Recalde 2021-04-15 05:00:00 \n", "\n", " datetime date time \\\n", "2021-B32-0415_05-11 2021-04-15 05:07:22.866667 2021-04-15 05:07:22.866667 \n", "2021-B32-0415_05-15 2021-04-15 05:08:16.520000 2021-04-15 05:08:16.520000 \n", "2021-B32-0415_05-21 2021-04-15 05:09:27.600000 2021-04-15 05:09:27.600000 \n", "\n", " timezone sample_rate length_s lower_freq upper_freq \\\n", "2021-B32-0415_05-11 UTC 48000 1.139250 2506 5922 \n", "2021-B32-0415_05-15 UTC 48000 1.194375 2392 5694 \n", "2021-B32-0415_05-21 UTC 48000 1.188250 2392 5739 \n", "\n", " max_amplitude min_amplitude bit_depth \\\n", "2021-B32-0415_05-11 0.673711 -0.666701 16 \n", "2021-B32-0415_05-15 0.356706 -0.351275 16 \n", "2021-B32-0415_05-21 0.189776 -0.188388 16 \n", "\n", " tech_comment noise \n", "2021-B32-0415_05-11 Recorded at 05:00:00 15/04/2021 (UTC) by Audio... False \n", "2021-B32-0415_05-15 Recorded at 05:00:00 15/04/2021 (UTC) by Audio... False \n", "2021-B32-0415_05-21 Recorded at 05:00:00 15/04/2021 (UTC) by Audio... 
False " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATASET_ID = \"GREAT_TIT\"\n", "DIRS = pykanto_data(dataset=DATASET_ID)\n", "# ---------\n", "params = Parameters() # Using default parameters for simplicity, which you shouldn't!\n", "dataset = KantoData(DIRS, parameters=params, overwrite_dataset=True)\n", "dataset.data.head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{admonition} Tip: using an IDE\n", ":class: tip, dropdown\n", "\n", "If you don't already, I highly recommend that you use an IDE to write and run your code, such as [VS Code](https://code.visualstudio.com/) or [PyCharm](https://www.jetbrains.com/pycharm/). Among many other benefits, you will be able to see the documentation for each function in `pykanto` on hover:\n", "\n", "![IDE](../custom/IDE_example.png)\n", "\n", "```" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "hide-output" ] }, "source": [ "We now have an object `dataset`, which is an instance of the `KantoData` class and has all of its methods.\n", "\n", "```{admonition} Tip:\n", ":class: tip\n", "\n", "See [how to create and use a `KantoData` object](./kantodata-dataset.ipynb) for more details.\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For example, you might want to segment your songs into discrete notes using `pykanto`'s algorithm, which is a simple amplitude-based method that works reasonably well (based on Tim Sainburg's [vocalseg](https://github.com/timsainb/vocalization-segmentation) and Robert Lachlan's de-echoing method in [Luscinia](https://rflachlan.github.io/Luscinia/))." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using existing unit onset/offset information.\n", "Found and segmented 169 units.\n" ] }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Segment:\n", "dataset.segment_into_units()\n", "\n", "# Plot an example:\n", "dataset.plot(dataset.data.index[0], segmented=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, you can create spectrogram representations of the units or the average of the units present in the vocalisations of each individual ID in the dataset, project and cluster them, and prepare compressed representations that can be used with the interactive app:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "tags": [ "hide-output" ] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "d7319d8211e44c4284f7c76e50c3da42", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Calculating and saving unit spectrograms: 0%| | 0/2 [00:00