{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# The KantoData dataset" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [ "remove-cell" ] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "13b1f7d522f34950965738258402ae21", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Loading JSON files: 0%| | 0/20 [00:00" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "| Method | Description |\n", "| --- | --- |\n", "| `dataset = load_dataset()` | Load an existing dataset |\n", "| `dataset.save_to_disk()` | Save an existing dataset to disk |\n", "| `dataset.to_csv()` | Save a dataset to CSV |\n", "| `dataset.write_to_json()` | Save new metadata to JSON files |" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "You can get some basic information about the contents of the dataset by running:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total length: 20\n", "Unique IDs: 2\n" ] }, { "data": { "text/plain": [ "B32 11\n", "SW83 9\n", "Name: ID, dtype: int64" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.sample_info()\n", "dataset.data['ID'].value_counts()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "`KantoData.data` and `KantoData.units` are {py:class}`pandas.DataFrame`\n", "instances: I have chosen this format because it is very flexible and most users are\n", "already familiar with it. You can query and modify them as you would any other\n", "pandas DataFrame. For example, to see the first three rows and a subset of columns:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
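<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>date</th>\n", "      <th>recordist</th>\n", "      <th>unit_durations</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr>\n", "      <th>2021-B32-0415_05-11</th>\n", "      <td>2021-04-15</td>\n", "      <td>Nilo Merino Recalde</td>\n", "      <td>[0.0986848072562358, 0.10448979591836727, 0.10...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2021-B32-0415_05-15</th>\n", "      <td>2021-04-15</td>\n", "      <td>Nilo Merino Recalde</td>\n", "      <td>[0.1102947845804989, 0.09868480725623585, 0.12...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2021-B32-0415_05-21</th>\n", "      <td>2021-04-15</td>\n", "      <td>Nilo Merino Recalde</td>\n", "      <td>[0.1219047619047619, 0.10448979591836738, 0.14...</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "</div>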
" ], "text/plain": [ " date recordist \\\n", "2021-B32-0415_05-11 2021-04-15 Nilo Merino Recalde \n", "2021-B32-0415_05-15 2021-04-15 Nilo Merino Recalde \n", "2021-B32-0415_05-21 2021-04-15 Nilo Merino Recalde \n", "\n", " unit_durations \n", "2021-B32-0415_05-11 [0.0986848072562358, 0.10448979591836727, 0.10... \n", "2021-B32-0415_05-15 [0.1102947845804989, 0.09868480725623585, 0.12... \n", "2021-B32-0415_05-21 [0.1219047619047619, 0.10448979591836738, 0.14... " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.data[['date', 'recordist', 'unit_durations']].head(3)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Or, to extract the duration of each vocalisation and calculate the inter-onset\n", "intervals (IOIs) between its units:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# The last unit offset marks the end of each vocalisation\n", "last_offsets = dataset.data[\"offsets\"].apply(lambda x: x[-1]).to_list()\n", "# Inter-onset intervals: differences between consecutive unit onsets\n", "iois = dataset.data[\"onsets\"].apply(lambda x: np.diff(x))" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "tags": [ "hide-input" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Vocalisation durations: ['2.12', '1.99', '2.16', '2.32', '1.81']\n", "IOIs: ['0.22', '0.23', '0.25', '0.24', '0.26']\n" ] } ], "source": [ "print(\"Vocalisation durations:\", [f\"{x:.2f}\" for x in last_offsets[:5]])\n", "# The index holds vocalisation IDs, so use .iloc for positional access\n", "print(\"IOIs:\", [f\"{x:.2f}\" for x in iois.iloc[0][:5]])" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.9.12 ('pykanto-dev')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "cf30c6a63fc6852a8d910622565c3348d4a7fab8fc38710c97d8db63a595f32d" } } }, "nbformat": 4, "nbformat_minor": 
2 }