Setting up a project

Working with paths and directories

pykanto provides a convenient way to store all paths pointing to directories and files in your project together: this makes it easier to access them, and promotes standardisation among your projects.

1. Import any dependencies
from pathlib import Path
from pykanto.utils.paths import link_project_data, ProjDirs

2. Set up project directories

Next, tell pykanto where the raw data for your project live,

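DATASET_ID = 'BIGBIRD_2021'
# Placeholder: replace with the path to the root folder of your own project
project_root = Path("path/to/your/project")
data_dir = project_root / "data" / "raw" / DATASET_ID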

Note:

If you are working with a dataset where long audio files have already been segmented into smaller chunks (e.g., songs), you can simply pass the path to the segmented data folder to the RAW_DATA argument of ProjDirs. See the ProjDirs docs for more information.

and build the project’s directory tree:

DIRS = ProjDirs(project_root, data_dir, DATASET_ID, mkdir=True)
print(DIRS)

If mkdir is set to True, the directories will be created if they don’t already exist. This is the resulting directory tree, assuming that your raw data folder is called raw:

📁 project_root
├── 📁 data
│   ├── 📁 datasets
│   │   └── 📁 <DATASET_ID>
│   │       ├── <DATASET_ID>.db
│   │       └── 📁 spectrograms
│   ├── 📁 raw
│   │   └── 📁 <DATASET_ID>
│   └── 📁 segmented
│       └── 📁 <lowercase name of RAW_DATA>
├── 📁 resources
├── 📁 reports
│   └── 📁 figures
└── <other project files>
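
To make the layout concrete, here is a minimal sketch that reproduces a tree of this shape using only pathlib. It is not pykanto's implementation: the function name sketch_project_tree and the dictionary keys are hypothetical, and the rule that the segmented folder takes the lowercase name of the raw data folder is read off the tree, not taken from the library's source.

```python
from pathlib import Path
import tempfile


def sketch_project_tree(project_root: Path, raw_data: Path, dataset_id: str) -> dict:
    """Create a directory layout shaped like the tree above (illustration only)."""
    data = project_root / "data"
    dirs = {
        "raw_data": raw_data,
        "dataset": data / "datasets" / dataset_id,
        "spectrograms": data / "datasets" / dataset_id / "spectrograms",
        # Assumption: the segmented folder mirrors the raw folder's name, lowercased
        "segmented": data / "segmented" / raw_data.name.lower(),
        "resources": project_root / "resources",
        "figures": project_root / "reports" / "figures",
    }
    for path in dirs.values():
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe
        path.mkdir(parents=True, exist_ok=True)
    return dirs


# Build the tree in a throwaway directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "project_root"
    dataset_id = "BIGBIRD_2021"
    dirs = sketch_project_tree(root, root / "data" / "raw" / dataset_id, dataset_id)
    for name, path in dirs.items():
        print(f"{name}: {path.relative_to(root)}")
```

Because each folder is created with exist_ok=True, calling the function a second time on the same root leaves existing folders untouched, which mirrors the behaviour described for mkdir=True.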

See the ProjDirs docs for more information.
Now you are ready to import and segment your raw data (see next section).