Setting up a project
Working with paths and directories#
pykanto provides a convenient way to store all paths pointing to directories
and files in your project together: this makes it easier to access them, and
promotes standardisation among your projects.
import git  # GitPython; needed for the git.Repo call below
from pathlib import Path
from pykanto.utils.paths import link_project_data, ProjDirs
1. Link your data#
Find your project's root directory. You can do this in any number of ways, as long as you do it programmatically. For example, the code below assumes that you are using git for version control and simply gets the root of your repository:
project_root = Path(
git.Repo(".", search_parent_directories=True).working_tree_dir
)
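If your project is not a git repository, or you would rather not depend on GitPython, the same idea can be sketched with the standard library alone. The helper below is hypothetical (it is not part of pykanto): it walks up from a starting directory until it finds a marker such as `.git`, and the choice of marker is an assumption you can change.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = ".git") -> Optional[Path]:
    """Return the closest directory at or above `start` that contains `marker`.

    Hypothetical helper for illustration; `marker` could equally be
    'pyproject.toml' or any other file that only exists at the project root.
    """
    start = start.resolve()
    # Check the starting directory first, then each of its ancestors in turn.
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    return None  # no ancestor contains the marker
```

Calling `find_project_root(Path.cwd())` from anywhere inside the project should then give the same directory as the git-based snippet, provided the marker exists at the root.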
It is common to have your raw data on a large external drive or remote server
(for example I use a RAID system). If this is the case for you, you probably
want to link the actual location of your raw data to an otherwise empty /data
folder in your project for ease of access and clarity. Pykanto includes a
function to do just that:
external_data = Path('path/to/your/data/drive')
link_project_data(external_data, project_root / 'data')
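To make the idea of "linking" concrete, here is a minimal standard-library sketch of what such a link amounts to: a symbolic link that makes an external folder appear inside the project. This is not pykanto's implementation, and `link_data_sketch` is a hypothetical name; it assumes the link location is either absent or an empty directory, and that your operating system allows you to create symlinks (on Windows this can require extra privileges).

```python
from pathlib import Path


def link_data_sketch(origin: Path, link: Path) -> Path:
    """Make `link` a symbolic link to the existing directory `origin`.

    Illustrative only: refuses to overwrite anything that holds data.
    """
    if not origin.is_dir():
        raise FileNotFoundError(f"{origin} is not an existing directory")
    if link.is_symlink():
        # Already a link: leave it alone and report where it points.
        return link.resolve()
    if link.exists():
        if not link.is_dir() or any(link.iterdir()):
            raise FileExistsError(f"{link} exists and is not an empty directory")
        link.rmdir()  # replace the empty placeholder folder with the link
    link.parent.mkdir(parents=True, exist_ok=True)
    link.symlink_to(origin.resolve(), target_is_directory=True)
    return link.resolve()
```

After this, files under `origin` can be read through `link` as if they lived inside the project, while the data themselves stay on the external drive.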
Tip: freeze your raw data and only work on programmatically derived datasets
You will likely create different derived datasets from the same raw data, which is why pykanto lets your (raw) data live wherever you want. I strongly recommend that you make its directory read-only and never modify it.
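One way to follow this advice is to clear the write permission on the raw data folder and everything in it. The sketch below is an assumption-laden example rather than a pykanto feature: `make_read_only` is a hypothetical helper, it assumes POSIX-style permissions and that you own the files, and it skips symbolic links so that it never changes anything outside the folder you pass in.

```python
import stat
from pathlib import Path


def make_read_only(root: Path) -> None:
    """Remove the write bits from `root` and from everything beneath it."""
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    # Collect the paths first, then change permissions one by one.
    for path in [root, *root.rglob("*")]:
        if path.is_symlink():
            continue  # do not touch whatever the link points to
        current = stat.S_IMODE(path.stat().st_mode)
        path.chmod(current & ~write_bits)
```

Read and execute bits are left untouched, so the data can still be listed and read; to undo the change you would add the write bits back with `chmod`.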
2. Set up project directories#
Next, tell pykanto where the raw data for your project live,
DATASET_ID = 'BIGBIRD_2021'
data_dir = project_root / "data" / "raw" / DATASET_ID
Note:
If you are working with a dataset where long audio files have already been segmented into smaller chunks (e.g., songs), you can simply pass the path to the segmented data folder to the RAW_DATA argument of ProjDirs. See the ProjDirs docs for more information.
and build the project's directory tree:
DIRS = ProjDirs(project_root, data_dir, DATASET_ID, mkdir=True)
print(DIRS)
If mkdir is set to True, the directories will be created if they don't
already exist. This is the resulting directory tree, assuming that your raw data
folder is called raw.
📁 project_root
├── 📁 data
│   ├── 📁 datasets
│   │   └── 📁 <DATASET_ID>
│   │       ├── <DATASET_ID>.db
│   │       └── 📁 spectrograms
│   ├── 📁 raw
│   │   └── 📁 <DATASET_ID>
│   └── 📁 segmented
│       └── 📁 <lowercase name of RAW_DATA>
├── 📁 resources
├── 📁 reports
│   └── 📁 figures
└── <other project files>
See the ProjDirs docs for more information.
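For readers who want to see what a layout like the tree above involves, the following sketch creates the same folders with pathlib. It is only an illustration of the layout, not how ProjDirs works internally: `sketch_project_tree`, its arguments and the dictionary keys are all hypothetical, and the raw data folder is deliberately left out because it is assumed to exist already.

```python
from pathlib import Path
from typing import Dict


def sketch_project_tree(
    project_root: Path, dataset_id: str, raw_name: str
) -> Dict[str, Path]:
    """Create the folders shown in the tree and return their paths."""
    dataset = project_root / "data" / "datasets" / dataset_id
    dirs = {
        "dataset": dataset,
        "spectrograms": dataset / "spectrograms",
        # Segmented data go under the lowercase name of the raw data folder.
        "segmented": project_root / "data" / "segmented" / raw_name.lower(),
        "resources": project_root / "resources",
        "figures": project_root / "reports" / "figures",
    }
    for path in dirs.values():
        # exist_ok=True makes the call safe to repeat on an existing project.
        path.mkdir(parents=True, exist_ok=True)
    return dirs
```

Keeping every path in one object, as ProjDirs does, means the rest of your code can refer to locations by name instead of rebuilding them from strings.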
Now you are ready to import and segment your raw data (see next section).