pykanto.utils.paths

Create project directory trees, assign path variables, and get paths for different types of files.

Functions

change_data_loc(DIR, PROJECT, NEW_PROJECT)

Updates the location of the parent directories of a project, including the project name, for a given path.

get_file_paths(root_dir, extensions[, verbose])

Returns paths to files with given extension found recursively within a directory.

get_wavs_w_annotation(wav_filepaths, ...)

Returns a list of tuples, each containing [0] the path to a wav file that has an annotation file and [1] the path to that annotation file.

link_project_data(origin, project_data_dir)

Creates a symlink from a project's data folder (not under version control) to the directory where the data lives (e.g. on an external HDD).

pykanto_data([dataset])

Loads pykanto's sample datasets.

Classes

ProjDirs(PROJECT, RAW_DATA, DATASET_ID[, mkdir])

Initialises a ProjDirs class, which is used to store a project's file structure.

class pykanto.utils.paths.ProjDirs(PROJECT: pathlib.Path, RAW_DATA: pathlib.Path, DATASET_ID: str, mkdir: bool = False)

Initialises a ProjDirs class, which is used to store a project’s file structure. This is required when constructing a KantoData object and generally useful to keep paths tidy and in the same location.

Parameters
  • PROJECT (Path) – Root directory of the project.

  • RAW_DATA (Path) – (Immutable) location of the raw data to be used in this project.

  • DATASET_ID (str) – Name of the dataset.

  • mkdir (bool, optional) – Whether to create directories if they don’t already exist. Defaults to False.

PROJECT

Root directory of the project.

Type

Path

DATA

Directory for project data.

Type

Path

RAW_DATA

(Immutable) location of the raw data to be used in this project.

Type

Path

SEGMENTED

Directory for segmented audio data.

Type

Path

SPECTROGRAMS

Directory for project spectrograms.

Type

Path

RESOURCES

Directory for project resources.

Type

Path

REPORTS

Directory for project reports.

Type

Path

FIGURES

Directory for project figures.

Type

Path

DATASET

Directory for project datasets.

Type

Path

DATASET_ID

Name of the dataset.

Type

str

Examples

>>> from pathlib import Path
>>> from pykanto.utils.paths import ProjDirs
>>> DATASET_ID = "BIGBIRD"
>>> PROJROOT = Path('home') / 'user' / 'projects' / 'myproject'
>>> RAW_DATA = Path('bigexternaldrive') / 'fieldrecordings'
>>> DIRS = ProjDirs(PROJROOT, RAW_DATA, DATASET_ID, mkdir=True)
... 📁 project
... ├── 📁 data
... │   ├── 📁 datasets
... │   │   └── 📁 <DATASET_ID>
... │   │       ├── <DATASET_ID>.db
... │   │       └── 📁 spectrograms
... │   ├── 📁 RAW_DATA
... │   │   └── 📁 <DATASET_ID>
... │   └── 📁 segmented
... │       └── 📁 <lowercase name of RAW_DATA>
... ├── 📁 resources
... ├── 📁 reports
... │   └── 📁 figures
... └── <other project files>

__init__(PROJECT: pathlib.Path, RAW_DATA: pathlib.Path, DATASET_ID: str, mkdir: bool = False)

append(new_attr: str, new_value: pathlib.Path, mkdir: bool = False) → None

Appends a new attribute to the class instance.

Parameters
  • new_attr (str) – Name of the new attribute.

  • new_value (Path) – New directory.

  • mkdir (bool, optional) – Whether to create this directory if it doesn’t already exist. Defaults to False.
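
The behaviour described above can be illustrated with a minimal sketch. `DirsSketch` is a hypothetical stand-in written for this example, not pykanto's `ProjDirs`, and the real method may perform additional validation that is not documented here. The attribute name `MODELS` is likewise only an illustration.

```python
import tempfile
from pathlib import Path


class DirsSketch:
    """Hypothetical stand-in for ProjDirs: stores paths as attributes."""

    def __init__(self, project: Path) -> None:
        self.PROJECT = project

    def append(self, new_attr: str, new_value: Path, mkdir: bool = False) -> None:
        # Optionally create the directory, then register it as an attribute.
        if mkdir:
            new_value.mkdir(parents=True, exist_ok=True)
        setattr(self, new_attr, new_value)


with tempfile.TemporaryDirectory() as tmp:
    dirs = DirsSketch(Path(tmp))
    dirs.append("MODELS", dirs.PROJECT / "models", mkdir=True)
    models_dir_created = dirs.MODELS.is_dir()
    models_dir_name = dirs.MODELS.name
```

With a real `ProjDirs` instance the call would presumably look the same: `DIRS.append("MODELS", DIRS.PROJECT / "models", mkdir=True)`.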

update_json_locs(overwrite: bool = False, ignore_checks: bool = False) → None

Updates the wav_file field in JSON metadata files for a given project. This is useful if you have moved your data to a new location. It will fix broken links to the .wav files, provided that the ProjDirs object has a SEGMENTED attribute pointing to a valid directory containing /WAV and /JSON subdirectories.

Parameters
  • overwrite (bool, optional) – Whether to force change paths even if the current ones work. Defaults to False.

  • ignore_checks (bool, optional) – Whether to skip checking that wav and JSON files coincide. Useful if you just want to update JSON files stored in a different location from the rest of the data. Defaults to False.
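
To make the relinking idea concrete, here is a rough standard-library sketch of what such an update could look like. It is not pykanto's implementation: the helper `relink_wavs` is hypothetical, and it assumes that each JSON file shares its file name (minus extension) with a `.wav` file in the `WAV` subdirectory. Only the `wav_file` field name and the `WAV`/`JSON` layout come from the description above.

```python
import json
import tempfile
from pathlib import Path


def relink_wavs(segmented: Path, overwrite: bool = False) -> int:
    """Point each JSON file's "wav_file" field at the wav in segmented/WAV.

    Returns the number of JSON files that were rewritten.
    """
    updated = 0
    for json_path in sorted((segmented / "JSON").iterdir()):
        if json_path.suffix.lower() != ".json":
            continue
        metadata = json.loads(json_path.read_text())
        # Assumption: wav and JSON files share the same stem.
        new_wav = segmented / "WAV" / f"{json_path.stem}.wav"
        link_is_broken = not Path(metadata["wav_file"]).exists()
        if (overwrite or link_is_broken) and new_wav.exists():
            metadata["wav_file"] = str(new_wav)
            json_path.write_text(json.dumps(metadata, indent=2))
            updated += 1
    return updated


with tempfile.TemporaryDirectory() as tmp:
    segmented = Path(tmp) / "segmented"
    (segmented / "WAV").mkdir(parents=True)
    (segmented / "JSON").mkdir()
    (segmented / "WAV" / "a.wav").touch()
    # The stored path points at a location that no longer exists.
    stale = {"wav_file": str(Path(tmp) / "old_location" / "WAV" / "a.wav")}
    (segmented / "JSON" / "a.JSON").write_text(json.dumps(stale))

    first_pass = relink_wavs(segmented)
    second_pass = relink_wavs(segmented)  # links now work, nothing to do
    fixed = json.loads((segmented / "JSON" / "a.JSON").read_text())
    points_to_new_wav = fixed["wav_file"] == str(segmented / "WAV" / "a.wav")
```

In this sketch, `overwrite=True` rewrites the field even when the stored path still resolves, mirroring the `overwrite` parameter described above.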

pykanto.utils.paths.get_wavs_w_annotation(wav_filepaths: List[pathlib.Path], annotation_paths: List[pathlib.Path]) → List[Tuple[pathlib.Path, pathlib.Path]]

Returns a list of tuples, each containing [0] the path to a wav file that has an annotation file and [1] the path to that annotation file. Assumes that a wav file and its annotation file share the same file name and differ only in their file extension.

Parameters
  • wav_filepaths (List[Path]) – List of paths to wav files.

  • annotation_paths (List[Path]) – List of paths to annotation files.

Returns

Filtered list.

Return type

List[Tuple[Path, Path]]
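
The matching rule described above (same file name, different extension) can be sketched with the standard library alone. The helper `match_wavs_to_annotations` and the example paths are hypothetical; this illustrates the idea and is not pykanto's own code.

```python
from pathlib import Path
from typing import List, Tuple


def match_wavs_to_annotations(
    wav_filepaths: List[Path], annotation_paths: List[Path]
) -> List[Tuple[Path, Path]]:
    """Pair each wav file with the annotation file that shares its stem."""
    # Index annotation files by file name without extension.
    annotations_by_stem = {p.stem: p for p in annotation_paths}
    return [
        (wav, annotations_by_stem[wav.stem])
        for wav in wav_filepaths
        if wav.stem in annotations_by_stem
    ]


wavs = [Path("rec/a.wav"), Path("rec/b.wav"), Path("rec/c.wav")]
xmls = [Path("rec/a.xml"), Path("rec/c.xml")]
pairs = match_wavs_to_annotations(wavs, xmls)
# b.wav has no annotation file, so it is dropped from the result.
```

Note that matching on the stem alone means two annotation files with the same name in different folders would collide; the sketch keeps whichever comes last.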

pykanto.utils.paths.change_data_loc(DIR: pathlib.Path, PROJECT: pathlib.Path, NEW_PROJECT: pathlib.Path) → pathlib.Path

Updates the location of the parent directories of a project, including the project name, for a given path. Used when the location of a dataset changes (e.g. if transferring a project to a new machine).

Parameters
  • DIR (Path) – Path to update.

  • PROJECT (Path) – Old (broken) project directory.

  • NEW_PROJECT (Path) – New working project directory.

Returns

Updated path.

Return type

Path
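
As an illustration of what re-rooting a path means here, the following sketch swaps the old project directory for the new one while keeping everything below it. The helper `reroot` and the example paths are hypothetical, and pykanto may locate the project directory differently; `PurePosixPath` is used only so that the example behaves the same on every operating system.

```python
from pathlib import PurePosixPath


def reroot(
    path: PurePosixPath, old_project: PurePosixPath, new_project: PurePosixPath
) -> PurePosixPath:
    """Replace the old project root of `path` with the new project root."""
    # relative_to() raises ValueError if `path` is not inside `old_project`.
    return new_project / path.relative_to(old_project)


old = PurePosixPath("/home/user/projects/myproject")
new = PurePosixPath("/mnt/lab/newproject")
segmented = old / "data" / "segmented" / "bigbird"
updated = reroot(segmented, old, new)
# updated is /mnt/lab/newproject/data/segmented/bigbird
```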

pykanto.utils.paths.get_file_paths(root_dir: pathlib.Path, extensions: List[str], verbose: bool = False) → List[pathlib.Path]

Returns paths to files with given extension found recursively within a directory.

Parameters
  • root_dir (Path) – Root directory to search recursively.

  • extensions (List[str]) – File extensions to look for (e.g., .wav)

Raises

FileNotFoundError – No files found.

Returns

List with path to files.

Return type

List[Path]
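
A recursive search of this kind can be approximated with `Path.rglob`. The helper `find_files` below is a hypothetical sketch, not pykanto's implementation; in particular, the exact (case-sensitive) suffix comparison and the sorted output are assumptions made for the example.

```python
import tempfile
from pathlib import Path
from typing import List


def find_files(root_dir: Path, extensions: List[str]) -> List[Path]:
    """Recursively collect files under root_dir whose suffix is in extensions."""
    found = sorted(
        p for p in root_dir.rglob("*") if p.is_file() and p.suffix in extensions
    )
    if not found:
        raise FileNotFoundError(f"No {extensions} files found in {root_dir}")
    return found


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "site_a").mkdir()
    (root / "site_a" / "song.wav").touch()
    (root / "notes.txt").touch()

    wav_names = [p.name for p in find_files(root, [".wav"])]

    # Searching for an extension that is absent raises FileNotFoundError.
    try:
        find_files(root, [".xml"])
        raised = False
    except FileNotFoundError:
        raised = True
```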

pykanto.utils.paths.link_project_data(origin: os.PathLike, project_data_dir: pathlib.Path)

Creates a symlink from a project’s data folder (not under version control) to the directory where the data lives (e.g. on an external HDD).

Parameters
  • origin (os.PathLike) – Path to the directory containing your ‘raw’ data folder.

  • project_data_dir (Path) – A project’s data folder to link with ‘origin’.

Note

This will work on Unix-like systems but might cause problems on Windows; see how to enable symlinks in Windows.

Raises
  • ValueError – The ‘project_data_dir’ already contains data or is a symlink.

  • FileExistsError – File exists; your target folder already exists.
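
The linking step and the error conditions listed above can be sketched with `os.symlink`. The helper `link_data` is hypothetical and only approximates the documented behaviour; removing an empty target directory before linking is an assumption of this sketch. As noted, symlink creation may need extra privileges on Windows.

```python
import os
import tempfile
from pathlib import Path


def link_data(origin: os.PathLike, project_data_dir: Path) -> None:
    """Make project_data_dir a symlink to origin, refusing to clobber data."""
    if project_data_dir.is_symlink():
        raise ValueError(f"{project_data_dir} is already a symlink.")
    if project_data_dir.is_dir():
        if any(project_data_dir.iterdir()):
            raise ValueError(f"{project_data_dir} already contains data.")
        # Assumption: an empty placeholder directory can be replaced.
        project_data_dir.rmdir()
    # Raises FileExistsError if something else already exists at the target.
    os.symlink(origin, project_data_dir, target_is_directory=True)


with tempfile.TemporaryDirectory() as tmp:
    origin = Path(tmp) / "external_drive" / "raw"
    origin.mkdir(parents=True)
    (origin / "song.wav").touch()
    data_dir = Path(tmp) / "project" / "data"
    data_dir.mkdir(parents=True)

    link_data(origin, data_dir)
    is_link = data_dir.is_symlink()
    sees_raw_file = (data_dir / "song.wav").exists()

    # Linking a second time is rejected because the target is now a symlink.
    try:
        link_data(origin, data_dir)
        second_call_rejected = False
    except ValueError:
        second_call_rejected = True
```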

pykanto.utils.paths.pykanto_data(dataset: str = 'GREAT_TIT') → pykanto.utils.paths.ProjDirs

Loads pykanto’s sample datasets. These are minimal data examples intended for testing and tutorials.

Parameters

dataset (str, optional) – Dataset name, one of [“STORM-PETREL”, “BENGALESE_FINCH”, “GREAT_TIT”, “AM”]. Defaults to “GREAT_TIT”.

Returns

An object with paths to data directories that can then be used to create a dataset.

Return type

ProjDirs