pykanto.utils.paths
Create project directory trees, assign path variables, and get paths for different types of files
Functions
change_data_loc | Updates the location of the parent directories of a project, including the project name, for a given path.
get_file_paths | Returns paths to files with given extension found recursively within a directory.
get_wavs_w_annotation | Returns a list of tuples containing [0] paths to wavfiles for which there is an annotation file and [1] paths to its annotation file.
link_project_data | Creates a symlink from a project's data folder (not under version control) to the directory where the data lives (e.g. on an external HDD).
pykanto_data | Loads pykanto's sample datasets.
Classes
ProjDirs | Initialises a ProjDirs class, which is used to store a project's file structure.
- class pykanto.utils.paths.ProjDirs(PROJECT: pathlib.Path, RAW_DATA: pathlib.Path, DATASET_ID: str, mkdir: bool = False)[source]#
Initialises a ProjDirs class, which is used to store a project’s file structure. This is required when constructing a KantoData object and is generally useful for keeping paths tidy and in one place.
- Parameters
PROJECT (Path) – Root directory of the project.
RAW_DATA (Path) – (Immutable) location of the raw data to be used in this project.
DATASET_ID (str) – Name of the dataset.
mkdir (bool, optional) – Whether to create directories if they don’t already exist. Defaults to False.
- PROJECT#
Root directory of the project.
- Type
Path
- DATA#
Directory for project data.
- Type
Path
- RAW_DATA#
(Immutable) location of the raw data to be used in this project.
- Type
Path
- SEGMENTED#
Directory for segmented audio data.
- Type
Path
- SPECTROGRAMS#
Directory for project spectrograms.
- Type
Path
- RESOURCES#
Directory for project resources.
- Type
Path
- REPORTS#
Directory for project reports.
- Type
Path
- FIGURES#
Directory for project figures.
- Type
Path
- DATASET#
Directory for project datasets.
- Type
Path
- DATASET_ID#
Name of the dataset.
- Type
str
Examples
>>> from pathlib import Path
>>> from pykanto.utils.paths import ProjDirs
>>> DATASET_ID = "BIGBIRD"
>>> PROJROOT = Path('home') / 'user' / 'projects' / 'myproject'
>>> RAW_DATA = Path('bigexternaldrive') / 'fieldrecordings'
>>> DIRS = ProjDirs(PROJROOT, RAW_DATA, DATASET_ID, mkdir=True)
... 📁 project
... ├── 📁 data
... │   ├── 📁 datasets
... │   │   └── 📁 <DATASET_ID>
... │   │       ├── <DATASET_ID>.db
... │   │       └── 📁 spectrograms
... │   ├── 📁 RAW_DATA
... │   │   └── 📁 <DATASET_ID>
... │   └── 📁 segmented
... │       └── 📁 <lowercase name of RAW_DATA>
... ├── 📁 resources
... ├── 📁 reports
... │   └── 📁 figures
... └── <other project files>
- __init__(PROJECT: pathlib.Path, RAW_DATA: pathlib.Path, DATASET_ID: str, mkdir: bool = False)[source]#
- append(new_attr: str, new_value: pathlib.Path, mkdir: bool = False) None [source]#
Appends a new attribute to the class instance.
- Parameters
new_attr (str) – Name of the new attribute.
new_value (Path) – New directory.
mkdir (bool, optional) – Whether to create this directory if it doesn’t already exist. Defaults to False.
- update_json_locs(overwrite: bool = False, ignore_checks: bool = False) None [source]#
Updates the wav_file field in JSON metadata files for a given project. This is useful if you have moved your data to a new location. It will fix broken links to the .wav files, provided that the ProjDirs object has a SEGMENTED attribute pointing to a valid directory containing /WAV and /JSON subdirectories.
- Parameters
overwrite (bool, optional) – Whether to force change paths even if the current ones work. Defaults to False.
ignore_checks (bool, optional) – Whether to skip the check that wav and JSON files coincide. Useful if you just want to change JSON files that live in a different location from the rest of the data. Defaults to False.
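The relinking described above can be illustrated with a minimal standard-library sketch. It is not pykanto's implementation: the helper name repoint_wav_files, the .JSON file extension, and the rule of matching files by name are assumptions made for this example. Only the wav_file field and the WAV and JSON subdirectories come from the description above.

```python
import json
import tempfile
from pathlib import Path


def repoint_wav_files(segmented_dir: Path) -> int:
    """Hypothetical helper: fix broken 'wav_file' links in JSON metadata."""
    wav_dir = segmented_dir / "WAV"
    json_dir = segmented_dir / "JSON"
    updated = 0
    for json_path in sorted(json_dir.iterdir()):
        if json_path.suffix.lower() != ".json":
            continue
        metadata = json.loads(json_path.read_text())
        old_wav = Path(metadata["wav_file"])
        new_wav = wav_dir / old_wav.name
        # Leave working links alone, and skip files with no local counterpart.
        if old_wav.exists() or not new_wav.exists():
            continue
        metadata["wav_file"] = str(new_wav)
        json_path.write_text(json.dumps(metadata))
        updated += 1
    return updated


# Build a tiny segmented folder whose metadata still points at an old machine.
with tempfile.TemporaryDirectory() as tmp:
    segmented = Path(tmp) / "segmented"
    (segmented / "WAV").mkdir(parents=True)
    (segmented / "JSON").mkdir()
    (segmented / "WAV" / "song_1.wav").touch()
    stale = {"wav_file": "/old/machine/segmented/WAV/song_1.wav"}
    (segmented / "JSON" / "song_1.JSON").write_text(json.dumps(stale))

    n_updated = repoint_wav_files(segmented)
    fixed = json.loads((segmented / "JSON" / "song_1.JSON").read_text())
    fixed_name = Path(fixed["wav_file"]).name
    fixed_exists = Path(fixed["wav_file"]).exists()
```

Skipping links that still resolve mirrors the default overwrite=False behaviour described above; forcing the change would simply drop that check.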
- pykanto.utils.paths.get_wavs_w_annotation(wav_filepaths: List[pathlib.Path], annotation_paths: List[pathlib.Path]) List[Tuple[pathlib.Path, pathlib.Path]] [source]#
Returns a list of tuples containing [0] paths to wav files for which there is an annotation file and [1] paths to the corresponding annotation file. Assumes that each wav file and its annotation file share the same file name and differ only in their file extension.
- Parameters
wav_filepaths (List[Path]) – List of paths to wav files.
annotation_paths (List[Path]) – List of paths to annotation files.
- Returns
Filtered list.
- Return type
List[Tuple[Path, Path]]
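As a rough illustration of the matching rule described above (same file name, different extension), the following sketch pairs files by their stem using only the standard library. The helper name pair_wavs_with_annotations and the .xml annotation extension are hypothetical choices for this example, not part of pykanto.

```python
from pathlib import Path
from typing import List, Tuple


def pair_wavs_with_annotations(
    wav_filepaths: List[Path], annotation_paths: List[Path]
) -> List[Tuple[Path, Path]]:
    """Hypothetical helper: pair files whose names match up to the extension."""
    # Index annotation files by file name without extension.
    annotations_by_stem = {p.stem: p for p in annotation_paths}
    # Keep only the wav files that have a matching annotation file.
    return [
        (wav, annotations_by_stem[wav.stem])
        for wav in wav_filepaths
        if wav.stem in annotations_by_stem
    ]


wavs = [Path("rec/a.wav"), Path("rec/b.wav")]
annotations = [Path("ann/a.xml")]  # the .xml extension is an assumption
pairs = pair_wavs_with_annotations(wavs, annotations)
```

Here b.wav has no annotation file, so only the pair for a.wav is kept.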
- pykanto.utils.paths.change_data_loc(DIR: pathlib.Path, PROJECT: pathlib.Path, NEW_PROJECT: pathlib.Path) pathlib.Path [source]#
Updates the location of the parent directories of a project, including the project name, for a given path. Used when the location of a dataset changes (e.g., when transferring a project to a new machine).
- Parameters
DIR (Path) – Path to update
PROJECT (Path) – Old (broken) project directory.
NEW_PROJECT (Path) – New working project directory.
- Returns
Updated path.
- Return type
Path
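The idea of swapping one project root for another can be sketched with pathlib alone. This is an illustration under the assumption that the path to update lies inside the old project directory; the helper name relocate and the example paths are hypothetical, and pykanto's own function may work differently.

```python
from pathlib import PurePosixPath


def relocate(
    path: PurePosixPath, old_project: PurePosixPath, new_project: PurePosixPath
) -> PurePosixPath:
    """Hypothetical helper: re-root `path` from old_project onto new_project."""
    # relative_to raises ValueError if `path` is not inside `old_project`.
    return new_project / path.relative_to(old_project)


old_root = PurePosixPath("/home/user/projects/myproject")
new_root = PurePosixPath("/mnt/server/renamed_project")
moved = relocate(old_root / "data" / "segmented", old_root, new_root)
```

Only the part of the path below the project root is preserved, so the project itself can be both moved and renamed.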
- pykanto.utils.paths.get_file_paths(root_dir: pathlib.Path, extensions: List[str], verbose: bool = False) List[pathlib.Path] [source]#
Returns paths to files with given extension found recursively within a directory.
- Parameters
root_dir (Path) – Root directory to search recursively.
extensions (List[str]) – File extensions to look for (e.g., .wav)
- Raises
FileNotFoundError – No files found.
- Returns
List with path to files.
- Return type
List[Path]
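A comparable recursive search can be written with Path.rglob. The sketch below is an approximation of the behaviour documented above, including the FileNotFoundError when nothing matches; the helper name find_files and the sample file names are made up for this example.

```python
import tempfile
from pathlib import Path
from typing import List


def find_files(root_dir: Path, extensions: List[str]) -> List[Path]:
    """Hypothetical helper: recursive search by file extension."""
    found = sorted(
        p for p in root_dir.rglob("*")
        if p.is_file() and p.suffix in extensions
    )
    if not found:
        raise FileNotFoundError(
            f"No files with extensions {extensions} found in {root_dir}"
        )
    return found


# Create a small directory tree to search.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "site_a").mkdir()
    (root / "site_a" / "rec1.wav").touch()
    (root / "rec2.wav").touch()
    (root / "notes.txt").touch()
    wav_names = sorted(p.name for p in find_files(root, [".wav"]))
```

Note that extensions are compared with the leading dot, as in the ".wav" example above.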
- pykanto.utils.paths.link_project_data(origin: os.PathLike, project_data_dir: pathlib.Path) None [source]#
Creates a symlink from a project’s data folder (not under version control) to the directory where the data lives (e.g. on an external HDD).
- Parameters
origin (os.PathLike) – Path to the directory containing your ‘raw’ data folder.
project_data_dir (Path) – A project’s data folder to link with ‘origin’.
Note
This will work on Unix-like systems but might cause problems on Windows. See how to enable symlinks in Windows.
- Raises
ValueError – The ‘project_data_dir’ already contains data or is a symlink.
FileExistsError – File exists; your target folder already exists.
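The linking step and its guards can be sketched with os.symlink. This is a simplified illustration for Unix-like systems, not pykanto's code: the helper name link_data, the decision to replace an empty placeholder directory, and the folder names are assumptions for this example.

```python
import os
import tempfile
from pathlib import Path


def link_data(origin: Path, project_data_dir: Path) -> None:
    """Hypothetical helper: make project_data_dir a symlink to origin."""
    if project_data_dir.is_symlink():
        raise ValueError(f"{project_data_dir} is already a symlink.")
    if project_data_dir.is_dir():
        if any(project_data_dir.iterdir()):
            raise ValueError(f"{project_data_dir} already contains data.")
        # Assumption: an empty placeholder directory can safely be replaced.
        project_data_dir.rmdir()
    # Raises FileExistsError if something else already occupies the target.
    os.symlink(origin, project_data_dir, target_is_directory=True)


with tempfile.TemporaryDirectory() as tmp:
    origin = Path(tmp) / "external_drive"
    (origin / "raw").mkdir(parents=True)
    data_dir = Path(tmp) / "project" / "data"
    data_dir.parent.mkdir()

    link_data(origin, data_dir)
    # The project's data folder now resolves to the external location.
    linked = data_dir.is_symlink() and (data_dir / "raw").is_dir()
```

Refusing to touch a non-empty folder keeps existing data from being hidden behind the new link.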
- pykanto.utils.paths.pykanto_data(dataset: str = 'GREAT_TIT') pykanto.utils.paths.ProjDirs [source]#
Loads pykanto’s sample datasets. These are minimal data examples intended for testing and tutorials.
- Parameters
dataset (str, optional) – Dataset name, one of [“STORM-PETREL”, “BENGALESE_FINCH”, “GREAT_TIT”, “AM”]. Defaults to “GREAT_TIT”.
- Returns
An object with paths to data directories that can then be used to create a dataset.
- Return type