pykanto.utils.io

Functions to read and write external files (e.g. JSON) efficiently.

Functions

`copy_xml_files`(file_list, dest_dir)	Copies a list of files to `dest_dir / file.parent.name / file.name`
`get_unit_spectrograms`(dataset, ID)	Retrieves unit (e.g. individual notes) spectrograms for a grouping ID in a dataset.
`load_dataset`(dataset_dir, DIRS[, relink_data])	Load an existing dataset, fixing any broken links to data using a new ProjDirs object.
`make_tarfile`(source_dir, output_filename)	Makes a tarfile from a given directory.
`makedir`(DIR[, return_path])	Make a safely nested directory.
`read_json`(json_loc)	Reads a .json file using ujson.
`save_json`(json_object, json_loc)	Saves a .json file using ujson.
`save_songs`(folder, specs)	Save song spectrograms as .jpg images to folder.
`save_subset`(train_dir, test_dir, dname, ...)	Save train and test subsets of dataset to disk as .jpg images (in folders corresponding to class labels).
`save_to_jsons`(dataset)	Appends new metadata generated in pykanto to the original json metadata files that were used to create a `KantoData` dataset.

Classes

NumpyEncoder(*[, skipkeys, ensure_ascii, ...])

Stores a numpy.ndarray or any nested-list composition as JSON.

pykanto.utils.io.load_dataset(dataset_dir: Path, DIRS: ProjDirs, relink_data: bool = True) → KantoData

Load an existing dataset, fixing any broken links to data using a new ProjDirs object.

Parameters

dataset_dir (Path) – Path to the dataset file (*.db)
DIRS (ProjDirs) – New project directories
relink_data (bool, optional) – Whether to update the dataset's paths to point to the new project directories. Defaults to True.

Returns

The dataset

Return type

KantoData
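Relinking can be pictured as re-anchoring each stored path on a new project root. The sketch below is a hypothetical illustration only: the helper `relink_path` and the assumption that every stored path sits under a single old root are not taken from pykanto, and the real function may work differently.

```python
from pathlib import Path


def relink_path(old_path: Path, old_root: Path, new_root: Path) -> Path:
    """Hypothetical helper: move a stored path from old_root to new_root.

    Assumes old_path lies under old_root (raises ValueError otherwise).
    This is an illustration, not pykanto's actual relinking logic.
    """
    # Express the stored path relative to the root it was saved under
    relative = old_path.relative_to(old_root)
    # Rebuild the same relative path under the new project root
    return new_root / relative
```

For example, `relink_path(Path("/old/proj/data/a.wav"), Path("/old/proj"), Path("/new/proj"))` gives `Path("/new/proj/data/a.wav")`.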

pykanto.utils.io.read_json(json_loc: pathlib.Path) → Dict

Reads a .json file using ujson.

Parameters: json_loc (Path) – Path to json file.
Returns: Contents of the JSON file as a dictionary.
Return type: Dict

pykanto.utils.io.makedir(DIR: Path, return_path: bool = True) → Path | None

Make a safely nested directory. Returns the Path object by default. Modified from code by Tim Sainburg (source).

Parameters

DIR (Path) – Path to be created.
return_path (bool, optional) – Whether to return the path. Defaults to True.

Raises

TypeError – Wrong argument type to ‘DIR’

Returns

Path to file or directory.

Return type

Path
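The documented behaviour (create nested directories safely, reject non-`Path` arguments, optionally return the path) can be approximated with `pathlib` alone. This is a minimal sketch under the assumption that `DIR` names a directory; `makedir_sketch` is a hypothetical name, not the pykanto implementation.

```python
from pathlib import Path
from typing import Optional


def makedir_sketch(DIR: Path, return_path: bool = True) -> Optional[Path]:
    """Hypothetical approximation of makedir's documented behaviour."""
    # Mirror the documented TypeError for a wrong argument type
    if not isinstance(DIR, Path):
        raise TypeError("DIR must be a pathlib.Path")
    # Create the directory and any missing parents; no error if it exists
    DIR.mkdir(parents=True, exist_ok=True)
    return DIR if return_path else None
```

Using `exist_ok=True` is what makes repeated calls safe: creating a directory that already exists is a no-op rather than an error.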

pykanto.utils.io.copy_xml_files(file_list: List[pathlib.Path], dest_dir: pathlib.Path) → None

Copies a list of files to dest_dir / file.parent.name / file.name

Parameters

file_list (List[Path]) – List of files to be copied.
dest_dir (Path) – Path to destination folder; it is created if it does not exist.
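The destination layout `dest_dir / file.parent.name / file.name` keeps each file's immediate parent folder (for example, a recording site) as a subfolder. A minimal sketch of that layout using `shutil`; `copy_files_sketch` is a hypothetical name and the choice of `shutil.copy2` (which also preserves file metadata) is an assumption.

```python
import shutil
from pathlib import Path
from typing import List


def copy_files_sketch(file_list: List[Path], dest_dir: Path) -> None:
    """Hypothetical sketch of the dest_dir / parent / name copy layout."""
    for file in file_list:
        # Keep the name of each file's parent folder as a subfolder
        target_dir = dest_dir / file.parent.name
        # Create dest_dir and the subfolder if they do not exist yet
        target_dir.mkdir(parents=True, exist_ok=True)
        # Copy contents and metadata to dest_dir / file.parent.name / file.name
        shutil.copy2(file, target_dir / file.name)
```

Note that two source files with the same parent-folder name and file name would map to the same destination, so the later copy overwrites the earlier one.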

pykanto.utils.io.save_json(json_object: Dict, json_loc: pathlib.Path) → None

Saves a .json file using ujson.

Parameters

json_object (Dict) – Dictionary to be saved.
json_loc (Path) – Path to json file.
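Saving and reading are inverse operations, so a round trip should return the original dictionary. The sketch below assumes a plain `dump`/`load` pattern; `save_json_sketch` and `read_json_sketch` are hypothetical names. It uses ujson (as these functions do) when installed and falls back to the standard-library `json` module, which offers the same `dump` and `load` calls.

```python
from pathlib import Path
from typing import Dict

try:
    import ujson as json  # fast JSON library used by these functions
except ImportError:
    import json  # standard-library fallback with the same dump/load calls


def save_json_sketch(json_object: Dict, json_loc: Path) -> None:
    # Serialise the dictionary to JSON text on disk
    with open(json_loc, "w") as f:
        json.dump(json_object, f)


def read_json_sketch(json_loc: Path) -> Dict:
    # Parse the JSON file back into a dictionary
    with open(json_loc) as f:
        return json.load(f)
```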

pykanto.utils.io.save_to_jsons(dataset: KantoData) → None

Appends new metadata generated by pykanto to the original json metadata files that were used to create a KantoData dataset. These usually include things like type labels and unit onsets/offsets.

Parameters: dataset (KantoData) – Dataset object.
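Appending metadata to an existing JSON file amounts to load, merge, and write back. The sketch below shows that pattern for a single file; `append_metadata` and the field names in the example are hypothetical, and how pykanto maps dataset rows to files is not shown here.

```python
import json
from pathlib import Path
from typing import Any, Dict


def append_metadata(json_loc: Path, new_fields: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: merge new keys into an existing JSON metadata file."""
    # Load the original metadata
    with open(json_loc) as f:
        metadata = json.load(f)
    # Add the newly generated fields; existing keys with the same name are overwritten
    metadata.update(new_fields)
    # Write the merged metadata back to the same file
    with open(json_loc, "w") as f:
        json.dump(metadata, f, indent=2)
    return metadata
```

Because `dict.update` overwrites matching keys, re-running the merge with new values replaces the old ones rather than duplicating them.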

class pykanto.utils.io.NumpyEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Stores a numpy.ndarray or any nested-list composition as JSON. Source: karlB on Stack Overflow.

Extends the json.JSONEncoder class.
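An encoder of this kind overrides `default`, which `json` calls only for objects it cannot serialise natively. The sketch below follows the widely used pattern from the cited Stack Overflow answer; `NumpyEncoderSketch` is a hypothetical name, and pykanto's class may handle additional types.

```python
import json

import numpy as np


class NumpyEncoderSketch(json.JSONEncoder):
    """Illustrative encoder that serialises numpy arrays as nested lists."""

    def default(self, obj):
        # Convert arrays of any shape to (nested) Python lists
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        # Defer everything else to the base class, which raises TypeError
        return json.JSONEncoder.default(self, obj)
```

Usage: `json.dumps({"spec": np.arange(3)}, cls=NumpyEncoderSketch)` returns `'{"spec": [0, 1, 2]}'`. This sketch does not handle numpy integer scalars such as `np.int64`, which are not subclasses of Python `int`.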

default(obj)

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

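from json import JSONEncoder


class IterableEncoder(JSONEncoder):  # illustrative subclass name
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            # Serialise any iterable (e.g. a set or generator) as a JSON list
            return list(iterable)
        # Let the base class default method raise the TypeError
        return JSONEncoder.default(self, o)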

pykanto.utils.io.make_tarfile(source_dir: pathlib.Path, output_filename: pathlib.Path) → None

Makes a tarfile from a given directory. Source: George V. Reilly on Stack Overflow (https://stackoverflow.com/a/17081026).

Parameters

source_dir (Path) – Directory to tar
output_filename (Path) – Name of output file (e.g. file.tar.gz).
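The pattern from the linked answer opens a `tarfile` in write mode and adds the whole directory under its own name. A minimal sketch, assuming gzip compression to match the `.tar.gz` example name; `make_tarfile_sketch` is a hypothetical name.

```python
import tarfile
from pathlib import Path


def make_tarfile_sketch(source_dir: Path, output_filename: Path) -> None:
    # "w:gz" writes a gzip-compressed archive (matching a .tar.gz name)
    with tarfile.open(output_filename, "w:gz") as tar:
        # Store the directory under its own name rather than its full path
        tar.add(source_dir, arcname=Path(source_dir).name)
```

Setting `arcname` keeps absolute or machine-specific path prefixes out of the archive, so it unpacks into a single top-level folder.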

pykanto.utils.io.get_unit_spectrograms(dataset: KantoData, ID: str) → Dict[str, np.ndarray]

Retrieves unit (e.g. individual notes) spectrograms for a grouping ID in a dataset.

Parameters

dataset (KantoData) – Dataset to use.
ID (str) – Which ID to use (must be present in the dataset's ID column).

Returns

A dictionary of spectrograms, keyed by vocalisation index.

Return type

Dict[str, np.ndarray]

Example

>>> units = get_unit_spectrograms(dataset, "BIGBIRD")
>>> last_note = units["BIGBIRD_0"][-1]

pykanto.utils.io.save_songs(folder: pathlib.Path, specs: List[pathlib.Path]) → None

Save song spectrograms as .jpg images to folder.

Parameters

folder (Path) – Path to destination folder.
specs (List[Path]) – List of spectrogram paths.

pykanto.utils.io.save_subset(train_dir: pathlib.Path, test_dir: pathlib.Path, dname: str, to_export: ItemsView[str, List[pathlib.Path]]) → None

Save train and test subsets of dataset to disk as .jpg images (in folders corresponding to class labels).

Parameters

train_dir (Path) – Destination folder for training data.
test_dir (Path) – Destination folder for test data.
dname (str) – Name of subset, one of “train” or “test”.
to_export (ItemsView[str, List[Path]]) – Subset of dataset to export.
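The folder convention (one subfolder per class label under the train or test directory) can be illustrated without any image rendering by planning where each file would go. This is a hypothetical sketch: `plan_subset_export` is not a pykanto function, it only computes paths, and naming each image after its source file's stem is an assumption.

```python
from pathlib import Path
from typing import ItemsView, List, Tuple


def plan_subset_export(
    train_dir: Path,
    test_dir: Path,
    dname: str,
    to_export: ItemsView[str, List[Path]],
) -> List[Tuple[Path, Path]]:
    """Hypothetical: map each spectrogram path to a class-labelled .jpg path."""
    if dname not in ("train", "test"):
        raise ValueError('dname must be "train" or "test"')
    # Choose the destination root for this subset
    base = train_dir if dname == "train" else test_dir
    plan = []
    for label, paths in to_export:
        for src in paths:
            # One folder per class label; the image reuses the source file's stem
            plan.append((src, base / label / (src.stem + ".jpg")))
    return plan
```

For example, exporting `{"A": [Path("specs/A_0.npy")]}.items()` as the "train" subset plans `out/train/A/A_0.jpg` when `train_dir` is `Path("out/train")`. A layout like this matches what folder-per-class image loaders commonly expect.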