pykanto.utils.io

Functions to read external files (e.g. JSON) efficiently.

Functions

copy_xml_files(file_list, dest_dir)

Copies a list of files to dest_dir / file.parent.name / file.name

get_unit_spectrograms(dataset, ID)

Retrieves unit (e.g. individual notes) spectrograms for a grouping ID in a dataset.

load_dataset(dataset_dir, DIRS[, relink_data])

Load an existing dataset, fixing any broken links to data using a new ProjDirs object.

make_tarfile(source_dir, output_filename)

Makes a tarfile from a given directory.

makedir(DIR[, return_path])

Make a safely nested directory.

read_json(json_loc)

Reads a .json file using ujson.

save_json(json_object, json_loc)

Saves a .json file using ujson.

save_songs(folder, specs)

Save song spectrograms as .jpg images to folder.

save_subset(train_dir, test_dir, dname, ...)

Save train and test subsets of dataset to disk as .jpg images (in folders corresponding to class labels).

save_to_jsons(dataset)

Appends new metadata generated in pykanto to the original json metadata files that were used to create a KantoData dataset.

Classes

NumpyEncoder(*[, skipkeys, ensure_ascii, ...])

Stores a numpy.ndarray or any nested-list composition as JSON.

pykanto.utils.io.load_dataset(dataset_dir: Path, DIRS: ProjDirs, relink_data: bool = True) → KantoData

Load an existing dataset, fixing any broken links to data using a new ProjDirs object.

Parameters
  • dataset_dir (Path) – Path to the dataset file (*.db)

  • DIRS (ProjDirs) – New project directories

  • relink_data (bool, optional) – Whether to update dataset paths. Defaults to True.

Returns

The dataset

Return type

KantoData

pykanto.utils.io.read_json(json_loc: pathlib.Path) → Dict

Reads a .json file using ujson.

Parameters

json_loc (Path) – Path to json file.

Returns

Json file as a dictionary.

Return type

Dict

pykanto.utils.io.makedir(DIR: Path, return_path: bool = True) → Path | None

Make a safely nested directory. Returns the Path object by default. Modified from code by Tim Sainburg (source).

Parameters
  • DIR (Path) – Path to be created.

  • return_path (bool, optional) – Whether to return the path. Defaults to True.

Raises

TypeError – Wrong argument type to ‘DIR’

Returns

Path to file or directory.

Return type

Path
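
The behaviour described for makedir can be approximated with the standard library alone. The following is a minimal sketch, not pykanto's implementation: the name makedir_sketch is hypothetical, and it assumes that a path with a file suffix names a file, so only its parent directory is created.

```python
from pathlib import Path
from typing import Optional


def makedir_sketch(path: Path, return_path: bool = True) -> Optional[Path]:
    """Hypothetical stand-in for makedir; not pykanto's actual code."""
    if not isinstance(path, Path):
        raise TypeError("Wrong argument type to 'DIR'")
    # Assumption: a path with a suffix names a file, so create its parent instead.
    target = path.parent if path.suffix else path
    # parents=True builds missing ancestors; exist_ok=True makes repeat calls harmless.
    target.mkdir(parents=True, exist_ok=True)
    return path if return_path else None
```

Calling the function twice on the same path is safe, which is the sense in which the directory is created "safely".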

pykanto.utils.io.copy_xml_files(file_list: List[pathlib.Path], dest_dir: pathlib.Path) → None

Copies a list of files to dest_dir / file.parent.name / file.name

Parameters
  • file_list (List[Path]) – List of files to be copied.

  • dest_dir (Path) – Path to destination folder; it is created if it doesn’t exist.
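
To make the destination layout dest_dir / file.parent.name / file.name concrete, here is a small sketch using shutil. The function name copy_files_sketch is hypothetical and the code is an illustration of the layout only, not the library's implementation.

```python
import shutil
from pathlib import Path
from typing import List


def copy_files_sketch(file_list: List[Path], dest_dir: Path) -> None:
    """Hypothetical stand-in illustrating the destination layout."""
    for file in file_list:
        # Keep only the immediate parent folder name, followed by the file name.
        target = dest_dir / file.parent.name / file.name
        # Create the destination folder (and any missing ancestors) if needed.
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(file, target)
```

Under this layout, a file at recordings/site_1/a.xml would end up at dest_dir/site_1/a.xml: only one level of the original folder structure is preserved.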

pykanto.utils.io.save_json(json_object: Dict, json_loc: pathlib.Path) → None

Saves a .json file using ujson.

Parameters
  • json_object (Dict) – Dictionary to save.

  • json_loc (Path) – Path to json file.
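
A save/read round trip might look like the sketch below. It uses the standard-library json module as a stand-in so that it runs without extra dependencies; ujson exposes dump and load functions with the same basic call shape. The function names and the example keys are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict


def save_json_sketch(json_object: Dict, json_loc: Path) -> None:
    """Write a dictionary to disk as JSON (stdlib stand-in for ujson)."""
    with open(json_loc, "w", encoding="utf-8") as f:
        json.dump(json_object, f, indent=2)


def read_json_sketch(json_loc: Path) -> Dict:
    """Read a JSON file back into a dictionary."""
    with open(json_loc, "r", encoding="utf-8") as f:
        return json.load(f)
```

Writing a dictionary and reading it back should return an equal dictionary, provided its values are JSON-serializable.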

pykanto.utils.io.save_to_jsons(dataset: KantoData) → None

Appends new metadata generated in pykanto to the original json metadata files that were used to create a KantoData dataset. These usually include things like type labels and unit onsets/offsets.

Parameters

dataset (KantoData) – Dataset object.

class pykanto.utils.io.NumpyEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Stores a numpy.ndarray or any nested-list composition as JSON. Source: karlB on Stack Overflow.

Extends the json.JSONEncoder class.

default(obj)

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
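
For array data specifically, an encoder of this kind can convert arrays to nested lists inside default. The sketch below is an assumption about the general approach, not the class's actual source: ArrayEncoderSketch and FakeArray are hypothetical names, and the check is duck-typed on tolist() (which numpy.ndarray provides) so that the example runs without numpy installed.

```python
import json


class ArrayEncoderSketch(json.JSONEncoder):
    """Hypothetical encoder: serializes array-like objects as nested lists."""

    def default(self, obj):
        # numpy.ndarray provides tolist(); duck typing avoids importing numpy here.
        if hasattr(obj, "tolist"):
            return obj.tolist()
        # Let the base class raise TypeError for anything else.
        return super().default(obj)


class FakeArray:
    """Tiny stand-in for an array type, used only to keep the example dependency-free."""

    def __init__(self, data):
        self._data = data

    def tolist(self):
        return list(self._data)


# Pass the encoder class through the cls argument of json.dumps.
encoded = json.dumps({"spec": FakeArray([1, 2, 3])}, cls=ArrayEncoderSketch)
```

With a real array, the same pattern applies: json.dumps(payload, cls=...) calls default only for objects the base encoder cannot serialize.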

pykanto.utils.io.make_tarfile(source_dir: pathlib.Path, output_filename: pathlib.Path) → None

Makes a tarfile from a given directory. Source: George V. Reilly on Stack Overflow (https://stackoverflow.com/a/17081026).

Parameters
  • source_dir (Path) – Directory to tar

  • output_filename (Path) – Name of output file (e.g. file.tar.gz).
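
Creating a compressed archive from a directory takes only a few lines with the standard-library tarfile module. This is a minimal sketch under the assumption that gzip compression is wanted (as the file.tar.gz example suggests); the name make_tarfile_sketch is hypothetical.

```python
import tarfile
from pathlib import Path


def make_tarfile_sketch(source_dir: Path, output_filename: Path) -> None:
    """Hypothetical stand-in: archive a directory as a gzip-compressed tarball."""
    # "w:gz" opens the archive for writing with gzip compression.
    with tarfile.open(output_filename, "w:gz") as tar:
        # arcname stores entries under the directory's own name
        # instead of its full path on disk.
        tar.add(source_dir, arcname=source_dir.name)
```

Using arcname keeps absolute or machine-specific path prefixes out of the archive, so it unpacks into a single top-level folder.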

pykanto.utils.io.get_unit_spectrograms(dataset: KantoData, ID: str) → Dict[str, np.ndarray]

Retrieves unit (e.g. individual notes) spectrograms for a grouping ID in a dataset.

Parameters
  • dataset (KantoData) – Dataset to use.

  • ID (str) – Which id to use (present in an ID column in the dataset)

Returns

A dictionary of spectrograms, keyed by vocalisation index.

Return type

Dict[str, np.ndarray]

Example

>>> units = get_unit_spectrograms(dataset, "BIGBIRD")
>>> last_note = units["BIGBIRD_0"][-1]

pykanto.utils.io.save_songs(folder: pathlib.Path, specs: List[pathlib.Path]) → None

Save song spectrograms as .jpg images to folder.

Parameters
  • folder (Path) – Path to destination folder.

  • specs (List[Path]) – List of spectrogram paths.

pykanto.utils.io.save_subset(train_dir: pathlib.Path, test_dir: pathlib.Path, dname: str, to_export: ItemsView[str, List[pathlib.Path]]) → None

Save train and test subsets of dataset to disk as .jpg images (in folders corresponding to class labels).

Parameters
  • train_dir (Path) – Destination folder for training data.

  • test_dir (Path) – Destination folder for test data.

  • dname (str) – Name of subset, one of “train” or “test”.

  • to_export (ItemsView[str, List[Path]]) – Subset of dataset to export.
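
The folder-per-class layout can be illustrated without writing any images. The sketch below only computes where each image would go; it is an assumption about the layout, not the library's code. The name subset_targets_sketch is hypothetical, as is the choice to derive each image name from the spectrogram file's stem.

```python
from pathlib import Path
from typing import ItemsView, List


def subset_targets_sketch(
    train_dir: Path,
    test_dir: Path,
    dname: str,
    to_export: ItemsView[str, List[Path]],
) -> List[Path]:
    """Hypothetical helper: list the .jpg paths a subset export would produce."""
    if dname not in ("train", "test"):
        raise ValueError("dname must be 'train' or 'test'")
    base = train_dir if dname == "train" else test_dir
    targets = []
    for label, specs in to_export:
        for spec in specs:
            # Assumption: one folder per class label, image named after the spectrogram file.
            targets.append(base / label / f"{spec.stem}.jpg")
    return targets
```

A layout with one folder per class label is the convention that many image-classification data loaders expect.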