High Performance Computing

Introduction

Many of the tasks that pykanto carries out are computationally intensive, such as calculating spectrograms and running dimensionality reduction and clustering algorithms. High-level, interpreted languages like R or Python can be slow. Where possible, I have optimised performance by a) translating functions to optimised machine code at runtime using Numba and b) parallelising tasks using Ray, a platform for distributed computing. As an example, the segment_into_units() function can find and segment 20,000 discrete acoustic units in approximately 16 seconds on a desktop 8-core machine; a dataset with over half a million (556,472) units takes ~132 seconds on a standard 48-core compute node.
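
The chunk-and-distribute pattern behind this kind of parallelism can be sketched as follows. This is a minimal illustration, not pykanto's implementation: it uses the standard library's thread pool as a stand-in for Ray so that it runs without extra dependencies, and `chunk`, `process_chunk` and `parallel_total` are hypothetical names. Threads will not speed up CPU-bound pure-Python work; with Ray, the per-chunk function would instead be submitted as a remote task.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(items, n_chunks):
    """Split `items` into at most `n_chunks` contiguous, near-equal chunks."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def process_chunk(durations):
    """Hypothetical per-chunk task: total the durations in this chunk."""
    return sum(durations)


def parallel_total(durations, n_workers=4):
    """Process each chunk on a separate worker, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(process_chunk, chunk(durations, n_workers)))
    return sum(partials)
```

The design point is that each chunk is independent, so the same splitting logic works whether the workers are threads, processes, or nodes in a cluster.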

pykanto works on an average desktop machine, but for most real-world applications you will probably want to use it on a compute cluster. This can be a daunting task for the uninitiated, so I have packaged some tools that should make it a little bit easier (at least they do for me!).

Slurm is still the most popular job scheduler used in compute clusters and the one I’m familiar with, so the following instructions and tips refer to it.

Using pykanto on an HPC cluster

This library uses Ray for parallel/distributed computation. Ray provides tools to ‘go from a single CPU to multi-core, multi-GPU or multi-node’. Submitting jobs that use multiple nodes or multiple GPUs is slightly more involved than submitting single-core or multi-core jobs. This might be overkill for some users, but if you need to (for example, if you are training large models or have a truly large dataset), then this will hopefully help:

Instructions

  • Add this to the top of the script you want to run, right after any imports:

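    # Requires `import os`, `import sys` and `import ray` among the imports above.
    redis_password = sys.argv[1]  # Redis password, read from the first command-line argument
    ray.init(address=os.environ["ip_head"], _redis_password=redis_password)  # connect to the address in `ip_head`
    print(ray.cluster_resources())  # show the resources Ray can see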
    
  • Request compute resources as you normally would. For example, to get an interactive session on one node with an NVIDIA V100 GPU:

    # For reference only: exactly how you do this will depend on the system you are using.
    srun -p interactive --x11 --pty --gres=gpu:v100:1 --mem=90000 /bin/bash
    
  • You can run pykanto-slaunch --help in your terminal to see which arguments you can pass to pykanto-slaunch.

    A submission command will look something like this:

     pykanto-slaunch --exp BigBird2020 --p short --time 00:30:00 -n 1 --memory 40000 --gpu 1 --c "python 0.0_build-dataset.py"
    

    This will create a bash (.sh) file and a log (.log) file in a /logs folder within the directory from which you are calling the script.

  • Check the logfile for errors!
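
If the same script should also run on a desktop machine, the connection snippet from the first step can be wrapped so that it falls back to a local Ray instance. The following is a minimal sketch under that assumption, not part of pykanto: `ray_init_kwargs` and `connect` are hypothetical helper names, and the only Ray calls used are the ones shown above, `ray.init` and `ray.cluster_resources`.

```python
import os
import sys


def ray_init_kwargs(argv, environ):
    """Build keyword arguments for ray.init().

    If the `ip_head` environment variable is set and a Redis password was
    passed as the first command-line argument, connect to that cluster.
    Otherwise return no arguments, so that ray.init() starts a local instance.
    """
    address = environ.get("ip_head")
    if address and len(argv) > 1:
        return {"address": address, "_redis_password": argv[1]}
    return {}


def connect():
    """Initialise Ray for the current environment and print its resources."""
    import ray  # imported lazily so the helper above works without Ray installed

    ray.init(**ray_init_kwargs(sys.argv, os.environ))
    print(ray.cluster_resources())
```

Calling `connect()` right after the imports of a script would then take the place of the three lines in the first step.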