Introduction to bioacoustics

Notes:

This guide has a number of direct hyperlinks: I recommend that you right-click and open them in new browser tabs.
You will need ~ 3gb of free disk space for this activity. I recommend that you close any running software other than Rstudio, as well as any unused browser tabs, before you start running code.
The output figures are not included in this document - you will have to generate them!

1 Prerequisites

Please follow these instructions before the day of the practical. You can contact me at nilo.recalde@zoo.ox.ac.uk if you run into any issues.

1.1 Install `Sonic Visualiser`

We will be using this application to view and annotate audio files. Go to this website and download the installer for your platform (i.e., Windows, Mac, or a Linux distribution). Navigate to the downloaded file, execute it and follow the instructions. You can check that the installation has been successful by right-clicking on an audio file (e.g., .wav) and choosing ‘open with’, then ‘Sonic Visualiser’.

There are some demo videos here, including one on how to install this software on Mac, and this is the reference manual, should you need it.

1.2 Install R dependencies

Double-click on the bioacoustics-practical.Rproj file to open it in RStudio. Now open the full-vignette.Rmd using the RStudio file browser and execute the code chunk below. You can do this by clicking the green ‘play’ button to the right of the chunk or by pressing ctrl/cmd+shift+enter. This will install the code libraries required for this practical should you not have them already. Depending on the speed of your internet connection this might take up to a few minutes.

dependencies <- c(
  "rprojroot",
  "warbleR",
  "dplyr",
  "tibble",
  "magrittr",
  "stringr",
  "ggrepel",
  "patchwork",
  "filesstrings",
  "pbapply",
  "randomForest",
  "yardstick"
)

# Load or install packages
packages <- lapply(dependencies, function(y) {
  if (!y %in% installed.packages()[, "Package"]) {
    install.packages(y)
  }
  try(require(y, character.only = T), silent = T)
})

2 General introduction

2.1 Introducing the dataset

Navigate to data/audio-files in your project folder, bioacoustics-practical.

These audio files are sample vocalisations for 15 bird species across a wide body mass range — from the Goldcrest’s 5.8 grams of sheer adorableness to the much graver looking Rook, over 70 times larger.

Rook picture (right) by hedera.baltica, Goldcrest (left) by Cliff Watkinson. Both CC BY-SA 2.0

During this activity you will:

Try to guess the species that produced each of these vocalisations,
Extract and analyse basic acoustic information to test hypotheses involving bird vocalisations, and
Learn how to perform more detailed audio feature extraction and analysis to characterise vocalisations.

2.2 Visualising sound

Right-click on the first file and choose ‘open with’, then ‘Sonic Visualiser’. You might want to make this software your default option to open .wav files; this will save you having to do this every time you want to open a new file. (?)

Once the file is open you should see a waveform at the bottom of the window.

A waveform shows changes in amplitude over time

Press W on your keyboard (alternatively, click on the ‘Pane’ menu, then ‘Add Waveform’). You will see a second waveform, this time with greater temporal resolution.

Play the sound by pressing your spacebar.
Scroll to increase or decrease the time range.

This is a useful visualisation if we want to see how ‘loud’ vocalisations are at a particular point in time. But it does not give us any information about the spectral characteristics of the sound, that is, about how much vibration there is at each different frequency.

For this reason, researchers working with sound often use spectrograms, sometimes also called sonographs. You can take a minute to play around in this website to see how a spectrogram works. Notice how the y-axis represents a range of frequencies, the x-axis shows time, and colour encodes amplitude, or how ‘loud’ each point is.

Example of a spectrogram like the ones we will be making during the session

Now, back in Sonic Visualiser,

You can now remove the waveform pane (right-click > delete layer until there are none left) and add a new spectrogram pane by pressing G.
Change the Window parameter to 512 or 1024 and Window overlap, to its right, to 93.75%. Here is brief explanation of how these parameters determine the resolution of the spectrogram, should you be interested. Play around with these - the different species in the dataset have been recorded at different sample rates, so optimal parameters will differ.
Use the zoom wheels to zoom in or out in the x and y axes.
Change the colour palette in the Colour tab.
Use the first wheel to the right of the colour tab to play with the colour threshold; it is useful to adjust this until the background noise disappears (see gif below)
Once you are happy with how the spectrogram looks you can click on File > Export Session as Template, give it a name, and select the option to set it as default. This will spare you from having to do this every time you open a new file.

Spectrogram thresholding

3 Identifying some widespread UK birds

Now navigate to ./data/audio-files in your project folder, bioacoustics-practical.

These are recordings of songs and vocalisations of 15 species of birds. They are selected as species that we should hear in and around Oxford in the next two weeks, but are just identified with file names 1.mp3 to 15.mp3.

Your task is to work out the identity of the 15 species in the recordings & record these in your bird-ids.csv list, which can be found in the ./data folder. Some of you may want to have a go based on your existing knowledge, but even if so, please go to xeno-canto and search for the species and listen to listed recordings of song to double check.
If you’re less sure, you can open the rand-species-names.csv file which has the (randomised) species names in them. This will give you a list of species to match up with the recordings.
Go to xeno-canto and enter a species names in the search box at the top – in this case you can see there are 113 hits for Great Tinamou and 6044 for Great Tit – it is the latter that we want!

Entering “Great Tit” returns a new page with a map of the species range and the geographic location of the recording.
As there may be geographical variation in songs, zoom in on the UK; you’ll note that the dark blue dots on the UK correspond to the subspecies ‘newtoni’ on the key to the right – this is usually accepted as a subspecies endemic to the UK for great tit. Don’t worry if there is no subspecies endemic to the UK.
Click on the name of the UK subspecies (in this case newtoni) and it will filter the recordings by location to a UK list. Note that these are recordings of songs and calls, and we’re asking you to identify (mostly) songs, and that the recordings are scored on quality (in the “Actions” column from “A” – highest to “E” – lowest).

We’ve allowed c. 45 minutes for this. If any of you finish much ahead of this, please feel free to explore the functionality of xeno-canto. There are also many recordings of songs and calls on eBird: simply enter the species name that you want.

Please note:

The xeno-canto website can be slow sometimes - if it’s acting up you can alternatively use the Macaulay Library and eBird.
Make sure that you do not modify the layout of the bird-ids.csv file beyond entering your guesses in the ‘name’ column, and that you save it as a .csv when you are done.

If you have a lot of time and want quite a tough challenge, try one of the eBird audio quizzes here. We’ll use these species to carry out some comparative analysis of song properties in the next section.

4 Measuring sound

We are now going to explore a real hypothesis concerning the frequencies at which birds produce their vocalisations. In the process, you will learn to perform measurements from sound recordings.

4.1 The problem

The mass of the vibrating body that creates a sound influences its frequency, so we might expect that A, body size, be correlated with B, the morphology of the vocal apparatus, which would, in turn, influence C, the frequency of a bird’s vocalisations. We cannot easily investigate B in the course of this activity, but we can test whether A and C are themselves correlated — which would provide some support for this idea.

Q: If there is indeed a correlation between the size of a bird and the frequency of its vocalisations, of which sign would you expect it to be?

Q: What other factors might lead to differences in song frequency across different bird species?

To test this hypothesis — that body size affects the frequency of vocalisations — you will need to extract some basic spectral information from your dataset. We will then provide you with body mass measurements for each of the species, your ‘explanatory’ variable.

4.2 Extracting time and frequency data from sound files

Click on Layer > Add New Boxes Layer in a Sonic Visualiser window (right click on the spectrogram or on the top menu). You can now draw boxes over the elements of interest, and erase any box that you are not happy with.

Example of bounding boxes defining discrete elements in a vocalisation

Adjust the colour threshold (as explained above) until only the brightest parts of the image are visible. This will help you focus on the frequencies with the maximum energy, or peak frequencies, which is the song trait that we will analyse here. Peak frequencies are the ‘loudest’, and will therefore be the brightest in the spectrogram

This is the range where the peak frequencies are in this vocalisation

Draw boxes over a representative sample of the acoustic elements present in a recording. Do this for each discrete type of sound that you see, but you do not need to include more than one or two examples per type. Indeed, for swiftness’ sake, you shouldn’t. You do not need to segment more than ~ 10 elements per audio file: a small number will already capture important information.
Once you are done with a recording, press ctrl+Y/cmd+Y, or click on File > Export Annotation Layer.
Give this file a name — exactly the number of the file + .csv, for example, 1.csv — and save it in the /data/frequency-data/ folder within the project’s folder. Leave the option to include a header unticked.
You can now close the current file (no need to save the session) and open the next one.

While you work through the vocalisations, notice how some birds have simple, pure-tone vocalisations, while those of others have a multi-frequency harmonic spectrum, with a fundamental frequency and many harmonic overtones. Still other species combine these two types skilfully.

Q: Can you find examples of each of these in the dataset? What might be the mechanisms that produce these differences? Here is a paper about this if you want to read more.

4.3 Visualising the results using R

Now we will import the data that you have just generated using R, calculate the mean frequency for each species, and plot our variables of interest: mean frequency and body mass.

Open the R project in Rstudio (this is the bioacoustics-practical.Rproj file). Once in Rstudio, open the full-vignette.Rmd file and navigate to this section.

If you press Ctrl/Cmd+Shift+O while the Rstudio window is active you will get an outline of the document, which might help you find your way around it more easily. If you have RStudio v1.4 or higher you can also activate the ‘visual markdown’ mode — which will make your .Rmd file look more like the .html that you were using up until now — by clicking the ‘pair of compasses’ button at the top-right of the document toolbar.

First, the following code chunk will install the code libraries required for this practical, should you not have them already:

# Where are you? Find the project root
maindir <- rprojroot::find_rstudio_root_file()
datadir <- file.path(maindir, "data")

# Load libraries and source code
source(file.path(maindir, "src", "functions.R"), local = knitr::knit_global())

Now we will import the data that you manually extracted from the audio files.

# Import the vocalisation frequency data
freqdir <- file.path(datadir, "frequency-data")
freq_df <- import_freqdata(freqdir, tutor = tutor)

The next step is to read the body mass data and your species guesses form their respective files, and consolidate them into a single dataframe.

You can see how we are doing this under the hood by doing ctrl/cmd+click on a function name.

# First the body mass data
body_mass_df <- read.csv(file.path(datadir, "body-mass-data.csv"),
  header = TRUE
)

# Then your species guesses
ids_df <- read.csv(file.path(datadir, "bird-ids.csv"),
  header = TRUE
) %>%
  replace(is.na(.), "Unknown") %>%
  mutate(
    name = str_replace(str_to_sentence(name), "_", " "),
    file = as.integer(file)
  ) %>%
  as_tibble()

# Now put everything together in a single dataframe
full_df <- merge_data(datadir, body_mass_df, ids_df, freq_df, tutor = tutor)

4.4 Results

You can take a look at the final dataset:

knitr::kable(full_df, digits = 2)

Let’s now inspect the variables separately:

plot_vars(full_df)

Finally, take a look at how the variables are related:

plot_mass_vs_freq(full_df)

For Discussion:
- Is the hypothesis about body size and vocalisation frequency supported from these data?
- What key aspect might we need to control for further in a more rigourous analysis?
- Why have used the logarithm of body mass instead of body mass directly?

Further reading:

Mikula, P., Valcu, M., Brumm, H., Bulla, M., Forstmeier, W., Petrusková, T., Kempenaers, B. and Albrecht, T. (2021), A global analysis of song frequency in passerines provides no support for the acoustic adaptation hypothesis but suggests a role for sexual selection. Ecology Letters, 24: 477-486. https://doi.org/10.1111/ele.13662

Ryan, M. J., & Brenowitz, E. A. (1985). The Role of Body Size, Phylogeny, and Ambient Noise in the Evolution of Bird Song. The American Naturalist, 126(1), 87–100. https://doi.org/10.1086/284398

Distribution of peak song frequency across a passerine phylogeny. Source: Mikula et al. 2020

5 Automated bird song analysis

With thanks to Marcelo Araya Salas (author of the warbleR package. Go check his website and tutorials!).

While taking manual measurements is a reasonable option when dealing with small datasets and simple variables, most problems in ecology and animal behaviour call for more complex descriptions of sound, and involve datasets that are too large to be processed manually. Automated analyses are very useful but can be hard to implement, and are not free from many of the biases that plague manual characterisation of sound. For the third part of today’s session we will be exploring a simple case: a comparative analysis of the songs of two related species that are very common in the UK.

Both Great tits and Coal tits have relatively simple songs, most frequently made up of two alternating notes that produce what is often described as a ‘tea-cher, tea-cher’ sound. Although Coal tits are smaller than Great tits their songs have largely overlapping frequency ranges, and they can be confused by the untrained ear. The aim of this exercise is to ask whether an automated approach will identify objective ways to separate the songs of these two species.

$Coal tit (left) and Great tit (right). Images by Aviceda and Sue Cro, respectively. Under CC BY-SA 2.0 licence.$

Coal tit (left) and Great tit (right). Images by Aviceda and Sue Cro, respectively. Under CC BY-SA 2.0 licence.

Let’s get to work!

5.1 Preparing the data

First we need some data: we are going to get a sample of Coal and Great tit recordings from xeno-canto. You can read the comments in the code blocks to know what we are doing at each step - don’t worry if you don’t understand what every bit of code is doing!

# Path to the directory for this analysis
files_dir <- file.path(datadir, "two-species-files")

# Let's make use of a few more of your computer's cores to
# speed things up a bit:
ncores <- parallel::detectCores() - 2
# NOTE: If you have a windows OS you might get an error related to parallel
# computing. If this is the case, you can set the `ncores` variable to 1, which
# will disable parallel computing, by replacing the code line immediately above
# with `ncores = 1` and running this code chunk again.


# I have pre-selected some songs for you: import a vector containing
# their codes and download them.

xc_codes <- as.vector(unlist(read.csv(
  file.path(files_dir, "xc-codes.csv"),
  header = FALSE
)[1]))
invisible(pblapply(xc_codes, download_xc, dir = files_dir))

# Now we have to convert the .mp3 files to .wav,
# another commonly used audio file format.
# + check that the newly created wav files can be read
# + create a list with all the files that we are going to analyse

try(mp32wav(parallel = ncores, path = files_dir))
wave_files <- list.files(path = files_dir, pattern = ".wav$")

# Downsample the audio files to gain some speed
invisible(pblapply(wave_files, function(x) {
  writeWave(suppressWarnings(
    downsample(readWave(file.path(files_dir, x)), samp.rate = 22050)
  ),
  filename = file.path(files_dir, x)
  )
}))

# Remove mp3 files
invisible(pblapply(list.files(
  path = files_dir, pattern = ".mp3$",
  full.names = TRUE
), function(file) {
  file.remove(file)
}))

Now you can go to ./data/two-species-files, the directory for this part of the session, and inspect some of the files in Sonic Visualiser. I find that the songs of these two birds look more different than they sound - what do you think?

5.2 Finding notes in the data

The basic unit in these songs is a ‘note’, which we can define as each sound represented by a continuous trace on the spectrogram, often delimited by pauses. We can take advantage of these silences to try to detect each individual note in our dataset automatically. I have chosen particularly clean audio recordings (i.e., their signal-to-noise ratio is high) and set the parameters of the algorithm for you, so your segmentation should be fairly accurate. In reality, automatic segmentation of sound is a really hard problem!

# We are now going to run a simple algorithm that segments sounds
# based on their amplitude.It will not work perfectly,
# but is good enough four our purposes here
element_list <- auto_detec(
  path = files_dir,
  bp = c(2, 9),
  threshold = 4,
  mindur = 0.04,
  maxdur = 0.4,
  ssmooth = 200,
  wl = 512,
  parallel = ncores
)

# Let's export the segmented spectrograms and take a look at them:
dir.create(file.path(files_dir, "spectrograms"))
full_spectrograms(
  element_list,
  parallel = 1,
  fast.spec = TRUE, suffix = "seg", path = files_dir,
  dest.path = file.path(files_dir, "spectrograms")
)

# If you are happy with that, save the selections to a .csv
write.csv(element_list, file.path(datadir, "song-segmentation.csv"),
  row.names = FALSE
)

If you go to the spectrograms folder you can check whether the segmentation algorithm did a good job. You now have start and end time information for 1979 notes!

5.3 Extracting spectral parameters

Now that we have individual notes we can automatically extract some numeric descriptions of their sound. More specifically, we are going to measure 22 spectral parameters, such as the duration, mean frequency, standard deviation of the frequency, some measures of complexity, and range of frequencies. You can see a complete list here.

# Extract some spectral parameters
spectral_features <-
  specan(
    element_list,
    path = files_dir,
    parallel = ncores,
    threshold = 15,
    bp = c(2, 9)
  ) %>%
  as_tibble() %>%
  mutate(label = str_split(sound.files, "-|\\.", simplify = TRUE)[, 2])

You can run head(spectral_features) in your R console if you want to take a look at what these parameters look like.

So now we have some variables that describe different properties of the notes in our dataset — what can we do with them?

5.4 Principal component analysis

The first step when dealing with a complex dataset is often to try to reduce its dimensionality. This entails finding a simpler description that captures as much of the variation in the data as possible, ignoring those variables that are correlated between them and, therefore, redundant. There are many methods for dimensionality reduction, the simpler and most common of which is principal component analysis (PCA). This is a very neat visual explanation of how PCA works.

# PCA of spectral parameters
pca <-
  prcomp(
    x = spectral_features[, sapply(spectral_features, is.numeric)],
    center = TRUE,
    scale. = TRUE
  )

# Plot result
plot_pca(pca, spectral_features$label)

You have now plotted where each note in the dataset lies in a space defined by the two first principal components. They’re coloured by species.

If you navigate to ./reports/figures and open the pca-loadings.png image you can take a look at how each of the 22 variables contribute to the two first principal components, which capture 33% and 25% of the variation in the data, respectively.

Questions:
- Does this method help us distinguish the notes sung by these two bird species?
- Are our spectral parameters the best suited for the job?
- We have characterised some properties of individual notes; are other levels of description (for example, the way notes are arranged in a sequence) also relevant? How might we study these?

5.5 Automated species recognition

One of the methodological issues that arise when trying to compare the vocalisations of different species is that the variables that capture the most overall variation in the data might not be particularly relevant when it comes to distinguishing between them. If you open and compare some of the raw song recordings visually you might notice that, while there is a lot of overlap in the frequencies that they use, the ‘shape’ of their notes is somewhat different. As it turns out, most of the parameters that we have measured do not capture this!

Q: What do you think are the main differences between the songs of these two species? Which strategies do you think the birds might employ to tell whether they are listening to a conspecific?

To actually be able to predict if a song was sung by a Coal or a Great tit we need to train an algorithm to distinguish not those acoustic characteristics that explain most variation, but the right kind of variation: that which separates the two species in the ‘acoustic space’. Before we do that, let’s plot and discuss a couple of raw variables:

# I have handpicked a few variables that I intuitively think differ
# between our two species. Looking at the spectrograms, do you agree?

plot_durations(spectral_features)
plot_modulation(spectral_features)

5.5.1 Training a classification algorithm

We will now train a relatively simple machine learning algorithm called a Random Forest to see if we can predict which species sung a given note in our dataset. We will be using the same features that we extracted earlier.

# Prepare the data
# (remove non-numerical columns, convert species label to a factor)
data <- spectral_features[-c(1, 2)]
data$label <- as.factor(data$label)

# Check that the dataset is balanced:
summary(data$label)

# Split the dataset into two, one (80% of the data) we will use to train the
# algorithm, the other we will hold out and use to test its performance.
set.seed(123)
train <- sample(nrow(data), 0.8 * nrow(data), replace = FALSE)
TrainSet <- data[train, ]
ValidSet <- data[-train, ]

# Train model
rf_classifier <- randomForest(
  label ~ .,
  data = TrainSet,
  ntree = 1000, mtry = 10,
  importance = TRUE,
  proximity = TRUE,
  do.trace = TRUE
)

# Predicting species on the validation set
predValid <- predict(rf_classifier, ValidSet, type = "class")

The last step will be to evaluate the model we just trained:

# Evaluate the model
acc <- format(round(mean(predValid == ValidSet$label), 2), nsmall = 2)

evaluate <- function(actuals, predictions) {
  cf_mat <- table(actuals, predictions)
  out <- list(
    confusion_matrix = cf_mat,
    precision = cf_mat[2, 2] / sum(cf_mat[, 2]),
    prop_missed = cf_mat[2, 1] / sum(cf_mat[2, ]),
    accuracy = (cf_mat[1, 1] + cf_mat[2, 2]) / sum(cf_mat),
    true_positives = cf_mat[2, 2] / sum(cf_mat[2, ]),
    false_positives = cf_mat[1, 2] / sum(cf_mat[1, ])
  )
  return(out)
}

evaluate(ValidSet$label, predValid)

What can you say about the model’s performance? To help visualise it we can plot its confusion matrix, where you can see the number of true positives and negatives for each species that we got when tested our model on the unseen data:

Question:
- Why can’t we just train the algorithm using all available data?

# Plot confusion matrix
autoplot(conf_mat(table(predValid, ValidSet$label)), type = "heatmap") +
  bioac_theme() +
  theme(panel.background = element_blank()) +
  scale_fill_gradient(low = "white", high = "#4786a6") +
  theme(aspect.ratio = 1) +
  labs(
    x = "\nActual",
    y = "Predicted\n",
    title = "RF Confusion Matrix",
    subtitle = paste0("Accuracy: ", acc)
  )

Now let’s plot where each note lies in the similarity space defined by our model:

# What does this look like if we plot it?
dist_matrix <- as.dist(1 - rf_classifier$proximity)
mds_out <- cmdscale(dist_matrix, k = 2, eig = TRUE, x.ret = TRUE)
mds_varexp <- round(mds_out$eig / sum(mds_out$eig) * 100, 1)

mds_data <- mds_out$points %>% tibble(
  note = rownames(.),
  X = .[, 1],
  Y = .[, 2],
  species = TrainSet$label
)

ggplot(data = mds_data, aes(x = X, y = Y, color = species)) +
  geom_point(alpha = 0.8, shape = 16) +
  scale_colour_manual(
    values = c("#585955", "#f0b618"),
    labels = c("Coal tit", "Great tit")
  ) +
  ggtitle("MDS plot using (1 - Random Forest Proximities)") +
  bioac_theme() +
  theme(legend.position = "right", aspect.ratio = 0.6, ) +
  labs(
    x = paste("\nMDS1 - ", mds_varexp[1], "%", sep = ""),
    y = paste("MDS2 - ", mds_varexp[2], "%\n", sep = ""),
    title = "MDS plot",
    subtitle = "(1 - RF distances)\n"
  )

And finally, let’s check which variables contribute the most to the prediction:

varImpPlot(rf_classifier)

We are done now! Feel free to keep this document and, if this is something you have found interesting, you can reuse and modify the code however you wish — maybe try to analyse the vocalisations of your favourite species.