The Critically Endangered cao vit gibbon (Nomascus nasutus) is one of the rarest primates on Earth, with only a single, small population remaining on the Vietnam-China border. It now occurs in just one forest block totalling 4839 ha, not all of which is suitable habitat for the gibbon. Conservation measures have been in place for the species since it was rediscovered in 2002 in Vietnam1 and re-confirmed in China in 20062, including: the establishment of two protected areas; regular patrolling by rangers and community groups; habitat restoration; support for sustainable livelihoods; awareness-raising about the plight of the gibbon; and educational events with local schools3.

Alongside these activities, periodic surveys of the cao vit gibbon population have been conducted to inform management decisions. Specifically, surveys have provided data for population viability analyses, informed the prioritisation of different conservation interventions, and helped to track the impact of interventions over time. Surveys to date have estimated a population size of around 120 individuals, with the lowest and highest estimates being 109 (in 2018) and 137 (in 2012), respectively3,4. However, surveyors have long been aware of the substantial subjectivity inherent in population survey methods for gibbons5. Most gibbon surveys to date, including all those done for the cao vit gibbon, have estimated density or abundance by triangulating group locations from multiple survey posts that are monitored simultaneously6,7. Gibbons are sometimes observed directly but are most often detected indirectly from their songs. Crucially, the triangulation method depends on reliably matching gibbon groups across different survey teams, based on the bearings and distances reported by surveyors and the recorded start and end times of any songs. Surveys typically run over multiple days (to ensure that no gibbon groups are missed), so gibbon groups must also be matched correctly from one day to the next6,7. If two groups sing close together at different times, there is a risk that they are identified as a single group and the total population size is underestimated. Equally, if a single group moves quickly to a new location and sings again, or moves a long distance between survey days, there is a risk that it is erroneously identified as two groups and the total population size is overestimated. In addition, distances and bearings are difficult to estimate in the field and are likely to carry substantial (and unquantified) error. This is especially likely in the complex topography of the cao vit gibbon’s karst habitat.
For these reasons, we suspect that previous cao vit gibbon population estimates have been biased.
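To make the geometry of triangulation concrete, the following minimal Python sketch intersects compass bearings from two survey posts. It assumes flat terrain, a local metric grid (x east, y north) and bearings in degrees clockwise from north; the post coordinates and bearings are hypothetical. Its purpose is only to show how a modest bearing error shifts the inferred location.

```python
import math

def bearing_vector(bearing_deg):
    """Unit direction (east, north) for a compass bearing in degrees clockwise from north."""
    b = math.radians(bearing_deg)
    return math.sin(b), math.cos(b)

def intersect_bearings(post1, bearing1, post2, bearing2):
    """Point where two bearing rays cross, or None if parallel or crossing behind a post."""
    (x1, y1), (x2, y2) = post1, post2
    dx1, dy1 = bearing_vector(bearing1)
    dx2, dy2 = bearing_vector(bearing2)
    # Solve post1 + t1*d1 = post2 + t2*d2 for the ray lengths t1, t2 (Cramer's rule).
    det = dy1 * dx2 - dx1 * dy2
    if abs(det) < 1e-12:
        return None
    rx, ry = x2 - x1, y2 - y1
    t1 = (ry * dx2 - rx * dy2) / det
    t2 = (dx1 * ry - dy1 * rx) / det
    if t1 < 0 or t2 < 0:
        return None  # the lines cross behind at least one observer
    return x1 + t1 * dx1, y1 + t1 * dy1

# Two hypothetical posts 1 km apart; a group singing at (500, 500).
exact = intersect_bearings((0, 0), 45, (1000, 0), 315)
off_by_5 = intersect_bearings((0, 0), 50, (1000, 0), 315)
print(exact)      # close to (500, 500)
print(off_by_5)   # shifted by roughly 60 m
```

At a range of about 700 m, a 5° error from one post moves the estimate by roughly 60 m. This is enough to blur the distinction between neighbouring groups when their songs are matched across teams.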

For the cao vit gibbon, an indicator of this bias has long existed: a discrepancy between the population survey data from Vietnam and China. In China, researchers have been intensively following gibbon groups since 20078,9 and have a detailed understanding of their home ranges; this was not the case in Vietnam until focal group monitoring finally began in 20203. The monitoring data from China revealed that exclusive home ranges (excluding overlaps) were approximately 100 ha10 and that density was 1.0 groups per km² (five groups occupying 4.9 km²). However, the home ranges and densities inferred for Vietnam-only groups are 26 ha and 3.8 groups per km², respectively (16 groups occupying 4.2 km²; based on 2018 population survey data11). Unless resource availability in the occupied area on the Vietnam side of the border is substantially higher (for which there is no clear evidence3,12), these numbers suggest that the population in Vietnam has been overestimated.
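The densities and home range sizes quoted above follow from simple division. The short Python check below reproduces them under the simplifying assumption that groups tile the occupied area without overlap, so that mean home range is area divided by number of groups.

```python
# Groups and occupied area (km²) on each side of the border, as quoted in the text.
sites = {
    "China (long-term monitoring)": (5, 4.9),
    "Vietnam (2018 survey)": (16, 4.2),
}

def density_and_home_range(n_groups, area_km2):
    """Groups per km² and mean home range (ha), assuming groups tile the area without overlap."""
    density = n_groups / area_km2
    home_range_ha = area_km2 * 100 / n_groups  # 1 km² = 100 ha
    return density, home_range_ha

for site, (n, area) in sites.items():
    density, home_range = density_and_home_range(n, area)
    print(f"{site}: {density:.1f} groups/km², ~{home_range:.0f} ha per group")
# China (long-term monitoring): 1.0 groups/km², ~98 ha per group
# Vietnam (2018 survey): 3.8 groups/km², ~26 ha per group
```

The crude China figure (98 ha) sits close to the directly measured exclusive home ranges of about 100 ha. The Vietnam figure implies ranges roughly a quarter of that size.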

A key characteristic of gibbons (family Hylobatidae) is their singing behaviour. As in other gibbon species, cao vit gibbon family groups sing on most mornings, with the male contributing four main phrase types (‘staccato’, ‘boom’, ‘multi-modulated’ and ‘coda’) and the female contributing the ‘great call’ (Fig. 1)13. In multi-female groups, which are the norm in the cao vit gibbon (all of the studied groups in China had two adult females14,15), the females synchronise their great calls. Community monitoring teams in Vietnam, as well as research teams in China, have long reported that individual gibbon groups have distinctive songs. Gibbon species in the genus Hylobates have also long been known to show individuality in their songs, particularly in the female great call, and gibbons may themselves use this information to decide whether to escalate conflicts16,17. Individuality has been studied far less in the genus Nomascus, but work on the western black crested gibbon (N. concolor), and indeed on the cao vit gibbon, has shown that songs, particularly male songs, are highly individualistic and stable over time17,

Spatially explicit capture-recapture (SECR) analysis

We input the distance and bearing measurements made by field teams into an SECR analysis25. This allowed us to: (i) estimate the location of each calling group by reconciling the bearing and distance measurements from multiple teams; (ii) evaluate errors in the estimation of distances and bearings; (iii) characterise the detection function, which describes how detection probability declines with distance; and (iv) identify drivers of variation in song density (i.e., songs per hour of survey per km²). We aimed to feed the results into the design of ongoing long-term monitoring, as well as of future surveys. By modelling song density, we also aimed to improve our understanding of the gibbon’s ecology and to identify potential areas for protection and/or restoration. SECR can also be an effective method for estimating gibbon group density when surveys cover an unknown proportion of the groups inhabiting an area. We did not exploit this here, given that we were able to survey the entire cao vit gibbon population.
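As a rough illustration of item (i), the sketch below reconciles bearing and distance reports from several teams by scoring every point on a grid and keeping the most likely one. This is not the ascr model. It assumes independent Gaussian errors on bearings and distances with hypothetical standard deviations (10° and 150 m), ignores detection probability, and uses made-up post coordinates and reports.

```python
import math

def wrap_deg(angle):
    """Wrap an angular difference into [-180, 180) degrees."""
    return (angle + 180.0) % 360.0 - 180.0

def log_likelihood(point, reports, sd_bearing=10.0, sd_dist=150.0):
    """Gaussian log-likelihood (up to a constant) of a candidate source location.

    reports: (post_x, post_y, bearing_deg, distance_m) from each team that heard the song.
    """
    x, y = point
    ll = 0.0
    for px, py, bearing, dist in reports:
        dx, dy = x - px, y - py
        expected_bearing = math.degrees(math.atan2(dx, dy)) % 360.0  # clockwise from north
        expected_dist = math.hypot(dx, dy)
        ll -= 0.5 * (wrap_deg(bearing - expected_bearing) / sd_bearing) ** 2
        ll -= 0.5 * ((dist - expected_dist) / sd_dist) ** 2
    return ll

def best_location(reports, xmin, xmax, ymin, ymax, spacing=10.0):
    """Grid search for the maximum-likelihood source location."""
    nx = int((xmax - xmin) / spacing) + 1
    ny = int((ymax - ymin) / spacing) + 1
    grid = ((xmin + i * spacing, ymin + j * spacing) for i in range(nx) for j in range(ny))
    return max(grid, key=lambda p: log_likelihood(p, reports))

# Two hypothetical teams 1 km apart report the same song, each with some error.
reports = [(0.0, 0.0, 47.0, 650.0), (1000.0, 0.0, 312.0, 760.0)]
estimate = best_location(reports, 0, 1000, 0, 1000)
print(estimate)
```

Because the bearing and distance errors are weighted by their assumed standard deviations, changing those values shifts how much each kind of report pulls the estimate. Estimating them from the data is precisely what item (ii) refers to.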

Using the R package ‘ascr’ v. 2.2.426, we ran two competing models for song density, each representing a distinct hypothesis about the factors that determine where gibbons sing from. The forest structure model contained the variables tree canopy cover27 and canopy height28, whilst the accessibility model contained the variables distance from forest edge and elevation29. A quadratic term for elevation was also included in the accessibility model to allow for non-linear responses to this covariate. In both models, we also included a binary variable for whether or not it rained during the survey day. We selected the best model on the basis of Akaike’s Information Criterion corrected for small sample size (AICc). We used a half-normal detection function with the intercept parameter (g0) fixed at one; preliminary modelling with no covariates indicated that this was clearly preferred over a hazard-rate detection function (ΔAICc = −21.5). Models were integrated over a grid of points with 100 m spacing overlaying the Trung Khanh–Bangliang forest block. Estimated locations for each gibbon song were output from the best model and, for each group with more than five locations, 95% kernel home ranges were calculated using the R package ‘ctmm’ v. 1.1.0 with default settings30. We consider these kernel home range estimates indicative only, given the small sample sizes.
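For reference, the half-normal detection function with g0 fixed at one, a common hazard-rate form, and the AICc criterion can be written out as below. These are the textbook forms; ascr’s internal parameterisation may differ, and the values of sigma, b and the log-likelihoods in the example are hypothetical.

```python
import math

def half_normal(d, sigma, g0=1.0):
    """Detection probability at distance d: g0 * exp(-d^2 / (2 sigma^2))."""
    return g0 * math.exp(-d ** 2 / (2 * sigma ** 2))

def hazard_rate(d, sigma, b, g0=1.0):
    """Hazard-rate detection probability: g0 * (1 - exp(-(d / sigma)^(-b)))."""
    if d == 0:
        return g0  # limiting value as d -> 0
    return g0 * (1 - math.exp(-((d / sigma) ** (-b))))

def aicc(log_lik, k, n):
    """AIC with small-sample correction for k parameters and n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    return 2 * k - 2 * log_lik + 2 * k * (k + 1) / (n - k - 1)

# Hypothetical curves: the half-normal falls to ~61% detection at one sigma.
for d in (0, 250, 500, 1000):
    print(d, round(half_normal(d, sigma=500), 3), round(hazard_rate(d, sigma=500, b=3), 3))

# Lower AICc is preferred; a negative delta favours the first (half-normal) model.
delta = aicc(-410.2, k=2, n=120) - aicc(-418.9, k=3, n=120)
print(round(delta, 1))
```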

Vocal fingerprinting

Manual clustering

For each song bout detected by field teams, we extracted the corresponding recordings from the passive acoustic data using an automated script in R v. 4.2.131. These were then manually imported into Raven Pro v. 1.6 (Cornell Lab of Ornithology, USA), listened to, and visually inspected as spectrograms with a 1200-point Hann window (70% overlap) and a 2048-point Discrete Fourier Transform; brightness and contrast were initially set to 55 and 70, respectively, and adjusted where necessary. We found that individual male gibbons were readily identifiable, even from relatively poor-quality recordings (i.e., signal-to-noise ratios < 3 dB). According to long-term monitoring data14, cao vit gibbon groups have a single adult male; only he contributes the male portion of the duets, although sub-adult males within a group may sing solo songs14. We were therefore able to assign each song bout to a particular group based on the male’s unique vocal fingerprint. We also took the location of each recording into account when determining the identity of a group, but in most cases this information was far less useful than the spectrogram.
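The spectrogram settings above translate into time and frequency spacing that depends on the recorder’s sample rate, which is not stated here; the sketch below assumes a hypothetical 48 kHz. It also shows the symmetric Hann window and the decibel definition of signal-to-noise ratio, under which 3 dB corresponds to a signal power of roughly twice the noise power. Raven Pro’s own implementation details may differ.

```python
import math

def hann(n_points):
    """Symmetric Hann window: 0.5 - 0.5*cos(2*pi*i/(N-1))."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n_points - 1)) for i in range(n_points)]

def spectrogram_geometry(fs_hz, window=1200, overlap=0.70, n_fft=2048):
    """Hop size and time/frequency spacing of a short-time Fourier transform."""
    hop = round(window * (1 - overlap))
    return {
        "hop_samples": hop,
        "window_ms": 1000 * window / fs_hz,
        "frame_step_ms": 1000 * hop / fs_hz,
        "bin_spacing_hz": fs_hz / n_fft,
    }

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels from mean powers."""
    return 10 * math.log10(signal_power / noise_power)

geom = spectrogram_geometry(fs_hz=48_000)  # hypothetical sample rate
print(geom)  # {'hop_samples': 360, 'window_ms': 25.0, 'frame_step_ms': 7.5, 'bin_spacing_hz': 23.4375}
print(round(snr_db(2.0, 1.0), 2))  # 3.01
```

In a standard short-time Fourier transform, a 1200-sample window computed with a 2048-point DFT is zero-padded. The finer bin spacing therefore interpolates the spectrum rather than improving the true frequency resolution, which is set mainly by the window length.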

Semi-supervised and unsupervised clustering

In addition to manual vocal fingerprinting, we explored more automated approaches to identifying how many males were present in the acoustic data. Specifically, we extracted measurements from each male phrase and then statistically clustered the data into groups; the number of resulting groups should equal the number of males present in the data. This more objective and repeatable approach was intended to complement the expert-driven manual clustering, since agreement between the two approaches could provide greater confidence in the overall population estimate.
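One simple way to let the data suggest how many males are present is threshold-based single-linkage clustering of standardised phrase measurements. A Rand index can then quantify agreement with the manual assignments. The sketch below is a swapped-in illustration, not the specific semi-supervised or unsupervised methods used in this study. The feature values, the threshold and the labels are all hypothetical.

```python
import math
from itertools import combinations

def single_linkage_clusters(features, threshold):
    """Label feature vectors so that any two within `threshold` (Euclidean) share a cluster."""
    parent = list(range(len(features)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(features)), 2):
        if math.dist(features[i], features[j]) <= threshold:
            parent[find(i)] = find(j)

    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(len(features))]

def rand_index(labels_a, labels_b):
    """Share of item pairs on which two labelings agree (same cluster vs different cluster)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j]) for i, j in pairs)
    return agree / len(pairs)

# Hypothetical standardised measurements for five phrases from two males.
phrases = [(0.0, 0.1), (0.2, -0.1), (0.1, 0.0), (5.0, 5.1), (5.2, 4.9)]
manual = ["male A", "male A", "male A", "male B", "male B"]

auto = single_linkage_clusters(phrases, threshold=1.0)
print(len(set(auto)), "clusters")   # 2 clusters
print(rand_index(manual, auto))     # 1.0
```

The number of clusters is sensitive to the chosen threshold. This is one reason why checking agreement with the expert-driven clustering is worthwhile.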

To provide the data for clustering, we annotated the male multi-modulated phrases in each song bout recording with bounding boxes (Fig. 1). We only began annotating a given song bout once the male had started calling with the fully developed form of his multi-modulated phrase (typically 8–10 min after the first call). We did not annotate coda, staccato or boom phrases here, although these may also encode information that is unique to each male. We also did not focus on female phrases, since previous work suggests that, for Nomascus gibbons, it is the males that are more easily individually identifiable14), which do not sing and are difficult to survey using any method. Long-term monitoring in China observed approximately one solitary male and female for each year of data14, suggesting low densities and/or low detection probabilities.
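Each bounding box around a multi-modulated phrase reduces to a handful of time and frequency bounds. The sketch below assumes that boxes are exported with begin/end times and low/high frequencies, as in a Raven selection table, for example. It derives simple measurements from these bounds and z-scores each one, so that seconds and hertz are comparable before clustering. The measurement set and values are illustrative assumptions, not the features used in this study.

```python
from dataclasses import dataclass
import statistics

@dataclass(frozen=True)
class PhraseBox:
    """Hypothetical bounding box around one multi-modulated phrase."""
    begin_s: float
    end_s: float
    low_hz: float
    high_hz: float

    def measurements(self):
        """Duration (s), low, high, bandwidth and centre frequency (Hz)."""
        return (
            self.end_s - self.begin_s,
            self.low_hz,
            self.high_hz,
            self.high_hz - self.low_hz,
            (self.low_hz + self.high_hz) / 2,
        )

def standardise(rows):
    """z-score each column; a constant column is divided by 1.0 instead of zero."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) or 1.0 for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, means, sds)) for row in rows]

boxes = [
    PhraseBox(12.0, 13.1, 750.0, 1550.0),
    PhraseBox(40.5, 41.7, 760.0, 1580.0),
    PhraseBox(75.2, 76.0, 900.0, 1400.0),
]
features = standardise([b.measurements() for b in boxes])
print(features[0])
```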

We incorporated uncertainty in \(\widehat{s}\) using a Monte Carlo approach, in which we drew samples (with replacement) from the observed group size counts to represent the unobserved groups and recalculated \(\widehat{N}\) (n = 9999 simulations). A 95% confidence interval on \(\widehat{N}\) was obtained by taking the 2.5% and 97.5% quantiles of the resulting vector. Finally, we produced our ‘best’ estimate of the cao vit gibbon population by incorporating group size information from long-term monitoring in Vietnam and China. In this case, we had group size information for all groups, so our estimate is considered exact (and therefore has no associated measure of uncertainty).
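The following minimal Python sketch illustrates the resampling step. It assumes that \(\widehat{N}\) is the sum of the counted group sizes plus one resampled size for each group that lacks a count. The group sizes and the number of uncounted groups are hypothetical, and the seed is fixed only to make the example reproducible.

```python
import random
import statistics

def simulate_population(counted_sizes, n_uncounted, n_sim=9999, seed=1):
    """Monte Carlo draws of N: counted groups keep their sizes, and each uncounted
    group gets a size drawn with replacement from the counted ones."""
    rng = random.Random(seed)
    base = sum(counted_sizes)
    return [base + sum(rng.choice(counted_sizes) for _ in range(n_uncounted)) for _ in range(n_sim)]

def interval_95(samples):
    """2.5% and 97.5% quantiles: first and last of the 39 cut points for n=40."""
    cuts = statistics.quantiles(samples, n=40)
    return cuts[0], cuts[-1]

sizes = [5, 6, 4, 7, 5]  # hypothetical counted groups
sims = simulate_population(sizes, n_uncounted=3)
point = sum(sizes) + 3 * statistics.mean(sizes)
low, high = interval_95(sims)
print(point, low, high)
```

When every group has a count (n_uncounted = 0), every simulation returns the same total and the interval collapses to a single value. This mirrors why the final ‘best’ estimate carries no measure of uncertainty.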