Abstract
The cao vit gibbon (Nomascus nasutus) is one of the rarest primates on Earth and now only survives in a single forest patch of less than 5000 ha on the Vietnam–China border. Accurate monitoring of the last remaining population is critical to inform ongoing conservation interventions and track conservation success over time. However, traditional methods for monitoring gibbons, involving triangulation of groups from their songs, are inherently subjective and likely subject to considerable measurement errors. To overcome this, we aimed to use ‘vocal fingerprinting’ to distinguish the different singing males in the population. During the 2021 population survey, we complemented the traditional observations made by survey teams with a concurrent passive acoustic monitoring array. Counts of gibbon group sizes were also assisted with a UAV-mounted thermal camera. After identifying eight family groups in the acoustic data and incorporating long-term data, we estimate that the population was comprised of 74 individuals in 11 family groups, which is 38% smaller than previously thought. We have no evidence that the population has declined—indeed it appears to be growing, with new groups having formed in recent years—and the difference is instead due to double-counting of groups in previous surveys employing the triangulation method. Indeed, using spatially explicit capture-recapture modelling, we uncovered substantial measurement error in the bearings and distances from field teams. We also applied semi- and fully-automatic approaches to clustering the male calls into groups, finding no evidence that we had missed any males with the manual approach. Given the very small size of the population, conservation actions are now even more urgent, in particular habitat restoration to allow the population to expand. Our new population estimate now serves as a more robust basis for informing management actions and tracking conservation success over time.
The Critically Endangered cao vit gibbon (Nomascus nasutus) is one of the rarest primates on Earth, with only a single, small population remaining on the Vietnam-China border. It now occurs in just one forest block totalling 4839 ha, not all of which is suitable habitat for the gibbon. Conservation measures have been in place for the species since it was rediscovered in 2002 in Vietnam1 and re-confirmed in China in 20062, including: the establishment of two protected areas; regular patrolling by rangers and community groups; habitat restoration; support for sustainable livelihoods; awareness-raising about the plight of the gibbon, and educational events with local schools3.
Alongside these activities, periodic surveys of the cao vit gibbon population have been conducted to inform management decisions. Specifically, surveys have provided data for population viability analyses, informed prioritisation of different conservation interventions, and helped to track the impact of interventions over time. Surveys to date have estimated a population size of around 120 individuals, with the lowest and highest estimates being 109 (in 2018) and 137 (in 2012), respectively3,4. However, surveyors have long been aware of the substantial subjectivity inherent in population survey methods for gibbons5. Most gibbon surveys to date, including all those done for the cao vit gibbon, have estimated density or abundance by triangulating group locations from multiple survey posts that are monitored simultaneously6,7. Gibbons are sometimes observed directly but, most often, are detected indirectly from their songs. Crucially, the triangulation method depends on being able to reliably match gibbon groups across different surveyor teams, based on reported bearings and distances from surveyors, and the recorded start and end times of any songs. Surveys typically occur over multiple days (to ensure no gibbon groups are missed) and so gibbon groups must also be matched successfully from one day to the next6,7. If two groups sing close together at different times, there is a risk that they are identified as one group and the total population size is under-estimated. Equally, if a single group moves quickly to a new location and sings again, or moves a long distance between survey days, there is a risk that the group is erroneously identified as two groups, and the total population size is over-estimated. In addition, distances and bearings are difficult to estimate in the field and are likely associated with substantial (and unquantified) error. This is especially likely to be the case in the complex topography of the cao vit gibbon’s karst habitat.
Due to these factors, we suspect previous cao vit gibbon population estimates have suffered from bias.
For the cao vit gibbon, an indicator of this bias has long existed: a discrepancy between the population survey data from Vietnam and China. In China, researchers have been intensively following gibbon groups since 20078,9 and have a detailed understanding of their home ranges; this was not the case in Vietnam until focal group monitoring began finally in 20203. The monitoring data from China revealed that exclusive home ranges (excluding overlaps) were approximately 100 ha10 and density was 1.0 groups per km2 (five groups occupying 4.9 km2). However, inferred home ranges and densities from Vietnam-only groups are 26 ha and 3.8 groups per km2, respectively (16 groups occupying 4.2 km2; based on 2018 population survey data11). Unless the resource availability of the occupied area on the Vietnam side of the border is substantially higher (for which there is no clear evidence3,12), these numbers suggest an over-estimation of the population in Vietnam.
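The discrepancy can be checked directly from the figures quoted above. The short Python sketch below recomputes group density and the mean area available per group from the reported group counts and occupied areas. The function names are illustrative, and dividing occupied area by the number of groups is a simplifying assumption that ignores home range overlap.

```python
def group_density(n_groups, area_km2):
    """Number of groups per square kilometre of occupied forest."""
    return n_groups / area_km2


def area_per_group_ha(n_groups, area_km2):
    """Mean occupied area per group in hectares (1 km2 = 100 ha)."""
    return area_km2 * 100.0 / n_groups


# Figures quoted in the text: (number of groups, occupied area in km2)
china = (5, 4.9)      # long-term monitoring in China
vietnam = (16, 4.2)   # Vietnam-only groups, 2018 population survey

china_density = group_density(*china)
vietnam_density = group_density(*vietnam)
china_area_ha = area_per_group_ha(*china)
vietnam_area_ha = area_per_group_ha(*vietnam)

print(f"China:   {china_density:.1f} groups/km2, {china_area_ha:.0f} ha per group")
print(f"Vietnam: {vietnam_density:.1f} groups/km2, {vietnam_area_ha:.0f} ha per group")
```

Under these inputs the Vietnam figures imply nearly four times the group density observed in China, which is the mismatch the text attributes to over-counting.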
A key characteristic of the family Hylobatidae is their singing behaviour. As in other gibbon species, cao vit gibbon family groups sing during most mornings, with the male providing four main phrase types (‘staccato’, ‘boom’, ‘multi-modulated’ and ‘coda’) and the female contributing the ‘great call’ (Fig. 1)13. In multi-female groups, which are the norm in the cao vit gibbon (all of the studied groups in China had two adult females14,15), the females will synchronise their great calls. Community monitoring teams in Vietnam, as well as research teams in China, have long reported that individual gibbon groups have distinctive songs. Gibbon species within the genus Hylobates have also long been known to show individuality in their songs, in particular in the female great call, and gibbons may themselves use this information to decide whether to escalate conflicts16,17. This has been studied far less in the genus Nomascus, but work on the western black crested gibbon (N. concolor) and indeed the cao vit gibbon has shown that songs, in particular the male songs, are highly individualistic and stable over time17,
Spatially explicit capture-recapture (SECR) analysis
We input the distance and bearing measurements made by field teams into SECR analysis25. This allowed us to: (i) estimate the location of each calling group by reconciling the bearing and distance measurements from multiple teams; (ii) evaluate errors in the estimation of distances and bearings; (iii) characterise the detection function, which describes how detection probability declines as a function of distance; and (iv) identify drivers of variation in song density (i.e., songs per hour of survey per km2). We aimed to feed the results into the design of ongoing long-term monitoring, as well as the design of future surveys. By modelling song density, we also aimed to improve our understanding of the gibbon’s ecology and identify potential areas for protection and/or restoration.
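To illustrate why bearing and distance records from different teams need reconciling, the sketch below projects two records of the same song onto a map and measures how far apart the implied locations fall. The post coordinates and the records are hypothetical, and the midpoint is only a naive stand-in: SECR reconciles the records through a likelihood that accounts for measurement error, which this sketch does not attempt.

```python
import math


def project(post_xy, bearing_deg, distance_m):
    """Location implied by a compass bearing (degrees clockwise from north)
    and a distance (metres) recorded at a listening post."""
    theta = math.radians(bearing_deg)
    east = post_xy[0] + distance_m * math.sin(theta)
    north = post_xy[1] + distance_m * math.cos(theta)
    return east, north


# Hypothetical listening posts 1 km apart on a local metric grid
post_a = (0.0, 0.0)
post_b = (1000.0, 0.0)

# Hypothetical records of the same song from the two teams
est_a = project(post_a, bearing_deg=45.0, distance_m=700.0)
est_b = project(post_b, bearing_deg=300.0, distance_m=600.0)

# Disagreement between the two implied locations, and a naive midpoint
gap_m = math.dist(est_a, est_b)
midpoint = ((est_a[0] + est_b[0]) / 2.0, (est_a[1] + est_b[1]) / 2.0)

print(f"Team A places the song at ({est_a[0]:.0f}, {est_a[1]:.0f})")
print(f"Team B places the song at ({est_b[0]:.0f}, {est_b[1]:.0f})")
print(f"The two estimates are {gap_m:.0f} m apart")
```

Even with plausible field records, the two implied locations here differ by roughly 200 m, which is the kind of disagreement that can lead one group to be logged as two.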
SECR can also be an effective method for estimating gibbon group density when surveys cover an unknown proportion of groups inhabiting an area. We did not exploit this here, given that we were able to survey the entire cao vit gibbon population.
Using the R package ‘ascr’ v. 2.2.426, we ran two competing models for song density, each representing a distinct hypothesis about the factors determining where gibbons sing from. The forest structure model contained the variables tree canopy cover27 and canopy height28, whilst the accessibility model contained the variables distance from forest edge and elevation29. A quadratic term for elevation was also included in the accessibility model, to allow for non-linear responses to this covariate. In both models, we also included a binary variable for whether it rained or not during the survey day. We selected the best model on the basis of Akaike’s Information Criterion corrected for small sample size (AICc). We used a half-normal detection function with the intercept parameter (g0) fixed at one; preliminary modelling with no covariates indicated that this was clearly preferred over a hazard detection function (ΔAICc = −21.5). Models were integrated over a grid of points overlaying the Trung Khanh–Bangliang forest block with 100 m spacing.
Estimated locations for each gibbon song were output from the best model and, for each group with more than five locations, 95% kernel home ranges were calculated using the R package ‘ctmm’ v. 1.1.0 with default settings30. We consider these kernel home range estimates as indicative only, given the small sample sizes.
For each song bout detected by field teams, we extracted corresponding recordings from the passive acoustic data using an automatic script in R v. 4.2.131. These were then manually imported into Raven Pro v. 1.6 (Cornell Lab of Ornithology, USA), listened to, and visually inspected in the form of a spectrogram with a 1200-point Hann window (70% overlap) and a 2048-point Discrete Fourier Transform; brightness and contrast were initially set to 55 and 70, respectively, and adjusted if necessary. We found that individual male gibbons were readily identifiable from recordings, even relatively poor-quality ones (i.e., signal-to-noise ratios < 3 dB). Cao vit gibbon groups have a single adult male according to long-term monitoring data14; they alone contribute the male portion of the duets, although sub-adult males within a group may sing solo songs14. We were therefore able to assign each song bout to a particular group based on the male’s unique vocal fingerprint. We also took into consideration the location of each recording when determining the identity of a group, but this information was far less useful in most cases than the spectrogram.
In addition to the manual vocal fingerprinting, we also explored more automated approaches to identifying how many males were present in the acoustic data. Specifically, we extracted measurements from each male phrase and then statistically clustered the data into groups. The number of resulting groups should equal the number of males present in the data. This more objective and repeatable approach was intended to complement the expert-driven manual clustering. Agreement between the two approaches might provide greater confidence in the overall population estimate.
To provide the data for clustering, we annotated the male multi-modulated phrases in each song bout recording with bounding boxes (Fig. 1). We only began annotating a given song bout once the male had started calling with the fully-developed form of his multi-modulated phrase (typically 8–10 min after the first call). Here, we did not annotate coda, staccato or boom phrases, although these may also encode information that is unique to each male.
We also did not focus on female phrases, since previous work suggests that, for Nomascus gibbons, it is the males that are more easily individually identifiable14), which do not sing and are difficult to survey using any method. Long-term monitoring in China observed approximately one solitary male and female for each year of data14, suggesting low densities and/or low detection probabilities. We incorporated uncertainty in \(\widehat{s}\) using a Monte Carlo approach, wherein we drew samples (with replacement) from the observed group size counts to represent the unobserved groups and recalculated \(N\) (n = 9999 simulations). A confidence interval on \(\widehat{N}\) was obtained by taking the 95% quantiles over the resulting vector. Finally, we produced our ‘best’ estimate of the cao vit gibbon population, by incorporating group size information from long-term monitoring in Vietnam and China. In this case, we had group size information for all groups, so our estimate is considered exact (and therefore has no associated measure of uncertainty).
Vocal fingerprinting
Manual clustering
Semi-supervised and unsupervised clustering
Results
Sampling effort
Between 26th October and 10th November 2021, we surveyed 29 listening posts across Vietnam and China, involving a total of 61 field personnel (42 in Vietnam and 19 in China) and 11 satellite campsites. Listening posts were surveyed for an average of 4.7 survey days (range: 1 to 9 days) and 28.9 h (range: 6 to 55 h). This generated 245 records of gibbons from field teams (Supplementary Figs. S1.1–S1.9). Following matching of data across survey teams, combined with vocal fingerprinting (see below), these records were deemed to have involved 49 song duets, 24 male solo songs and 28 direct observations. Most songs (55%) occurred in a 60 min period centred on sunrise (Supplementary Fig. S1.10).
The passive acoustic survey had a survey effort of 3480 days (1499 and 1981 in the first and second phases, respectively), generating more than 25,000 h of recordings. Each location was surveyed for an average of 63 days (range 4–84 days) and 471 h (range 29–632 h) over the two phases. Deployment and retrieval of devices during each phase involved a total of 11 personnel over 10 survey days.
Measurement error and song density estimation
SECR modelling revealed substantial human error associated with the distance and bearing measurements (Fig. 3a,b). For example, for a gibbon calling 500 m away, the 95% confidence interval (CI) on the distance measurement was estimated as 201–933 m. For a gibbon calling at 1000 m, the 95% CI was 402 to 1865 m. The 95% CI on bearing measurements was ± 41°. Surveyors had a greater than 90% chance of detecting singing gibbons at distances less than 330 m, but this had declined to 50% by 860 m (Fig. 3c).
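The detection figures above can be related to the half-normal detection function with g0 fixed at one, g(d) = exp(−d² / (2σ²)). The sketch below back-calculates the scale parameter σ implied by each reported point; σ itself is not stated in this section, so these values are our own reconstruction and the function names are illustrative.

```python
import math


def half_normal(distance_m, sigma_m, g0=1.0):
    """Half-normal detection probability at a given distance."""
    return g0 * math.exp(-distance_m ** 2 / (2.0 * sigma_m ** 2))


def sigma_from_point(distance_m, prob, g0=1.0):
    """Scale parameter implied by one (distance, detection probability) pair."""
    return distance_m / math.sqrt(-2.0 * math.log(prob / g0))


# Reported points: 90% detection at 330 m and 50% detection at 860 m
sigma_90 = sigma_from_point(330.0, 0.90)
sigma_50 = sigma_from_point(860.0, 0.50)

# Cross-check: detection probability at 860 m using sigma from the 330 m point
p_860 = half_normal(860.0, sigma_90)

print(f"sigma implied by the 330 m point: {sigma_90:.0f} m")
print(f"sigma implied by the 860 m point: {sigma_50:.0f} m")
print(f"predicted detection at 860 m: {p_860:.2f}")
```

The two reported points imply similar values of σ (roughly 720 to 730 m), so they are mutually consistent with a single half-normal curve, allowing for rounding.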
Estimated measurement error and detection range of surveyor teams when recording cao vit gibbon songs, as estimated using spatially explicit capture-recapture modelling. Measurement error is composed of (A) bearing error and (B) distance error, the latter of which is magnified when gibbons are far away. The detection range (C) was estimated using a half-normal detection function. Dashed lines indicate 95% confidence intervals.
Song density was better described by forest accessibility, as measured using distance from forest edge and elevation, than forest structure (ΔAICc = −38.3; Table 1); it was also better than a null model with no covariates (ΔAICc = −52.5). Forest that was further from the forest edge and at mid-elevation (i.e., on the mountain slopes, not on the highest peaks nor in the valleys) had substantially higher song density, with the confidence intervals on all parameters not overlapping zero (Table 1). Song density was also lower on rainy survey days, though this effect was not statistically clear, with the confidence interval on the parameter overlapping zero (Table 1); indeed, gibbons were recorded singing on days when it rained. Gibbons were predicted to sing mostly within a highly restricted area of the forest, greater than 1 km from the forest edge and between 750 and 790 m elevation (Supplementary Figs. S1.11–S1.12).
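For readers unfamiliar with AICc-based model selection, the comparison above can be sketched as follows. The log-likelihoods, parameter counts and sample size below are hypothetical placeholders, not values from Table 1; only the AICc formula itself is standard. Differences are expressed as preferred model minus competitor, matching the negative ΔAICc values reported in the text.

```python
def aicc(log_lik, k, n):
    """AIC with small-sample correction: k parameters, n observations."""
    aic = 2.0 * k - 2.0 * log_lik
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


# Hypothetical fits: model name -> (maximised log-likelihood, number of parameters)
fits = {
    "accessibility": (-500.0, 7),
    "forest structure": (-520.0, 6),
    "null": (-530.0, 4),
}
n_obs = 100  # hypothetical sample size

scores = {name: aicc(ll, k, n_obs) for name, (ll, k) in fits.items()}
best = min(scores, key=scores.get)

# Delta AICc as (best model) minus (competitor), so competitors get negative values
deltas = {name: scores[best] - s for name, s in scores.items() if name != best}

print(f"Preferred model: {best}")
for name, delta in deltas.items():
    print(f"  vs {name}: delta AICc = {delta:.1f}")
```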
Manual and automatic clustering of the song bouts
By matching the song bout data from field teams with the acoustic data, we were able to extract recordings of 55 song bouts, over which we annotated 940 multi-modulated phrases. We deduced from the field data that two of the song bouts were of a group (‘GL’) located far on the Chinese side of the border, more than 1 km from our devices; we excluded these from further analysis due to the low quality of the recordings.
For the remaining 53 song bouts, manual clustering revealed nine different singing males across eight family groups (Fig. 4). One of the groups, G2, was represented by two males: an adult and putative sub-adult. The sub-adult often sang alone and was only ever heard singing with the G2 females alongside the adult male. It is also possible that this individual was an unmated adult male challenging the established G2 male. Monitoring of G2 for a longer period than we could achieve during our survey would help to clarify the relationship between this sub-adult and group G2. In any case, the number of singing family groups remains eight. Five other male solo songs were recorded but only involved simple, undeveloped phrases that could not be identified. These songs were excluded from further analysis but, based on the locations of these solo songs, we estimate that they involved three different males; solitary dispersing males are not thought to sing14, so it is likely that these males were part of known family groups. Each of the nine manually-identified clusters was discernible in a UMAP plot of the acoustic features (Fig. 5b,d).
Example multi-modulated phrases for the nine manually-identified male cao vit gibbons. Spectrograms were created with the ‘warbleR’ package v. 1.1.28 in R47, using a 1200-point Hann window with 70% overlap.
Clustering of the cao vit gibbon male calls based on affinity propagation (A, C) and compared to a manual approach (B, D). Each male phrase is plotted in two-dimensional space by applying Uniform Manifold Approximation and Projection (UMAP) dimension-reduction to 181 different measurements. Top row: Semi-supervised affinity propagation applied to the ‘representative’ phrases (A) as compared to the manual clustering (B). Bottom row: Unsupervised affinity propagation applied to the full dataset (C) as compared to a manual approach (D).
Affinity propagation clustering using the semi-supervised approach returned eight clusters across seven family groups (Fig. 5a), whilst the unsupervised approach returned seven clusters across seven family groups (Fig. 5c). Compared to the manual identification, the semi-supervised and unsupervised approaches lumped together groups TCN and G4. The unsupervised approach also did not resolve the G2 adult male as a distinct cluster and the G2 sub-adult male instead represented group G2 in this case.
Population size estimation
On the basis of the manual clustering and the group size counts made during the survey (direct observations and UAV videos), we estimate that there were 11 cao vit gibbon family groups, comprised of 76 individuals (95% CI: 74–78; Table 2). By incorporating long-term monitoring data to fill in data gaps about group sizes (for groups Q, G4 and GL), we estimate that there were in fact 74 individuals in family groups. The semi-supervised and unsupervised approaches both produced an estimate of 10 family groups, comprised of 69 individuals in both cases (95% CI: 67–70). Three of the groups primarily resided in China and eight in Vietnam, with considerable overlap observed in the home ranges of the southern-most groups in Vietnam (Fig. 6).
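The Monte Carlo interval around the survey-only estimate can be sketched as below. The observed group sizes are hypothetical placeholders (Table 2 holds the real counts); the number of groups without a count (three: Q, G4 and GL) and the 9999 simulations follow the text. Reading the interval off the 2.5% and 97.5% quantiles, and the function name, are our assumptions.

```python
import random


def simulate_population(observed_sizes, n_unobserved, n_sims=9999, seed=1):
    """Resample observed group sizes (with replacement) to stand in for groups
    lacking a count, and return the interval bounds plus all simulated totals."""
    rng = random.Random(seed)
    known_total = sum(observed_sizes)
    totals = []
    for _ in range(n_sims):
        imputed = rng.choices(observed_sizes, k=n_unobserved)
        totals.append(known_total + sum(imputed))
    totals.sort()
    lower = totals[int(0.025 * (n_sims - 1))]
    upper = totals[int(0.975 * (n_sims - 1))]
    return lower, upper, totals


# Hypothetical counts for the eight groups that were counted during the survey
observed = [6, 7, 8, 6, 7, 7, 8, 6]

lower, upper, totals = simulate_population(observed, n_unobserved=3)

# Point estimate: counted individuals plus mean group size for each uncounted group
mean_size = sum(observed) / len(observed)
point_estimate = sum(observed) + 3 * mean_size

print(f"Point estimate: {point_estimate:.1f} individuals")
print(f"95% interval:   {lower} to {upper}")
```

Because only three of the 11 groups are imputed, the interval is narrow, which matches the tight interval reported for the survey-only estimate.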
Gibbon detection locations (songs and sightings) and minimum convex polygons for groups with sufficient data (n ≥ 5). Song locations were estimated using spatially explicit capture-recapture modelling, based on distance and bearing measurements from survey teams, whilst direct observations were mapped in the field based on topographical landmarks. The inset map shows the broader landscape surrounding the Bangliang—Trung Khanh forest block (Google Earth basemap).
Group compositions observed during the survey were consistent with those from long-term monitoring in China, with most groups consisting of one adult male, two adult females and dependent offspring (Table 2). The exception was group R, which apparently had three adult females (though, consistent with our understanding of the species, this is unlikely to be stable in the long-term14).
Discussion
We carried out the most robust survey of the last remaining cao vit gibbon population done to date, incorporating vocal fingerprinting and UAV-based group counts, finding that the population is substantially smaller than previously thought. Instead of approximately 120 individuals, the population is 38% smaller, at 74 individuals (plus an unknown number of solitary dispersing individuals). Semi-supervised and unsupervised approaches to clustering the acoustic data yielded a slightly smaller population size estimate of 69 individuals, with no evidence that the manual identifications had missed any groups.
Previous surveys, we believe, have unwittingly over-estimated the population size by occasionally double-counting groups when they sang in new locations (either on the same day or subsequent days). The double-counting problem has likely been exacerbated by the substantial measurement error in estimated distances and bearings, causing inaccurate localisation of singing gibbons. We saw evidence of this double-counting during our survey, and instead relied on the acoustic data to decide when to split or combine records. We have no evidence that the discrepancy in estimates between this latest survey and previous surveys is due to a population decline, with no hunting reported over the last 20 years and the habitat undergoing recovery since at least 20073. Indeed, we believe the population has likely increased over this time, with two new groups having formed in China (in 2015 and 2017) in previously unoccupied habitat15,39.
The cao vit gibbon is evidently in much more immediate danger from small population size effects—including loss of genetic diversity, inbreeding, and vulnerability to unforeseen catastrophes—than previously thought3. There are at least three implications of this new understanding of the population size. First, there is now an even greater urgency to the ongoing habitat restoration work, likely the most feasible way to increase the population size over the near-to-medium term. Habitat restoration in limestone forest has proven challenging to date, owing to the unique ecology of succession in this habitat and the difficult access to the site40. Fauna & Flora are currently trialling new methods, such as soil transplants in rocky areas and cluster planting in valley bottoms, but additional expertise and resources are needed. In China, too, habitat restoration efforts are underway3 and the potential gains for the population are even larger than in Vietnam: as much as 84% of the forest block remains unoccupied by gibbons on the Chinese side of the border, compared to 73% in Vietnam (based on a minimum convex polygon around the detection locations and long-term data9). This difference is largely due to the highly degraded state of the habitat in the extreme northwest of the forest block in China12.
The second implication of the new, substantially smaller population estimate is that there is an even stronger rationale to continue the monitoring of focal groups into the long-term, as an early-warning system to detect inbreeding depression (as might be indicated, for example, by changes in infant mortality rates or female breeding rates). Long-term monitoring also appears to provide de facto protection for monitored groups, due to the regular presence of monitoring teams in the field.
The third implication of discovering the very small size of the gibbon population is that it calls into question current plans to reintroduce the species to an additional site3, since any removal of individuals from the population may endanger its persistence more than previously thought. Careful study of the vital rates (breeding and mortality) of the existing population, as well as updated scenario modelling using population viability analysis (PVA)3,12, will be needed in order to assess the feasibility and risks associated with this. As a crucial input to the PVA modelling, the genetic health of the population must urgently be assessed, as has been done for example for the Hainan gibbon (N. hainanus)41. The genetic assessment would also help to define the time-line over which conservation actions must occur.
We consider it unlikely that we have missed any groups during our survey, due to the high sampling effort in the field, specifically the high density and coverage of listening points used. Unsurveyed areas within the Bangliang—Trung Khanh forest block, for example to the north and south of known gibbon groups (Fig. 6), are unlikely to harbour any further groups, with patrols and local communities regularly visiting these areas but never reporting gibbon songs or sightings. It is possible, though, that these areas harbour dispersing gibbons, which are typically silent and therefore highly cryptic. Our new estimate is also more plausible than previous estimates, since it resolves the discrepancy in the population data from China and Vietnam. The estimated density in Vietnam of 1.4 groups per km2 (six Vietnam-only groups occupying 4.4 km2; calculated from a minimum convex polygon around the detection locations) is now more in line with the density from long-term monitoring in China of 1.0 groups per km2.
The main caveat to our vocal fingerprinting approach is that it may fail to distinguish individuals if songs show a high degree of similarity. Indeed, the semi-supervised and unsupervised approaches both lumped together some males that the manual process determined were separate. We consider it unlikely, however, that the manual approach missed any males. As well as the multi-modulated phrases, the manual approach also used information from the staccato phrases (which appeared to differ in shape and peak frequency between males) and geographic location and timing of calls. Previous studies of Nomascus18,19 and Hylobates21 have also not uncovered any evidence of cryptic individuals. Nonetheless, it may be beneficial to apply more sensitive approaches to classifying the acoustic data than affinity propagation, in particular convolutional neural networks42. These algorithms rely on manually-generated training data, but songs which the algorithm finds difficult to classify (i.e., with low class probabilities) might indicate the presence of cryptic individuals. These songs could be flagged and investigated further.
SECR has been applied to relatively few gibbon species thus far25,43, perhaps in part due to the statistical complexity of the approach relative to traditional triangulation methods. However, the SECR approach provides a rigorous way of reconciling bearing and distance measurements from different surveyor teams and estimating the location of singing gibbons. It also quantifies the magnitude of the measurement errors which, for the cao vit gibbon, we found were substantial (Fig. 3), most likely due to the complex way in which sound travels in the karst mountain landscape. SECR modelling can also be used to test hypotheses, such as the drivers determining singing locations. For the cao vit gibbon, we found that highly specific locations within the Bangliang–Trung Khanh forest are favoured for singing: upper mountain slopes far from the forest edge (Supplementary Fig. S1.12). The elevation effect is similar to findings from intensive, long-term monitoring of two cao vit gibbon groups in China, which found that gibbons preferred higher elevations for singing9. In principle, SECR could also be used with capture-recapture data from passive acoustic monitoring arrays, outputting the estimated locations of singing gibbons. For accurate localisation, however, supplementary information is likely needed, such as bearings (e.g. from a recorder capable of carrying out beamforming), signal strength and/or time-of-arrival (e.g. from devices that are time-synchronised using GPS)44,45,46.
Our approach in the 2021 survey, which combined traditional methods with emerging, technology-based methods, paves the way for a new era in monitoring of the cao vit gibbon, and indeed other gibbon species. Gibbon surveys employing vocal fingerprinting should be more accurate and comparable across time. Moreover, since male Nomascus songs are stable over long time-frames18,19, it is likely possible to match individual males over consecutive surveys and detect male replacement events at the population scale19. In the long-term, we might even envisage a solar-powered, time-synchronised acoustic array, combined with edge processing and cellular connectivity, that could autonomously map the songs of different males and send the information remotely. With this, we could move away from snapshot population surveys and instead monitor the population continuously across space and time. This would provide conservation managers with an unprecedented level of detail about gibbon populations and allow for more effective and timely decision-making.