Abstract
We present an ocean-basin-scale dataset that includes tail fluke photographic identification (photo-ID) and encounter data for most living individual humpback whales (Megaptera novaeangliae) in the North Pacific Ocean. The dataset was built through a broad collaboration combining 39 separate curated photo-ID catalogs, supplemented with community science data. Data from throughout the North Pacific were aggregated into 13 regions, including six breeding regions, six feeding regions, and one migratory corridor. All images were compared with minimal pre-processing using a recently developed image recognition algorithm based on machine learning through artificial intelligence; this system is capable of rapidly detecting matches between individuals with an estimated 97–99% accuracy. For the 2001–2021 study period, a total of 27,956 unique individuals were documented in 157,350 encounters. Each individual was encountered, on average, in 5.6 sampling periods (i.e., breeding and feeding seasons), with an annual average of 87% of whales encountered in more than one season. The combined dataset and image recognition tool represent a living and accessible resource for collaborative, basin-wide studies of a keystone marine mammal in a time of rapid ecological change.
Introduction
Understanding the population ecology of a species is crucial for conservation management, but studies of most migratory marine species are compromised by data deficiency. Individual identification through techniques such as photographic identification (photo-ID), radio telemetry, and genetic sequencing allows researchers to track individual animals over time. This enables population modeling, revealing movement patterns, social interactions, and reproductive success rates. Photo-ID, in which a photograph of persistently identifiable features of an individual is recorded together with its date and location, offers an efficient and non-invasive data collection method1. For long-lived migratory species, effective population studies require extensive data collection, including the additional challenges of collaboration across regional and international boundaries.
The humpback whale, Megaptera novaeangliae, is a globally distributed baleen whale species with a complex population structure and major ecosystem impacts2,3,4. Individuals engage in extensive seasonal migrations between high-latitude feeding areas during the spring, summer, and fall, and low-latitude tropical waters to mate and calve in winter and spring2,4,5,6,7,8. The long-distance migrations undertaken by humpback whales expose populations to diverse management regimes, anthropogenic risks, and ecological conditions9. For example, a very large marine heatwave in the North Pacific from late 2013–201610,11,12 caused major negative impacts on humpback whale food resource availability. This resulted in sharp declines in abundance, survival, and reproductive success of humpback whales in Hawaiʻi and Southeast Alaska13,14,15,16,17. In a changing oceanic ecosystem, a cost-effective and non-invasive technique that repeatedly samples most living individuals can offer valuable insights into the status of the species and its ecosystem.
Humpback whale populations worldwide were severely depleted by extensive commercial whaling until late in the twentieth century. This species was listed under the U.S. Endangered Species Act (ESA) in 1970 due to an estimated 31,785 killed in the North Pacific from 1900 to 197918,19,20. Following a global ban on humpback whale catches by the International Whaling Commission in 1966, and the cessation of Soviet illegal whaling in the following decade19, the humpback whale population has grown. Two studies have evaluated the abundance of humpback whales in the full North Pacific: first in the 1990s4, then a study entitled Structure of Populations, Levels of Abundance and Status of Humpback Whales (SPLASH) conducted from 2004 to 20068. These studies estimated total North Pacific humpback whale abundance at 21,063 individuals in 2006, with an annual growth rate of 8.1% between the two study periods21. A major portion of SPLASH relied on the identification and resighting of individual humpback whales through photo-ID. This method involved trained observers visually matching photographs of the ventral side of each whale’s tail (flukes) based on unique white and black pigmentation patterns, together with unique fluke trailing edge contours22,23. SPLASH documented 7,640 individual humpback whales in 18,469 unique encounters (defined as a single sighting of a unique individual supported by a referenced photo-ID image, recorded on a specific day at a specific location); these encounters occurred across all known breeding and feeding areas. SPLASH reinforced the value of broad-scale data sharing and collaboration, and exposed gaps in knowledge of humpback whale status in the North Pacific.
In 2016, NOAA Fisheries, pursuant to the ESA, defined 14 humpback Distinct Population Segments (DPSs) globally using photo-ID data and other lines of evidence24. DPS designations are based on theoretically discrete breeding areas where many whales show long-term site fidelity25. In feeding areas, whales also show high site fidelity and arguably face greater biological and anthropogenic stressors26. Four DPSs occur in the North Pacific, with breeding occurring in waters off Central America, Mexico, Hawaiʻi, and the Western North Pacific (Mariana Islands, the Philippines, and Japan). Based on varying rates of recovery, the Central America and Western North Pacific DPSs remain listed as Endangered, the Mexico DPS is considered Threatened, and the Hawaiʻi DPS has been deemed to not warrant listing27. Ironically, removal of the Hawaiʻi DPS's endangered status by the US coincided with the 2013–2016 marine heatwave that negatively affected population health13,14,15,16,28.
Individual photo-ID data have advanced the understanding of humpback whale behavior, ecology and conservation issues based on many regional study efforts13,14,25,29,30,31,32,33,34,35,36,37,38,39,40,41. However, after SPLASH ended in 2006, local and regional photo-ID datasets were seldom integrated with one another. This was in part due to prohibitively time-intensive visual matching of individual ID fluke photos in ever-growing catalogs. The current study established the North Pacific Humpback Whale Photo-ID (NPPID) collaboration. The goal of this collaboration was to integrate and advance knowledge of humpback whale population structure and migratory movement in the North Pacific through creation of a shared repository of resighting data for individual whales across the full study region. A central objective of the effort was to implement a collaborative framework to facilitate data availability, access, and readiness. Given the large amount of data involved and the difficulty of obtaining long-term funding, to be successful the system needed to drive the incremental cost of acquisition of each successive datapoint to near zero. Such a system required effective technology and web-based data management to submit, quality-control, identify, and curate encounter data for a growing set of known individual whales. The NPPID was built on newly established automated fluke photo-ID matching technology. This technology achieves a measured 97–99% accuracy with good- to high-quality images and is orders of magnitude faster than manual visual matching42. However, a system is not technology alone; the system needed to sustainably nurture positive collaboration practices to bring together the many contributors responsible for tens of thousands of whale encounters per year. Therefore, the NPPID was developed as a shared effort utilizing the user-friendly and interactive web-based platform, www.Happywhale.com (Happywhale). 
Here we describe the process of building this ocean-basin-wide ongoing photo-ID collaboration involving 43 research groups and thousands of public contributors (also known as "community scientists" or "citizen scientists"). This approach has enabled rapid feedback for population and longitudinal studies of humpback whales across the North Pacific. The process and framework described here have broader practical relevance for navigating the use of complex multi-contributor datasets.
Materials and methods
The North Pacific humpback whale Photo-ID (NPPID) collaboration
This effort began in 2018 as a data-sharing initiative to revive the collaboration established with the 2004–2006 SPLASH study8, supplemented by photo-ID images from community scientists. We built upon the SPLASH dataset, study methodology, and collaboration, but did not have a budget for data acquisition or fieldwork. All SPLASH collaborators known to be active in North Pacific humpback whale studies were invited to join, along with all known newer regional researchers and organizations. Data collection relied on existing archives and ongoing field efforts by the individual collaborators. All dedicated data collection by study collaborators was carried out in accordance with permitting requirements of respective authorities (permit details are listed in acknowledgements). Data collection from community scientists was sourced primarily from whale watch companies operating under regulations and guidelines of respective national, regional, and local authorities. A primary incentive for participation in the NPPID collaboration was the promise of novel and fully automated image-recognition technology42 that effectively eliminated the cumbersome, time-intensive visual matching process from photo-ID data management.
Through a memorandum of agreement (MOA, Supplementary Material I), all research organizations in the NPPID committed to sharing photo-ID images and associated supporting data for every available encounter, with a focus on a 2001–2021 study period. The specific research aim was to further knowledge of population structure and migratory movement via photographic mark-recapture population model development e.g.21,43. The broader aim was to create an ongoing, living dataset for continued population monitoring. Under the MOA terms, each data contributor chose whether their data were publicly visible via Happywhale or visible only to collaborators who had signed the MOA. The MOA limited data use to a defined set of publications about population status and migratory patterns; any additional use required agreement from all collaborators. The infrastructure, compiled data, and collaborator connections will remain after the period of the current MOA. Their continued use will therefore need to be addressed through further agreement among collaborators if the dataset is to remain an ongoing, living entity.
Data integration and quality control
Humpback whale encounter data were delivered to Happywhale data managers from collaborators in a wide range of states of reconciliation, from unmatched original scans and digital photos to fully edited sets of images (i.e., exposure adjusted as needed and cropped tightly around flukes), with IDs assigned to each individual whale. The minimum data required for each encounter were: date, location, and photo-ID image or confirmed individual ID. All encounters of each whale were preserved, and all available supporting attribute data were maintained with the encounter; this could include filename, date, time, location, individual ID from the collaborator’s naming/numbering system, observer names, vessel name, observed whale sex, age class, health, behavior, group composition and any further observations. Because the state of every dataset varied at the time of delivery, all data were managed through the following standard series of steps:
1. Image management and matching: Images were quality-controlled through cropping tightly around the flukes and, if necessary, exposure adjustment to facilitate algorithmic ID followed by visual ID confirmation. All images were quality-scored on a 0–5 scale as described in a previous study42, where 0 represented photos in which no photo-ID features were visible, and 1–5 represented very poor to excellent quality photos, respectively. All photo-ID images were matched to a progressively growing set of known whales via an automated image recognition system42. Every match proposed by the system was manually confirmed by a trained observer. All matches that could be visually confirmed by a trained observer were maintained regardless of image quality. A previous study established that 97–99% of potential matches are found by this method for good- to high-quality images42.
2. Supporting attribute data curation: Given the diversity of supporting data formats received, standardization was necessary for dataset management. Locations were categorized as general (confident of location within 200 km), approximate (confident of location within 20 km), or precise (confident of location within 2 km). Within the precise location category, location data source was categorized as: (a) camera GPS embedded into the image, (b) synchronous GPS track, (c) pinpoint recorded from a GPS unit, (d) pinpoint recorded via a mobile app, or (e) manually transcribed record. For encounters without a known date, an approximate date to month, season or year was assigned as information allowed, with date precision noted in encounter attributes. Encounters without a date known at least to year or location known confidently within 200 km were excluded. Descriptive observational data and contextual information such as whale sex, age class, behavior, mother/calf relationships or group composition, and scarring (e.g., from entanglement, ship strike, killer whales) were recorded with each encounter when available and without standardization. Data quality was reviewed on import, with an opportunity for review by both data managers and data contributors before entry into a relational database.
3. Efficiency with large datasets: To increase efficiency for collaborators with large, well-curated datasets, some encounters were accepted with an individual ID name/number and supporting date, location, and attribute data, without a photo-ID image linked to every encounter. These encounters were linked to known individuals represented in one or more catalog photos.
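The location categories and the exclusion rule in the steps above can be illustrated with a minimal Python sketch. The distance thresholds come from the text; the `Encounter` record layout, the function names, and the treatment of each threshold as inclusive are assumptions made for illustration and do not describe the actual Happywhale implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds (km) are taken from the text; treating them as inclusive
# upper bounds is an assumption.
PRECISION_LIMITS_KM = [("precise", 2.0), ("approximate", 20.0), ("general", 200.0)]


def location_category(uncertainty_km: Optional[float]) -> Optional[str]:
    """Return the location category, or None when the location is unknown
    or not confidently known within 200 km."""
    if uncertainty_km is None:
        return None
    for name, limit in PRECISION_LIMITS_KM:
        if uncertainty_km <= limit:
            return name
    return None


@dataclass
class Encounter:  # hypothetical record layout, not the real schema
    individual_id: str
    year: Optional[int]              # None when the date is not known to year
    uncertainty_km: Optional[float]  # None when the location is unknown


def is_usable(enc: Encounter) -> bool:
    """Exclusion rule: keep an encounter only if the date is known at least
    to year and the location is known within 200 km."""
    return enc.year is not None and location_category(enc.uncertainty_km) is not None
```

Under these assumptions, an encounter dated only to a year but located within 15 km would be kept as "approximate", while an encounter with a precise location and no usable year would be excluded.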
Many-to-one reference catalogs
All images were automatically matched against all individual humpback whales known at the time of each respective dataset integration. Across the NPPID study area, 39 separate catalog systems were received that had collaborator-specific individual IDs (Table 1). These ID naming systems were accommodated into a many-to-one ID structure so that any individual could be tracked via any of the multiple catalog IDs assigned to them.
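A many-to-one ID structure of this kind can be sketched as a lookup from (catalog, catalog ID) pairs to a single individual. The class below is a hypothetical illustration; the class name, method names, and example identifiers are assumptions and are not taken from the Happywhale system.

```python
from collections import defaultdict


class IdRegistry:
    """Maps many (catalog, catalog_id) aliases to one individual (hypothetical)."""

    def __init__(self):
        self._alias_to_individual = {}
        self._individual_to_aliases = defaultdict(set)

    def add_alias(self, individual, catalog, catalog_id):
        """Register a collaborator-specific ID for an individual."""
        key = (catalog, catalog_id)
        existing = self._alias_to_individual.get(key)
        if existing is not None and existing != individual:
            raise ValueError(f"{key} is already assigned to {existing}")
        self._alias_to_individual[key] = individual
        self._individual_to_aliases[individual].add(key)

    def resolve(self, catalog, catalog_id):
        """Return the individual for a catalog ID, or None if unknown."""
        return self._alias_to_individual.get((catalog, catalog_id))

    def aliases(self, individual):
        """Return all catalog IDs recorded for an individual, sorted."""
        return sorted(self._individual_to_aliases.get(individual, ()))


# Made-up example identifiers, for illustration only.
registry = IdRegistry()
registry.add_alias("HW-0001", "CatalogA", "123")
registry.add_alias("HW-0001", "CatalogB", "NP-77")
```

With this layout, a whale can be looked up through any of its catalog IDs, and all of its IDs can be listed from the individual record.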
Reconciliation of duplicate IDs
Every image was matched within and among all collaborator catalogs. Only one ID per individual was allowed within each catalog. Thus, when a false negative was found (a previously undetected match showing that one whale held two or more separate IDs within a collaborator catalog), the contributor chose the persisting ID (typically the lowest of a sequential ID series). Each duplicate ID was noted in the attributes for the individual whale. Newly detected (i.e., unmatched) individuals were added to the continually growing reference set, with the collaborator ID, if available, or with a newly assigned Happywhale catalog ID. False positives (where two different whales were combined into one individual record) were minimized through trained observer review of every match.
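The duplicate reconciliation step can be sketched as grouping confirmed matches and keeping the lowest ID in each group. This sketch uses a union-find structure, which is our choice for illustration rather than a method described in the study, and it assumes integer IDs and the "lowest ID persists" convention that the text calls typical. In practice the contributor chose the persisting ID.

```python
def reconcile_duplicates(confirmed_matches):
    """Group catalog IDs confirmed to belong to the same whale.

    confirmed_matches: iterable of (id_a, id_b) integer pairs from one
    catalog, each already confirmed by a trained observer.
    Returns {persisting_id: sorted list of retired duplicate IDs}.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in confirmed_matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lower ID as the root so it becomes the persisting ID.
            low, high = sorted((root_a, root_b))
            parent[high] = low

    merged = {}
    for x in list(parent):
        root = find(x)
        if x != root:
            merged.setdefault(root, []).append(x)
    return {root: sorted(dups) for root, dups in merged.items()}
```

For example, if IDs 12 and 40 are confirmed as one whale and 40 is later matched to 7, the function keeps 7 and records 12 and 40 as retired duplicates.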
Community science data contributions
Opportunistic images submitted through Happywhale were also matched against all known whales, supplementing the research collaboration with community science-sourced encounter data. The same image and data quality control standards were applied as described above. All community science data contributors implicitly acknowledged their choice of data usage rights during the submission process and had the option of changing usage rights settings among established levels of Creative Commons usage rights (https://en.wikipedia.org/wiki/Creative_Commons_license). Unlike research collaborators participating under the terms of the MOA, public contributors did not have the option of restricting public visibility. Public contributors had access to an encounter comment system whereby suspected data errors and outliers could be brought to the attention of data managers, creating a feedback loop for review and error detection.
Information system structure and development
The NPPID data management system integrated a workflow of image processing, individual identification, and recording and curating encounter and individual attribute information. Data were structured through units of contributors (i.e., “users”), images, encounters, individual humpback whales, and surveys (i.e., “voyages”), linked by a series of workflow processes (Fig. 1). The cloud-based information architecture was composed of a dedicated server for the Java Spring application using a PostgreSQL database populated with Darwin Core compatible fields44. Submitted binary media were stored in a Simple Storage Service (S3) system for global retrieval. The ID system used a combination of a Node server and a Python Flask app to run the PyTorch-based ID algorithm.
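To make the idea of Darwin Core compatible fields concrete, the sketch below maps one encounter onto a handful of standard Darwin Core terms. The actual database schema is not described in this paper, so the function name, its parameters, and the choice of terms are assumptions; only the term names themselves come from the Darwin Core standard.

```python
from datetime import date


def to_darwin_core(encounter_id, individual_id, when, lat, lon,
                   uncertainty_km, observer):
    """Map one encounter onto Darwin Core terms (illustrative selection)."""
    return {
        "occurrenceID": encounter_id,
        "organismID": individual_id,
        "scientificName": "Megaptera novaeangliae",
        "basisOfRecord": "HumanObservation",
        "eventDate": when.isoformat(),
        "decimalLatitude": lat,
        "decimalLongitude": lon,
        # Darwin Core expresses coordinate uncertainty in meters.
        "coordinateUncertaintyInMeters": uncertainty_km * 1000,
        "recordedBy": observer,
    }


# Made-up example values, for illustration only.
record = to_darwin_core("enc-1", "HW-0001", date(2019, 7, 4),
                        58.5, -136.0, 2, "A. Observer")
```

Storing the location uncertainty alongside the coordinates keeps the general, approximate, and precise categories recoverable after export.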
During the collaboration, ongoing system development brought enhanced functionality and sophistication to data management processes within the Happywhale.com web platform. In 2021, the automated image recognition system was rebuilt to deliver results in under 0.1 s per image. This efficiency reduced server load, which has accommodated direct access by collaborators to batch process photo-ID images directly via web and mobile app interfaces in the lab or field. Near-instantaneous access to image processing was adopted by many collaborators to facilitate more efficient and effective internal data management.
NPPID collaborators were invited to directly manage their data import process and ongoing curation, with training, feedback, and quality control oversight by system managers. Some collaborators used the system as a principal repository of their data while others maintained their own separate data management systems during the study. As import and management tools developed in a constantly evolving system, collaborators were increasingly enabled and encouraged to manage their own data.
System use, public outreach, and data accessibility
The FAIR Principles (Findable, Accessible, Interoperable, and Reusable) for scientific data45 guided system design. Public awareness of the opportunity to contribute to whale conservation science was spread through word of mouth, social media, and documentary films. The primary focus of outreach was to seek and reach naturalists, whale watch guides and enthusiasts already familiar with the concept of marine mammal photo-ID, and equipped with camera gear sufficient to create quality images. Community scientists and NPPID collaborators were promised they would be rewarded with knowledge. This was accomplished through a notification system with alerts to novel developments regarding individuals they had encountered (e.g., initial identification typically within a few days of submission, discovery of duplicate IDs, and ongoing resightings). Would-be contributors were directed to Happywhale with little guidance beyond a request for humpback whale photo-ID photos from any date and location, as long as the contributor could confirm the date and location. The data upload process sought to balance ease of access with rigor for data quality, with data validation dependent upon the image management process.
Data are searchable and accessible in ‘map view’ (Fig. 2) and ‘list view’ formats via Happywhale. Users can expand a search from a set of encounters (for example, all encounters contributed by one user or all encounters in a geographic area in a defined time period) to all sightings globally of individuals within the found set. This allows quick visual exploration of migratory connections for any set of whales. For collaborators, data are available for export into a standard comma-separated value (CSV) format, translatable to downstream analytical and research processes in GIS or statistical software.
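The search expansion described above (from a found set of encounters to all sightings of the individuals in that set) can be reproduced on an exported CSV file with a short script. The column names and sample rows below are made up for illustration and do not reflect the actual Happywhale export format.

```python
import csv
import io

# Made-up rows standing in for an exported CSV file; column names are assumptions.
SAMPLE_EXPORT = """individual_id,date,region
HW-0001,2019-02-10,Hawaii
HW-0001,2019-07-22,Southeast Alaska
HW-0002,2019-02-11,Hawaii
HW-0003,2019-08-01,Southeast Alaska
"""


def expand_to_all_sightings(rows, selector):
    """Return every encounter of any individual that has at least one
    encounter accepted by `selector`."""
    rows = list(rows)
    found = {r["individual_id"] for r in rows if selector(r)}
    return [r for r in rows if r["individual_id"] in found]


rows = list(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
hawaii_whales = expand_to_all_sightings(rows, lambda r: r["region"] == "Hawaii")
```

In this toy example the expanded set contains the Southeast Alaska sighting of HW-0001, which is the kind of migratory connection the map view is intended to reveal.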
Analytics—Documenting detection probability
The 2004–2006 SPLASH project actively developed collaborations and supported field efforts in all known (at the time) North Pacific humpback whale breeding and feeding areas, in pursuit of comprehensively representative sample sizes8. In contrast, the NPPID project relied on contributions from existing datasets, ongoing field efforts, and community science image contributions. With successive integration of datasets, detection probabilities progressively increased, leading to a predominance of resightings (documenting an individual multiple times) and fewer new whales added to the comprehensive catalog. This caused a shift in methodology from predominantly cataloging new whales to confirming matches of known whales. To understand the proportion of the populations sampled in our growing known dataset, we plotted a discovery curve of new versus total identified individuals (Fig. 3), and a modified discovery curve of individuals identified over time (Fig. 4), in order to describe effort over the course of the history of the dataset.
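A discovery curve of this kind can be computed from a list of (year, individual) encounter records. The function below is a minimal sketch under that assumed input layout; it is not the code used to produce Figs. 3 and 4.

```python
def discovery_curve(encounters):
    """Compute yearly discovery statistics.

    encounters: iterable of (year, individual_id) pairs.
    Returns a list of tuples, one per year in ascending order:
    (year, individuals identified that year, new individuals that year,
     cumulative individuals known).
    """
    by_year = {}
    for year, individual in encounters:
        by_year.setdefault(year, set()).add(individual)

    seen = set()
    curve = []
    for year in sorted(by_year):
        identified = by_year[year]
        new = identified - seen
        seen |= identified
        curve.append((year, len(identified), len(new), len(seen)))
    return curve
```

As detection probability rises, the number of new individuals per year falls relative to the number identified, and the cumulative count flattens.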
Results
The NPPID collaboration involved 43 research organizations and included data from all nations around the North Pacific rim where humpback whales are known to regularly occur (Tables 1 and 2, Fig. 5). The complete NPPID collaboration ocean-basin dataset totaled 30,100 individual whales (February 1977 through August 2022 encompassing all available data). A total of 27,956 unique individuals were documented in 157,350 encounters during the 2001–2021 study period (Table 2, Fig. 2). Effort was variable over time: it was much higher in some areas relative to others, and skewed to the central and eastern North Pacific. However, data collection occurred in all known humpback whale breeding and feeding areas, with high rates of individual resighting throughout (Table 3, Fig. 5). Approximately two-thirds of encounters were represented by a single photo-ID image, while the remaining third contained additional supporting images (e.g., multiple views of the flukes, dorsal fin to fluke series and/or behavioral and anatomical images of the same individual). Naming/numbering protocols for 39 reference catalogs were combined into one unified set, with an average of 1.96 IDs per individual (range: 1–10). Most encounters (66%, documenting 24,049 individuals) were sourced from NPPID collaborators, with the remaining 34% submitted by community scientists (documenting 15,298 individuals); these are shown by region in Fig. 5. The community science-sourced component of the dataset was contributed by 3413 Happywhale users (Supplementary Material II). By volume, most community science-sourced images were contributed by whale watch tour naturalists, who consistently photographed and uploaded photo-ID images of every whale they were able to photograph. Most encounters (66%) were made publicly visible, with the remainder visible only to NPPID collaboration members (Tables 1 and 2 by region and research group). 
An additional 6318 humpback whale encounters (4% of total North Pacific encounters, primarily from public contributors) remained unidentified to individual due to poor image quality.
An annual average of 87% of individuals (84–92%) were documented in more than one season (Table 3, by region Fig. 5), averaging 5.6 seasons of encounters per individual. During the three-year SPLASH study, the cumulative number of individuals documented increased annually by an average of 21%. By contrast, from 2017 forward, with a comparable or greater number of individuals identified per year, cumulative individuals increased by an average of 5%, due to the documentation of a higher proportion of living individuals (Fig. 3). Data collection temporarily surged during the 2004–2006 SPLASH study, then increased gradually from 2007 to 2014 and more strongly from 2015 (Fig. 4).
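The annual resighting proportion can be sketched as follows. We assume here that, for each year, the statistic is the share of individuals encountered that year that appear in more than one season anywhere in the dataset; the exact definition used for Table 3 may differ, and the input layout is hypothetical.

```python
def annual_resight_proportion(encounters):
    """Share of each year's individuals that were seen in more than one season.

    encounters: iterable of (year, season_id, individual_id) triples, where
    season_id labels one breeding or feeding season (assumed layout).
    Returns {year: proportion between 0 and 1}.
    """
    seasons_by_individual = {}
    individuals_by_year = {}
    for year, season, individual in encounters:
        seasons_by_individual.setdefault(individual, set()).add(season)
        individuals_by_year.setdefault(year, set()).add(individual)

    return {
        year: sum(len(seasons_by_individual[i]) > 1 for i in individuals)
        / len(individuals)
        for year, individuals in sorted(individuals_by_year.items())
    }
```

Averaging the yearly values over the study period would then give a single summary figure comparable in form to the annual average reported above.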
Automated image recognition with manual review of each proposed match detected approximately 2,300 duplicate IDs (false negatives) within the 39 collaborator catalogs: these represent cases where the same whale was given multiple IDs within one catalog due to an undetected match (8% of total individuals). The range of false negatives across collaborator catalogs of greater than 100 individuals was 0.1–11%. In the SPLASH dataset of 7971 total individuals, 331 (4%) previously undetected false negatives were found. False positive errors, where two or more whales were confused as one individual, were far less likely than false negatives because every proposed match was manually reviewed. False positive error rates were estimated to be below 0.1%. Over 5700 encounter comments were received through Happywhale's online comment fields from researchers and community scientists, in many cases alerting data managers to potential errors in date, location and/or whale identities.
Discussion
The NPPID collaboration established a comprehensive, broad-scale, and rich dataset made possible by a rapid and rewarding feedback process connecting collaborator and community science data around the North Pacific Ocean basin. The NPPID collaboration is the first of its kind to develop a long-term individual ID database on this scale. This effort established a unique dataset foundation well-suited for humpback whale population modeling, as well as for any research benefitting from individual identification, such as longitudinal studies of individual health.
This study began during the development of fast and accurate automated image recognition for humpback whale flukes and demonstrated the scalability of the algorithm used. We could not initially predict how comprehensively we might document the populations of humpback whales across the NPPID study area. However, in a relatively short period the results exceeded expectations. As of August 2022, 56 months after the creation of this study, 30,100 individual North Pacific humpback whales had been documented. Some regions are now extremely well sampled. For example, in Southeast Alaska and northern British Columbia for 2011–2019, fewer than 6% of individuals encountered each year were unique (encountered in only one season) (Table 3, Fig. 5). The annual set of newly documented individuals includes recruitment of calves and juveniles, and a progressively smaller proportion of previously undocumented adults.
Data gaps exist, particularly in the western North Pacific, in remote feeding areas such as the Aleutian Islands, and in the Mexican offshore breeding area of the Revillagigedo Islands, where effort was far less than in most breeding, feeding and migratory corridor areas of the central and eastern Pacific. In the Northwestern Hawaiian Islands archipelago, recent acoustic-based surveys including those using wave-glider technology have revealed substantial singing and thus humpback whale abundance with relatively little fluke ID effort46,47,48. It remains to be determined if the majority of these whales use this region as a terminal breeding ground, or whether they mix during a breeding season with those whales in the main Hawaiian Islands. However, even in these least-sampled regions, over 50% of individuals were encountered in more than one season, in the same or in different regions. Thus, we believe that the great majority of individuals in all the North Pacific, including the less sampled regions, are documented in the NPPID dataset. By extensively resampling populations in breeding grounds, migratory corridors, and feeding areas, the impact of effort bias on population models can be reduced21. We believe this applies to the NPPID dataset.
Accessibility and user agreements
Data collection should not be an end unto itself, and sharing is a core tenet of good data management49. The Happywhale web platform was developed to make data accessible by design, aiming for a user experience that is both easy and rewarding. Users were motivated to contribute more and higher-quality data by a simple user interface to upload images, which then rewarded them with rapid results of information about “their” individual whales. Accessibility creates a public good as a resource for research, education, resource management, and science communication. In the existing NPPID dataset, 66% of all North Pacific humpback whale encounter data are publicly visible. Researchers and community scientists can explore migratory connections across the North Pacific via the web platform (Fig. 2). For research collaborators, this has inspired studies that would not have been possible without the large collective investment in building a platform and populating it with a comprehensive and contemporary dataset50,51. As of December 2022, the NPPID had contributed data to seven other collaborative peer-reviewed publications13,37,38,52,53,54,55. Accessible information about North Pacific humpback whale individuals has also proven very useful for resource managers, for example in tracking fishing gear entanglement cases, and individual identification and past sighting histories of dead or stranded whales56.
We recognize that including many actors and an open-science stance can add complexity to a collaboration57 with concerns for misuse of shared or public data58. Successful aspects of this collaboration bring opportunities but also pose two challenges that the collaboration must address: (1) How do we encourage contributing researchers to allow public visibility of data to allow the widest possible benefit, while ensuring data are used correctly in context, with proper credit preserved? (2) How do we simplify and clarify co-authorship policies to be effective, meaningful, and not so complex as to hinder publication?
An ideal collaboration builds datasets that directly answer present biological and management questions, and simultaneously creates data-sharing readiness. Data readiness for study of ecological change depends on both standardized repositories and aligned research interests13,59,60. The NPPID dataset has been successfully applied in this context, contributing to challenging management issues such as the US West Coast Dungeness crab fishery. Here, researchers can readily determine the proportion of whales in the Endangered Central American DPS51,61,62,63. The NPPID collaboration began with a MOA, offering co-authorship to contributors in a series of publications investigating humpback whale migratory patterns and population status in the North Pacific. Collaborators wishing to address additional research questions must seek permission from all relevant data contributors. While the communication required is a cost imposed on prospective studies, community is built around mutually beneficial collaboration. The MOA created an effective working group and context for this study through the completion of the specified series of publications. Future success will require clear use, sharing, and management policy, with oversight and funding maintained into the future.
Data quality improved by accessibility
Accessibility adds value as part of the FAIR Principles for scientific data45 that guided this study design. Accessibility also serves the immediate practical value of improving data quality, consistency, and repeatability. Active collaboration and public access to data make knowledge gaps more visible and encourage effort to fill them64. With many eyes reviewing the dataset, from curious public enthusiasts exploring encounters of “their” whales or an area of their personal interest to research collaborators pursuing diverse lines of inquiry, an ongoing collaborative quality control process frequently detects data discrepancies. Happywhale user comments (over 5,700 as of August 2022) alerted NPPID data managers to enough errors that public accessibility to review might be considered as a systematic method of quality control, worthy of attention for its own value and efficiency.
All datasets will contain errors; more accurate image recognition, repeatedly applied, and review of data by diverse users will continually detect some, but not all, errors. The SPLASH study estimated a 9–10% rate of missed matches using trained human matchers, the largest model error correction factor in the associated mark-recapture population estimate21. This kind of accuracy assessment rarely appears in photo-ID based mark-recapture studies, yet missed matches were detected in every dataset larger than 100 individuals involved in this study. Our finding of 331 false negatives among 7,971 total individuals (4%) in the SPLASH study, when added to algorithm error rates for good-to-high quality images of 1–3%42, suggests the 9–10% error estimate was high by 3–4 percentage points. In our most accurately matched large dataset, the 2004–2020 whales of Glacier Bay National Park and Preserve, Alaska, missed matches accounted for only 0.15% (1 of 633 individuals, a first-summer calf to adult match with substantial fluke pigment change). All other datasets of more than 100 individuals showed detectable false negative (missed match) rates of 2 to 11%. Considering this range and other sources of error and bias, it is important to understand and account for limitations in any dataset, including ours.
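The arithmetic behind the SPLASH comparison can be checked directly. The following is a minimal sketch in Python that uses only the counts and percentage ranges quoted above; the function name `rate_percent` and the variable names are illustrative assumptions and are not part of any NPPID or Happywhale tooling. Combining the two error sources by simple addition mirrors the wording of the text and is an assumption, not a formal error model.

```python
def rate_percent(count: int, total: int) -> float:
    """Return count/total expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


# Counts quoted in the text: false negatives found among SPLASH individuals.
splash_missed = rate_percent(331, 7971)

# Algorithm error range for good-to-high quality images (percent), from the text.
algo_low, algo_high = 1.0, 3.0

# Simple additive combination of the two error sources (an assumption).
combined_low = splash_missed + algo_low
combined_high = splash_missed + algo_high

# Human-matcher missed-match estimate reported by the SPLASH study (percent).
splash_estimate_low, splash_estimate_high = 9.0, 10.0

print(f"SPLASH missed-match rate found here: {splash_missed:.2f}%")
print(f"Combined with algorithm error: {combined_low:.2f}% to {combined_high:.2f}%")
print(f"Original SPLASH estimate: {splash_estimate_low:.0f}% to {splash_estimate_high:.0f}%")
```

Under these assumptions the combined figure falls at roughly 5–7%, a few percentage points below the 9–10% estimate, which is the gap the text describes.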
Effort bias and appropriate use
Ideally, a dataset should be created with its specific use in mind a priori, following a good data management plan49 with an optimized data workflow65. However, because we built a dataset gathered from post-SPLASH photo-ID archives and opportunistic efforts, standardization had to stand in for a priori data management plans. The effort was geographically and temporally heterogeneous, and any study design or interpretation of data must account for this to ensure appropriate use. It would be easy, for example, to falsely interpret the lower effort in the western North Pacific as evidence of smaller whale populations. Datasets cannot be assumed to provide an error-free documentation of humpback whale presence in the study area (i.e., devoid of effort bias); no clear rule can be set a priori to identify the appropriate application of an evolving dataset of this nature. It is therefore imperative that any potential data user actively engage directly with collaborating researchers to understand data limitations and potential. Data contributors can also be the primary data users, a group that will benefit from increased knowledge of and aptitude with the data management system built through Happywhale.
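To make the effort-bias caution concrete, the sketch below contrasts raw encounter counts with encounters per unit of effort for two regions. Every region name and number in it is hypothetical and chosen only for illustration; none are NPPID data. The `RegionSample` class, the `encounters_per_effort` helper, and the use of vessel-days as an effort unit are likewise assumptions, since the dataset described here does not come with a single standardized effort measure.

```python
from dataclasses import dataclass


@dataclass
class RegionSample:
    name: str
    encounters: int     # photo-ID encounters recorded in the region
    effort_days: float  # survey effort in vessel-days (hypothetical unit)


def encounters_per_effort(sample: RegionSample) -> float:
    """Encounters recorded per unit of survey effort."""
    if sample.effort_days <= 0:
        raise ValueError("effort must be positive")
    return sample.encounters / sample.effort_days


# Hypothetical values for illustration only, NOT taken from the dataset.
well_sampled = RegionSample("Region A", encounters=5000, effort_days=1000)
poorly_sampled = RegionSample("Region B", encounters=300, effort_days=50)

for sample in (well_sampled, poorly_sampled):
    rate = encounters_per_effort(sample)
    print(f"{sample.name}: {sample.encounters} encounters, {rate:.1f} per vessel-day")
```

In this invented case the region with far fewer encounters has the higher encounter rate once effort is considered, so the raw counts alone would point the wrong way. An encounter rate is still not an abundance estimate; the point is only that counts cannot be compared across regions without some account of effort.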
Because there could not be a comprehensive data collection plan across this large scale of a study area and time period, the full dataset might be considered opportunistic, a sum of collected efforts of dedicated research, research from platforms of opportunity, and community science contributions. Figure 4 demonstrates a large increase in data collection over time, elevated during the 2004–2006 SPLASH study, then building to similar levels from 2017 forward. Data collection rates have benefitted from many factors. These include: improvement in digital cameras, the growing popularity of whale watching, the 2015 establishment of the Happywhale platform, increased effort by many NPPID collaborators to capture fluke photos within existing field efforts, and the 2020 establishment and NOAA Fisheries funding of the SPLASH-2 program. The latter helped fund data collection efforts in poorly sampled areas, and infrastructure to support submissions to Happywhale. Our peak sample year was 2019, with 6,384 (21%) of 30,100 known North Pacific humpback whales documented. The COVID-19 pandemic interrupted both field research efforts and tourism in 2020 and 2021 (Fig. 4), though we believe sampling will recover and continue to increase.
Building a successful collaboration
The NPPID study benefitted from the largely successful precedent of the SPLASH study both in providing a foundation of data (Fig. 4) and as a collaborative framework. The current study began at a time when new methods were needed to efficiently manage large volumes of post-SPLASH data, where separate research efforts were constrained by time-intensive visual matching of photo-ID datasets. Although the SPLASH study produced notable insight and remains frequently cited, and the catalog was made available online, the study was not intended to continue beyond 2006, and the online dataset was not built to facilitate photo-ID matching. The role of the NPPID collaboration agreement was to establish clear expectations and create an environment of openness, trust, transparency, and consistency. This context was necessary for research collaborators to feel comfortable sharing images and data that were products of many thousands of person-hours and costs in the field. Positive and useful feedback delivered by rapid results from image recognition efforts was also necessary. Researchers were enticed to join the collection in part by the instant gratification when most of their flukes immediately matched to known individuals; this was a welcome change from years of toil over visually matching isolated photo-ID datasets. Success was crafted by a combination of a high-quality product supported by solid guiding principles of Transparency, Responsibility, User focus, Sustainability and Technology (TRUST), to promote digital repository trustworthiness66. The idea behind these principles is that as a data repository, we must earn the trust of the community we serve and demonstrate that we are reliable and capable of appropriately managing the data we curate. Empowerment comes through this intentional framework, with a feeling of collective ownership rather than isolated possession. This then supports sustainable collaboration by creating active participation of research users.
As an ongoing, living dataset, the NPPID developed active, increasingly decentralized participation in ongoing data management with an intent to serve diverse needs in the research community. System development remains ongoing, with a focus on providing research collaborators with tools to become more directly involved with data management. This development reduces centralized data management costs, serves the real-time needs of collaborators, and benefits the dataset with local expertise, potentially detecting data issues that would not be recognized by remote data managers.
Conclusion: sustainability and maximizing future value
The NPPID effort has established a single unified repository. This has been accomplished by reconciling all available research catalogs and ID nomenclature, and by aggregating all individual identities and encounter data into a state of data readiness unprecedented at a long-term, ocean-basin scale. The first benefits are cost savings and organizational effectiveness. Particularly in well-sampled areas, data processing is revolutionized by immediate access to a fast and reliable photo-ID system. Collaborators reported that this “saves countless hours of manual visual matching, allowing us to get our data out in products, papers, and outreach more quickly” (JN) and “reduces lab time by 90%” (AS). However, collaborators face the challenge of how to maximize the present and future value of the NPPID dataset. A primary outstanding need is to clarify how researchers can efficiently access data, establish permission, and create sub-collaborations to develop further studies beyond the term of the NPPID collaboration.
System functionality was developed in a constant feedback cycle to accommodate progressively larger datasets through the study. This dataset appears to document most living humpback whales across the North Pacific Ocean basin, creating an abundance of data and inspiring an ambition to monitor populations in near-real time. With heterogeneous sampling effort over the study area, critical data gaps can be identified for understanding abundance and population structure. In addition, minimum sample sizes for reliable, robust population models can be established. Given the low cost of data storage, and if the incremental cost of each additional data point is driven to near zero, there is very little cost to overshooting a threshold of “enough” data.
Having now acquired sufficient baseline data for North Pacific populations in the face of a changing ocean, we aim for data readiness to understand the implications of ecosystem events on a timescale that benefits resource management. This study concentrates on humpback whales of the North Pacific, but the concept and methods can be extended to many species. Baleen whales are recognized to influence marine ecosystems on a massive scale67. In recognition of the concept of essential biological variables68,69, there is a need for marine observation and data at an ocean-basin-wide scale70,71,72. This dataset, the collaboration agreement, and the system established to create and maintain it can contribute to our understanding of essential ocean variables.
This study established an extremely cost-effective and utilitarian information architecture, delivering an essential service for ongoing studies. If investment in collaborator engagement, upkeep, development, and data management continues, the future of this collaborative system promises great contributions to the understanding of North Pacific humpback whale populations. Sustainability will require a transition from the centralized efforts of a multi-year study to an established project at a stable institution with community ownership, oversight, and funding. We see this effort not as collecting and possessing a dataset, but as curating a public good for the betterment of science, education, and marine conservation. The FAIR and TRUST principles are central to guiding development, recognizing that accessibility requires more than just a data search feature via a web browser. Achieving full potential will require decentralizing data management to research collaborators, a shift that requires further system development, funding, user training, and commitment. Involving scientists in data management has evolved through time from a widespread disconnect73 to a current trend of ecological “big data” where data management is a necessary skill for ecologists, as has already happened with statistics and GIS74. We believe that establishing this scale-shifting dataset, given continued investment, will continue to improve understanding, awareness, stewardship, and respect for the North Pacific marine ecosystem.
Data availability
The publicly viewable 66% of the full dataset used in this study, with ongoing additions and updates, is available for exploration at www.Happywhale.com. All data are available with collaborator agreement to explore at Happywhale and in spreadsheet format. Please contact the corresponding author for discussion and permission. Approximately one-third of the dataset is public domain, but the collaborators believe that providing this partial dataset for open access download would be a disservice to the integrity of the full dataset.
References
Karczmarski, L., Chan, S. C. Y., Rubenstein, D. I., Chui, S. Y. S. & Cameron, E. Z. Individual identification and photographic techniques in mammalian ecological and behavioural research—Part 1: Methods and concepts. Mamm. Biol. 102, 545–549 (2022).
Clapham, P. J. & Mead, J. G. Megaptera novaeangliae. Mamm. Species 3, 1–9 (1999).
Mackintosh, N. A. The natural history of whalebone whales. Biol. Rev. 21, 60–74 (1946).
Calambokidis, J. et al. Movements and population structure of humpback whales in the North Pacific. Mar. Mammal Sci. 17, 769–794 (2001).
Chittleborough, R. G. Dynamics of two populations of the humpback whale, Megaptera novaeangliae (Borowski). Mar. Freshw. Res. 16, 33–128 (1965).
Dawbin, W. H. The Seasonal Migratory Cycle of Humpback Whales. in Whales, Dolphins, and Porpoises 145–170 https://doi.org/10.1525/9780520321373-011/HTML (2021).
Baker, C. et al. Migratory movement and population structure of humpback whales (Megaptera novaeangliae) in the central and eastern North Pacific. Mar. Ecol. Prog. Ser. 31, 105–119 (1986).
Calambokidis, J. et al. in SPLASH: Structure of Populations, Levels of Abundance and Status of Humpback Whales in the North Pacific. Final Report Contract AB133F-03-RP-00078. (United States Department of Commerce West. Adm. Cent., 2008).
Johnson, C. M. et al. Protecting Blue Corridors - Challenges and solutions for migratory whales navigating national and international seas. https://doi.org/10.5281/ZENODO.6196131 (2022).
Di Lorenzo, E. & Mantua, N. Multi-year persistence of the 2014/15 North Pacific marine heatwave. Nat. Clim. Chang. 6, 1042–1047 (2016).
Hu, Z. Z., Kumar, A., Jha, B., Zhu, J. & Huang, B. Persistence and predictions of the remarkable warm anomaly in the Northeastern Pacific ocean during 2014–16. J. Clim. 30, 689–702 (2017).
Arimitsu, M. L. et al. Heatwave-induced synchrony within forage fish portfolio disrupts energy flow to top pelagic predators. Glob. Chang. Biol. 27, 1859–1878 (2021).
Gabriele, C. M. et al. Sharp decline in humpback whale (Megaptera novaeangliae) survival and reproductive success in southeastern Alaska during and after the 2014–2016 Northeast Pacific marine heatwave. Mamm. Biol. https://doi.org/10.1007/s42991-021-00187-2 (2022).
Cartwright, R. et al. Fluctuating reproductive rates in Hawaii’s humpback whales, Megaptera novaeangliae, reflect recent climate anomalies in the North Pacific. R. Soc. Open Sci. 6, 181463 (2019).
Frankel, A. S., Gabriele, C. M., Yin, S. & Rickards, S. H. Humpback whale abundance in Hawai‘i: Temporal trends and response to climatic drivers. Mar. Mammal Sci. 38(1), 118–138. https://doi.org/10.1111/mms.12856 (2021).
Kügler, A., Lammers, M., Zang, E., Kaplan, M. & Mooney, T. Fluctuations in Hawaii’s humpback whale Megaptera novaeangliae population inferred from male song chorusing off Maui. Endanger. Species Res. 43, 421–434 (2020).
von Biela, V. R. et al. Premature mortality observations among Alaska’s pacific salmon during record heat and drought in 2019. Fisheries 47, 157–168 (2022).
Rocha, R. C., Clapham, P. J. & Ivashchenko, Y. V. Emptying the oceans: A summary of industrial whaling catches in the twentieth century. Mar. Fish. Rev. 76, 37–48 (2014).
Ivashchenko, Y. V. & Clapham, P. J. Too much is never enough: The cautionary tale of soviet illegal whaling. Mar. Fish. Rev. 76, 1–21 (2014).
Johnson, J. H. & Wolman, A. A. The Humpback Whale, Megaptera novaeangliae. Mar. Fish. Rev. 46, 30–37 (1984).
Barlow, J. et al. Humpback whale abundance in the North Pacific estimated by photographic capture-recapture with bias correction from simulation studies. Mar. Mammal Sci. 27, 793–818 (2011).
Katona, S. K. & Whitehead, H. P. Identifying Humpback Whales using their natural markings. Polar Rec. (Gr. Brit.) 20, 439–444 (1981).
Jurasz, C. & Jurasz, V. in Results of 1977 Studies on Humpback Whales in Glacier Bay National Monument. Final Rept. (1978).
Bettridge, S. et al. in Status Review of the Humpback Whale (Megaptera novaeangliae) Under the Endangered Species Act. (NOAA-TM-NMFS-SWFSC-540, 2015).
Herman, L. M. et al. Resightings of humpback whales in Hawaiian waters over spans of 10–32 years: Site fidelity, sex ratios, calving rates, female demographics, and the dynamics of social and behavioral roles of individuals. Mar. Mammal Sci. 27, 736–768 (2011).
Smith, T. D. et al. An ocean-basin-wide mark-recapture study of the north Atlantic humpback whale (Megaptera novaeangliae). Mar. Mammal Sci. 15, 1–32 (1999).
Federal Register. Endangered and threatened species. in Identification of 14 Distinct Population Segments of the Humpback Whale (Megaptera novaeangliae) and Revision of Species-Wide Listing 62260–62319. (2016). Available at: https://www.federalregister.gov/documents/2016/09/08/2016-21276/endangered-and-threatened-species-identification-of-14-distinct-population-segments-of-the-humpback. (Accessed: 21st Mar 2023)
Mobley, J. R. Jr., Deakos, M. H., Pack, A. A. & Bortolotto, G. A. Aerial survey perspectives on humpback whale resiliency in Maui Nui, Hawaiʻi, in the face of an unprecedented North Pacific marine warming event. Mar. Mammal Sci. https://doi.org/10.1111/MMS.13018 (2023).
Cates, K. A. et al. Corticosterone in central North Pacific male humpback whales (Megaptera novaeangliae): Pairing sighting histories with endocrine markers to assess stress. Gen. Comp. Endocrinol. 296, 113540 (2020).
Cates, K. A. et al. Testosterone trends within and across seasons in male humpback whales (Megaptera novaeangliae) from Hawaii and Alaska. Gen. Comp. Endocrinol. 279, 164–173 (2019).
Pack, A. A. et al. Comparing depth and seabed terrain preferences of individually identified female humpback whales (Megaptera novaeangliae), with and without calf, off Maui, Hawaii. Mar. Mammal Sci. 34, 1097–1110 (2018).
Pack, A. A. et al. Habitat preferences by individual humpback whale mothers in the Hawaiian breeding grounds vary with the age and size of their calves. Anim. Behav. 133, 131–144 (2017).
Pack, A. A. et al. Size-assortative pairing and discrimination of potential mates by humpback whales in the Hawaiian breeding grounds. Anim. Behav. 84, 983–993 (2012).
Herman, L. M. et al. Humpback whale song: Who sings?. Behav. Ecol. Sociobiol. 67, 1653–1663 (2013).
Hill, M. et al. Found: a missing breeding ground for endangered western North Pacific humpback whales in the Mariana Archipelago. Endanger. Species Res. 41, 91–103 (2020).
Espinoza Rodríguez, I. J., Frisch Jordán, A. & Noriega Betancourt, F. Humpback whales in Banderas Bay, Mexico: Relative abundance and temporal patterns between 2004 and 2017. Lat. Am. J. Aquat. Mamm. 16, 33–39 (2021).
Martien, K. K. et al. Evaluation of Mexico distinct population segment of humpback whales as units under the Marine Mammal Protection Act. NOAA Technical Memorandum NMFS-SWFSC-658. https://doi.org/10.25923/nvw1-mz45 (2021).
Taylor, B. L. et al. Evaluation of humpback whales wintering in Central America and southern Mexico as a demographically independent population. US Dep. Commer. Natl. Ocean. Atmos. Adm. Natl. Mar. Fish. Serv. Southwest Fish. Sci. Cent. 3, 103–111 (2021).
Wade, P. R. et al. Estimates of abundance and migratory destination for North Pacific humpback whales in both summer feeding areas and winter mating and calving areas. in Paper SC/66b/IA21 submitted to the Scientific Committee of the International Whaling Commission, June 2016, Bled, Slovenia. Available at https://archive.iwc.int/. (2016).
Hendrix, A. N., Straley, J., Gabriele, C. M. & Gende, S. M. Bayesian estimation of humpback whale (Megaptera novaeangliae) population abundance and movement patterns in southeastern Alaska. Can. J. Fish. Aquat. Sci. 69, 1783–1797 (2012).
Gabriele, C. M. et al. Natural history, population dynamics, and habitat use of humpback whales over 30 years on an Alaska feeding ground. Ecosphere 8, e01641 (2017).
Cheeseman, T. et al. Advanced image recognition: A fully automated, high-accuracy photo-identification matching system for humpback whales. Mamm. Biol. 2021, 1–15. https://doi.org/10.1007/S42991-021-00180-9 (2021).
Stevick, P. et al. North Atlantic humpback whale abundance and rate of increase four decades after protection from whaling. Mar. Ecol. Prog. Ser. 258, 263–273 (2003).
Wieczorek, J. et al. Darwin core: An evolving community-developed biodiversity data standard. PLoS One 7, e29715 (2012).
Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3, 1–9 (2016).
Lammers, M. O. et al. The occurrence of humpback whales across the Hawaiian archipelago revealed by fixed and mobile acoustic monitoring. Front. Mar. Sci. 10, 49 (2023).
Lammers, M. O. et al. Humpback whale Megaptera novaeangliae song reveals wintering activity in the Northwestern Hawaiian Islands. Mar. Ecol. Prog. Ser. 423, 261–268 (2011).
Johnston, D. W., Chapla, M. E., Williams, L. E. & Mattila, D. K. Identification of humpback whale Megaptera novaeangliae wintering habitat in the Northwestern Hawaiian Islands using spatial habitat modeling. Endanger. Species Res. 3, 249–257 (2007).
Michener, W. K. Ten simple rules for creating a good data management plan. PLoS Comput. Biol. 11, e1004525 (2015).
Darling, J. D. et al. Humpback whales (Megaptera novaeangliae) attend both Mexico and Hawaii breeding grounds in the same winter: Mixing in the northeast Pacific. Biol. Lett. https://doi.org/10.1098/rsbl.2021.0547 (2022).
Tackaberry, J., Dobson, E., Flynn, K., Cheeseman, T. & Calambokidis, J. Low resighting rate of entangled humpback whales within the California, Oregon, and Washington region based on photo-identification and long-term life history data. Front. Mar. Sci. 8, 2092 (2022).
Curtis, K. A. et al. Abundance of humpback whales (Megaptera novaeangliae) wintering in Central America and Southern Mexico from a one-dimensional spatial capture-recapture model. NOAA Technical Memorandum NMFS. https://doi.org/10.25923/9cq1-rx80 (2022).
Patton, D. & Lawless, S. Surface and underwater observation of a humpback whale (Megaptera novaeangliae) birth in progress off Lahaina, Maui, and subsequent encounter of the female with a healthy calf. Aquat. Mammals https://doi.org/10.1578/AM.47.6.2021.550 (2021).
Henderson, E. E., Deakos, M. & Engelhaupt, D. Dive and movement behavior of a humpback whale competitive group and a multiday association between a primary escort and female in Hawaiʻi. Mar. Mammal Sci. https://doi.org/10.1111/MMS.12891 (2021).
Lowe, C. L. et al. Patterns of cortisol and corticosterone concentrations in humpback whale (Megaptera novaeangliae) baleen are associated with different causes of death. Conserv. Physiol. https://doi.org/10.1093/conphys/coab096 (2021).
Lowe, C. L. et al. Case studies on longitudinal mercury content in humpback whale (Megaptera novaeangliae) baleen. Heliyon 8, e08681 (2022).
Gewin, V. Data sharing: An open mind on open data. Nature 529, 117–119 (2016).
Mills, J. A. et al. Archiving primary data: Solutions for long-term studies. Trends Ecol. Evol. 30, 581–589 (2015).
Urbano, F., Cagnacci, F. & Euromammals Collaborative Initiative. Data management and sharing for collaborative science: Lessons learnt from the Euromammals initiative. Front. Ecol. Evol. 9, 727023 (2021).
Suryan, R. M. et al. Ecosystem response persists after a prolonged marine heatwave. Sci. Rep. 11, 1–17 (2021).
Samhouri, J. F. et al. Defining ecosystem thresholds for human activities and environmental pressures in the California current. Ecosphere 8, e01860 (2017).
Samhouri, J. F. et al. Marine heatwave challenges solutions to human–wildlife conflict. Proc. R. Soc. B 288, 1964 (2021).
Santora, J. A. et al. Habitat compression and ecosystem shifts as potential links between marine heatwave and record whale entanglements. Nat. Commun. 11, 1–12 (2020).
Costello, M. J., Horton, T. & Kroh, A. Sustainable biodiversity databasing: International, collaborative, dynamic, centralised. Trends Ecol. Evol. 33, 803–805 (2018).
Hackett, R. A. et al. A data management workflow of biodiversity data from the field to data users. Appl. Plant Sci. 7, e11310 (2019).
Lin, D. et al. The TRUST Principles for digital repositories. Sci. Data 7, 1–5 (2020).
Savoca, M. S. et al. Baleen whale prey consumption based on high-resolution foraging measurements. Nature 599, 85–90 (2021).
Jetz, W. et al. Essential biodiversity variables for mapping and monitoring species populations. Nat. Ecol. Evol. 3, 539–551 (2019).
Kissling, W. D. et al. Building essential biodiversity variables (EBVs) of species distribution and abundance at a global scale. Biol. Rev. 93, 600–625 (2018).
Bax, N. J. et al. A response to scientific and societal needs for marine biological observations. Front. Mar. Sci. 6, 395 (2019).
Muller-Karger, F. E. et al. Advancing marine biological observations and data requirements of the complementary Essential Ocean Variables (EOVs) and Essential Biodiversity Variables (EBVs) frameworks. Front. Mar. Sci. 5, 211 (2018).
Miloslavich, P. et al. Essential ocean variables for global sustained observations of biodiversity and ecosystem changes. Glob. Chang. Biol. 24, 2416–2433 (2018).
Lynch, C. How do your data grow?. Nature 455, 28–29 (2008).
Urbano, F. et al. Wildlife tracking data management: a new vision. Philos. Trans. R. Soc. B Biol. Sci. 365, 2177–2185 (2010).
Titova, O. V. et al. Photo-identification matches of humpback whales (Megaptera novaeangliae) from feeding areas in Russian Far East seas and breeding grounds in the North Pacific. Mar. Mammal Sci. 34, 100–112 (2018).
Calambokidis, J. & Barlow, J. Abundance of blue and humpback whales in the Eastern North Pacific estimated by capture-recapture and line-transect methods. Mar. Mammal Sci. 20, 63–85 (2004).
Calambokidis, J. et al. Interchange and isolation of humpback whales off California and other North Pacific feeding grounds. Mar. Mammal Sci. 12, 215–226 (1996).
Calambokidis, J. & Barlow, J. Trends in the abundance of humpback whales in the North Pacific Ocean from 1980 to 2006. in IWC Report SC/A17/NP/10 for the Workshop on the Comprehensive Assessment of North Pacific Humpback Whales. 18–21 April 2017. 16 (2017).
Calambokidis, J. et al. Migratory destinations of humpback whales that feed off California, Oregon and Washington. Mar. Ecol. Prog. Ser. 192, 295–304 (2000).
Palacios, D. M. et al. Humpback Whale Tagging in Support of Marine Mammal Monitoring Across Multiple Navy Training Areas in the Pacific Ocean: Final Report for the Pacific Northwest Feeding Area in Summer/Fall 2019, Including Historical Data from Previous Tagging Efforts off the US West Coast. Prepared for Commander, U.S. Pacific Fleet. Submitted to Naval Facilities Engineering Command Southwest, under Cooperative Ecosystem Studies Unit, Department of the Navy Cooperative Agreement No. N62473-19-2-0002. Oregon State University, Newport, Oregon. pp. 153 (2020).
Burdin, A. M., Titova, O. & Hoyt, E. in Humpback Whales of Russian Far East Seas. Photo-ID Catalog 2004–2014. (Publisher: Russian Geographical Society, 2014). Available at: https://www.researchgate.net/publication/272493253_Humpback_Whales_of_Russian_Far_East_Seas_Photo-ID_Catalog_2004-2014. (Accessed: 22nd Aug 2022)
Acknowledgements
The authors thank María González for creating manuscript figures, and Kaitlin Palmer for early review. The Happywhale dataset would not exist without the dedication, support, and trust of a very large community; please see Supplementary Information II for full acknowledgements.
Author information
Contributions
T.C., J.C., P.C. and K.S. conceived of the study; T.C., J.C., K.A., J.E.M., J.S. and S.T. secured funding; T.C. and K.S. developed the Happywhale system and web platform; T.C., J.C., K.A., J.S., S.T., P.C., J.M.A., L.B., C.B., A.L.B., J.K.B., R.C., J.J.C., A.J.G.C., J.C., J.D.W., N.D., T.D.V., K.D., O.F., R.F., K.F., J.F., A.F.J., C.G., B.G., C.H., J.H., M.C.H., J.J., M.J., N.K., E.L., M.M., E.M., P.M.L., C.M., C.M.M., J.R.M., J.N., H.R.N., H.N., H.O., M.O., A.P., D.P., H.P., E.Q.R., R.F.R.B., N.R., F.S., T.S., S.S., I.S., A.S., O.T., J.U., M.V.A., O.V.Z., B.W., J.W., K.M.Y. and D.Z. contributed data; T.C., J.C., N.D., K.F., P.M.L., E.J.L., H.N., M.O., R.F.R.B., D.Z. and A.M. managed data; T.C., J.J.C., K.F. and D.P. crafted the manuscript including table and figure development; T.C., K.A., J.B., A.B., P.C., J.J.C., J.D.W., T.D.V., K.D., K.F., J.F., C.G., C.H., M.C., E.L., C.M.M., J.E.M., J.R.M., J.N., A.P., D.P., H.P., H.R.N., E.Q.R., N.R., J.S., O.T. and M.V.A. edited the manuscript; and all authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Cheeseman, T., Southerland, K., Acebes, J.M. et al. A collaborative and near-comprehensive North Pacific humpback whale photo-ID dataset. Sci Rep 13, 10237 (2023). https://doi.org/10.1038/s41598-023-36928-1
DOI: https://doi.org/10.1038/s41598-023-36928-1
- Springer Nature Limited