Augmenting citizen science with computer vision for fish monitoring

Mar 26, 2026 | AI

**Spring Witness: A Vital Journey Under Threat**

As winter recedes, a familiar yet increasingly fragile spectacle unfolds along Massachusetts’ coast. Each spring, river herring embark on their ancestral journey, a vital migration from the ocean’s embrace to the freshwater havens of rivers and streams where they will spawn.

However, this timeless ritual is now shadowed by a stark reality. Over recent decades, river herring populations have experienced precipitous declines. To understand the extent of this crisis, a dedicated network of observers, relying heavily on visual counts and the invaluable efforts of citizen scientists, meticulously tracks these migrations across the region. Their watchful eyes are crucial in documenting the dwindling numbers of these iconic fish as they navigate their challenging path.

As the annual herring run kicks off this month, scientists and resource managers are preparing to tackle the vital task of accurately counting and estimating the migrating fish population. This crucial undertaking is paramount for informing conservation strategies and effectively managing fisheries.

A collaborative team from the Woodwell Climate Research Center, MIT Sea Grant, MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), MIT Lincoln Laboratory, and Intuit has explored a new approach to environmental monitoring that pairs underwater video with computer vision to supplement existing citizen science programs. The team, which includes Zhongqi Chen and Linda Deegan (Woodwell Climate Research Center), Robert Vincent and Kevin Bennett (MIT Sea Grant), Sara Beery and Timm Haucke (MIT CSAIL), Austin Powell (Intuit), and Lydia Zuehsow (MIT Lincoln Laboratory), published its findings this February in the journal *Remote Sensing in Ecology and Conservation*.

In the open-access paper, “From snapshots to continuous estimates: Augmenting citizen science with computer vision for fish monitoring,” the researchers describe how recent advances in computer vision and deep learning, including object detection, tracking, and species classification, offer practical ways to automate fish counting, improving both the efficiency of data collection and the quality of the resulting data.

**Traditional fish monitoring techniques struggle to keep pace with the dynamic nature of aquatic ecosystems.** Relying on human observation and limited sampling periods, these methods often miss crucial data, particularly nocturnal movements and brief migration pulses in which vast numbers of fish can pass in mere minutes. Acoustic and sonar technologies offer continuous monitoring, but they are not practical at every site. The most accessible and cost-effective approach, reviewing underwater video footage, remains a significant bottleneck because it is labor-intensive and time-consuming. To address this growing need for automation, this research introduces a scalable, budget-friendly, and efficient deep learning system for accurate, automated fish monitoring.

To automate fish counting with computer vision, the team developed an end-to-end pipeline that runs from underwater cameras deployed in the field through video labeling and model training. Video footage was collected from three Massachusetts river systems: the Coonamessett River in Falmouth, the Ipswich River in Ipswich, and the Santuit River in Mashpee.
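The detect-track-count idea behind such a pipeline can be illustrated with a minimal, dependency-free sketch. Everything here is an illustrative assumption rather than the authors’ implementation: the detections are hand-written centroids instead of detector output, the tracker is a toy greedy nearest-centroid matcher, and the counting line and distance threshold are arbitrary.

```python
import math

COUNT_LINE = 50.0   # hypothetical x-position of the counting line (pixels)
MATCH_DIST = 20.0   # max centroid distance to link a detection to a track

class Track:
    def __init__(self, tid, centroid):
        self.tid = tid
        self.centroids = [centroid]   # (x, y) history across frames

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def update_tracks(tracks, detections, next_id):
    """Greedy nearest-centroid association of detections to existing tracks."""
    unmatched = list(detections)
    for tr in tracks:
        if not unmatched:
            break
        best = min(unmatched, key=lambda d: dist(d, tr.centroids[-1]))
        if dist(best, tr.centroids[-1]) <= MATCH_DIST:
            tr.centroids.append(best)
            unmatched.remove(best)
    for d in unmatched:               # leftover detections start new tracks
        tracks.append(Track(next_id, d))
        next_id += 1
    return tracks, next_id

def count_crossings(tracks):
    """Count completed tracks that crossed the line, by direction."""
    up = down = 0
    for tr in tracks:
        x0, x1 = tr.centroids[0][0], tr.centroids[-1][0]
        if x0 < COUNT_LINE <= x1:
            up += 1
        elif x1 < COUNT_LINE <= x0:
            down += 1
    return up, down

# Example: one fish moving upstream (left to right) and one downstream,
# in separate vertical lanes so the toy matcher cannot confuse them.
frames = [
    [(10, 10), (90, 100)],
    [(25, 10), (75, 100)],
    [(40, 10), (60, 100)],
    [(55, 10), (45, 100)],
    [(70, 10), (30, 100)],
]
tracks, next_id = [], 0
for dets in frames:
    tracks, next_id = update_tracks(tracks, dets, next_id)
print(count_crossings(tracks))  # (1, 1)
```

A production system would replace the hand-written centroids with detector output and use a more robust tracker, but the accounting (tracks crossing a reference line, tallied by direction) is the same.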

To prepare the training dataset, the researchers selected video clips with variations in lighting, water clarity, fish species and density, time of day, and season, so that the computer vision model would work reliably across diverse real-world conditions. Using an open-source web platform, they manually labeled the videos frame by frame, drawing bounding boxes to track fish movement. In total, they labeled 1,435 video clips and annotated 59,850 frames.
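A frame-by-frame bounding-box annotation of the kind described above can be sketched as a simple record, plus a conversion to the normalized center format that many detection training pipelines expect. The field names, clip identifier, and class label below are hypothetical illustrations, not the team’s actual schema.

```python
from dataclasses import dataclass

@dataclass
class BoxAnnotation:
    clip_id: str      # which video clip the frame came from (hypothetical id)
    frame: int        # frame index within the clip
    x: float          # top-left corner, pixels
    y: float
    w: float          # box width, pixels
    h: float          # box height, pixels
    label: str        # e.g. "river_herring" (hypothetical class name)

def to_normalized(ann, frame_w, frame_h):
    """Convert a pixel box to normalized (center-x, center-y, width, height),
    the common layout many detection training pipelines expect."""
    return (
        (ann.x + ann.w / 2) / frame_w,
        (ann.y + ann.h / 2) / frame_h,
        ann.w / frame_w,
        ann.h / frame_h,
    )

ann = BoxAnnotation("clip_001", 12, 100, 50, 40, 20, "river_herring")
print(to_normalized(ann, 640, 480))
```

Storing boxes per frame like this is what lets the same labels serve both detection (one frame at a time) and tracking (the same fish linked across frames).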

**Innovative Technology Accurately Quantifies Fish Migrations, Revealing Key Behavioral Patterns**

In a groundbreaking study, researchers have successfully validated a sophisticated computer vision system for monitoring fish migrations. The system’s accuracy was rigorously tested against established methods, including human video analysis, in-person stream-side counts, and data from passive integrated transponder (PIT) tagging. The findings demonstrate that models trained on extensive datasets, encompassing multiple locations and years, achieved the highest precision.

This advanced technology delivers consistent, high-resolution population counts throughout the entire migration season, aligning seamlessly with traditional estimation techniques. Beyond mere enumeration, the system offers invaluable insights into fish migration behaviors, including critical timing and movement patterns. Crucially, these patterns are shown to be intricately linked to environmental conditions.

A recent application of the system to the 2024 migration on the Coonamessett River provided a detailed picture of river herring activity. The system recorded 42,510 river herring and revealed a distinct migratory rhythm: upstream movements peaked around dawn, suggesting the fish use early light to navigate, while downstream migration occurred predominantly at night. This nocturnal behavior is likely a survival strategy, with fish taking advantage of darker, quieter periods to reduce the risk of predation.
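The diel (day/night) pattern described here falls out of a simple aggregation once each counted passage carries a timestamp and a direction. The records below are made-up illustrations, not the study’s data.

```python
from collections import Counter

# Hypothetical per-passage records: (hour of day, direction). A real dataset
# would come from the tracker's line-crossing events with timestamps.
passages = [(5, "up"), (5, "up"), (6, "up"), (14, "up"),
            (21, "down"), (22, "down"), (22, "down"), (23, "down")]

def hourly_counts(passages):
    """Tally passages into per-direction, per-hour counters."""
    counts = {"up": Counter(), "down": Counter()}
    for hour, direction in passages:
        counts[direction][hour] += 1
    return counts

counts = hourly_counts(passages)
peak_up = counts["up"].most_common(1)[0][0]     # hour with most upstream passages
peak_down = counts["down"].most_common(1)[0][0]
print(peak_up, peak_down)  # 5 22
```

With real timestamps, the same tally by hour is what turns raw counts into the behavioral signal the article describes: dawn peaks upstream, nighttime peaks downstream.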

This real-world application, led by Zhongqi Chen and colleagues, is poised to expand computer vision’s role in fisheries management. The work establishes a framework and best practices for incorporating the technology into conservation initiatives and extending it to other aquatic species. As Vincent of MIT Sea Grant notes, the research builds on ongoing work in this area and promises to improve fish population assessments for fisheries managers and conservation organizations. It will also create educational opportunities for students, the public, and citizen science groups, strengthening efforts to protect the ecologically and culturally important river herring populations along our coastlines.

**Sustained traditional monitoring remains crucial for preserving the integrity of long-term fisheries data as agencies transition to automated counting systems.** Even with advanced technology, human oversight and participation will be indispensable. Computer vision and citizen science should not be viewed as replacements, but rather as powerful collaborators. Volunteers will play a vital role in maintaining the necessary camera infrastructure and contributing directly to the accuracy of automated systems, from labeling video footage to validating model outputs. This integration of human observation and AI-driven data promises a more complete and robust strategy for environmental monitoring.

This work was funded by MIT Sea Grant, with additional support provided by the Northeast Climate Adaptation Science Center, an MIT Abdul Latif Jameel Water and Food Systems seed grant, the AI and Biodiversity Change Global Center (supported by the National Science Foundation and the Natural Sciences and Engineering Research Council of Canada), and the MIT Undergraduate Research Opportunities Program.
