Pigeons (Columba livia) as Trainable Observers of Pathology and Radiology Breast Cancer Images

Richard M. Levenson, Elizabeth A. Krupinski, Victor M. Navarro, Edward A. Wasserman
PLOS ONE
Department of Pathology and Laboratory Medicine, University of California Davis Medical Center

Overall Summary

Study Background and Main Findings

This research explored the potential of using pigeons (Columba livia) as 'surrogate observers' for evaluating medical images, a novel approach motivated by the high cost and time investment required for human expert validation. The study investigated whether pigeons, through operant conditioning with food rewards, could learn to discriminate between benign and malignant examples in histopathology and radiology images. The researchers addressed four key questions: the pigeons' basic trainability, their ability to generalize beyond memorization, their performance limits on difficult tasks, and the practical utility of their skills.

The study involved a series of experiments using a custom-built operant conditioning chamber where pigeons interacted with a touchscreen. In Experiment 1, pigeons were trained to classify breast histopathology images at different magnifications. To assess generalization, they were tested on novel image sets they hadn't seen during training. Experiments 2 and 3 focused on radiology tasks: detecting microcalcifications and classifying mammographic masses, respectively. The researchers also manipulated image properties, such as color, luminance, and compression, to investigate the visual cues used by the pigeons.

The results showed that pigeons could successfully learn to classify histopathology images with high accuracy (around 85%) and, importantly, generalize this skill to new images. A novel 'flock-sourcing' method, combining judgments from multiple pigeons, achieved even higher accuracy (99%). The pigeons also performed well in detecting microcalcifications on mammograms. However, they struggled with the more complex task of classifying mammographic masses, demonstrating an inability to generalize beyond the training set. Manipulating image properties revealed that color and luminance aided performance but weren't essential, and that pigeons could adapt to compressed images with further training.
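Why pooling birds can lift accuracy from roughly 85% for individuals to 99% for the flock can be illustrated with a simple majority-vote calculation. The sketch below assumes, for illustration only, that every bird is correct with the same probability p and that birds err independently. The summary does not give the exact pooling rule or the number of birds pooled, and the function name is hypothetical.

```python
from math import comb

def majority_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent observers,
    each correct with probability p, reaches the right answer.
    n must be odd so that ties cannot occur."""
    if n % 2 == 0:
        raise ValueError("use an odd number of observers to avoid ties")
    needed = n // 2 + 1  # smallest strict majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(needed, n + 1))

# Hypothetical flock sizes with an individual accuracy of 85%.
for n in (1, 3, 5, 7):
    print(n, round(majority_accuracy(0.85, n), 3))
```

The independence assumption is optimistic: real birds probably tend to fail on the same difficult images, so actual gains from pooling would be smaller than this calculation suggests.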

The researchers concluded that pigeons can serve as a viable model for studying certain aspects of medical image perception, particularly for tasks involving visual discrimination. Their successes and failures mirrored the relative difficulty of these tasks for humans, suggesting that pigeons could be a cost-effective and controllable alternative to human observers for evaluating image quality and the impact of image processing techniques. The study also highlighted the potential of 'flock-sourcing' as a method for enhancing diagnostic accuracy. However, further research is needed to understand the specific visual features and strategies used by the pigeons.

Research Impact and Future Directions

This study demonstrates the remarkable ability of pigeons to learn complex visual discriminations in medical images, achieving high accuracy in histopathology and microcalcification detection. The research is notable not only for its novelty but also for its rigorous methodology, including controls for memorization and systematic manipulation of image parameters. The 'flock-sourcing' approach, where the judgments of multiple birds are combined, is particularly innovative and yielded impressive results, highlighting the potential of collective intelligence even in relatively simple animal models.

While the pigeons' failure to generalize on the mammographic mass classification task marks the limits of their perceptual abilities, this failure arguably supports the model's validity, because mass classification is also among the hardest of these tasks for human experts. By mirroring those challenges, the pigeons offer a useful tool for probing the perceptual demands of medical image interpretation. The findings have practical implications for image quality assessment, potentially offering a cost-effective and controllable alternative to human observers for evaluating the effects of image processing techniques and display parameters.

The study's primary limitation is that it measures discrimination performance without revealing how the pigeons achieve it. Which visual features and strategies the birds rely on remains an open question. Answering it would deepen our understanding of avian visual processing and could also inform training methods for human experts and the design of computer-aided diagnostic tools. Despite this limitation, the study's innovative approach and rigorous methodology establish a strong foundation for future research in comparative visual cognition and its application to medical imaging.

Critical Analysis and Recommendations

Comprehensive and Balanced Summary (written-content)
The abstract provides a comprehensive overview of the study's key elements, including the novel use of pigeons, their successes and failures on different tasks, and the broader implications. This clear and concise summary effectively communicates the research's significance to a broad audience.
Section: Abstract
Connect Findings to Computational Models (written-content)
The abstract could be strengthened by explicitly connecting the pigeon model to the development and validation of computational models in medical imaging. This would enhance the paper's relevance to a key area of current research.
Section: Abstract
Clear Problem Framing and Motivation (written-content)
The introduction effectively frames the problem of human expertise in medical imaging being expensive and time-consuming, motivating the need for alternative approaches. This clear problem statement immediately establishes the study's relevance.
Section: Introduction
Frame Pigeons as Benchmark for AI (written-content)
The introduction could be improved by explicitly positioning the pigeon model as a biological benchmark for AI in medical imaging. This would strengthen the paper's connection to a major contemporary challenge.
Section: Introduction
Detailed and Reproducible Training Regimen (written-content)
The detailed description of the operant conditioning protocol, including trial structure and reinforcement schedules, ensures transparency and reproducibility, which are essential for scientific rigor.
Section: Materials and Methods
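The basic logic of a rewarded two-choice trial can be sketched as a small simulation. This is a simplified, hypothetical model: it assumes one food reward per correct report and no correction trials, and it does not reproduce the paper's actual reinforcement schedule or its mapping of the blue and yellow buttons to categories. All names are illustrative.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Trial:
    image_id: str
    label: str  # correct category, e.g. "benign" or "malignant"

def run_session(trials: List[Trial], choose: Callable[[Trial], str],
                rng: random.Random) -> float:
    """Simulate one session of a two-choice discrimination task.

    Each image is shown once in random order; `choose` stands in for the
    bird's report (a peck on one of the two choice buttons). A correct
    report earns a food reward. Returns the proportion of rewarded trials.
    """
    order = list(trials)
    rng.shuffle(order)            # randomize presentation order
    rewarded = 0
    for trial in order:
        if choose(trial) == trial.label:
            rewarded += 1         # dispense one pellet
    return rewarded / len(order)

# Usage: a simulated observer that reports the correct label 85% of the time.
rng = random.Random(0)
trials = [Trial(f"img{i}", "benign" if i % 2 else "malignant") for i in range(100)]

def noisy_observer(t: Trial) -> str:
    if rng.random() < 0.85:
        return t.label
    return "malignant" if t.label == "benign" else "benign"

print(run_session(trials, noisy_observer, rng))
```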
Quantify Manual Image Adjustments (written-content)
The methods section lacks quantitative details about manual image adjustments. Specifying target parameters for brightness and contrast would improve reproducibility and reduce subjectivity.
Section: Materials and Methods
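One way to replace manual adjustment with explicit targets is to rescale each grayscale image linearly to a stated mean (brightness) and standard deviation (contrast). The sketch below is an illustrative alternative, not the authors' procedure; the function name and the target values a study would choose are assumptions.

```python
from statistics import fmean, pstdev

def match_mean_std(pixels, target_mean, target_std):
    """Linearly rescale 8-bit grayscale values so the image has the
    requested mean and standard deviation, then clip to 0-255.
    Clipping can pull the final statistics slightly off target."""
    mu = fmean(pixels)
    sigma = pstdev(pixels)
    if sigma == 0:  # flat image: contrast cannot be rescaled
        return [min(255.0, max(0.0, float(target_mean)))] * len(pixels)
    scale = target_std / sigma
    return [min(255.0, max(0.0, target_mean + (v - mu) * scale)) for v in pixels]

# Hypothetical targets: mean gray level 128, pixel SD 20.
adjusted = match_mean_std([10, 20, 30, 40], 128, 20)
print([round(v, 1) for v in adjusted])
```

Reporting the chosen targets (and how often clipping occurred) would make the adjustment reproducible by other groups.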
Visualizing Histopathology Stimuli (graphical-figure)
Figure 2 effectively presents the histopathology stimuli, allowing readers to visually assess the discrimination task. However, adding annotations to highlight key diagnostic features would improve clarity for non-experts.
Section: Materials and Methods
Visualizing Mass Classification Difficulty (graphical-figure)
Figure 5 effectively conveys the difficulty of the mass classification task. However, the lack of annotations makes it challenging for non-experts to discern the subtle features that distinguish benign from malignant masses.
Section: Materials and Methods
Evidence for Generalization (written-content)
The results clearly demonstrate the pigeons' ability to generalize learned concepts to novel histopathology images, providing strong evidence for true learning rather than memorization. This generalization is a key finding that supports the study's main claims.
Section: Results
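Generalization claims rest on accuracy with novel images exceeding the 50% expected from guessing in a two-choice task. A standard way to check a single bird's score is an exact one-sided binomial test, sketched below. The counts in the usage line are hypothetical, and this is not necessarily the statistical procedure the authors used.

```python
from math import comb

def binomial_p_one_sided(correct: int, n: int, chance: float = 0.5) -> float:
    """Exact one-sided binomial p-value: the probability of scoring at
    least `correct` out of `n` by guessing alone at rate `chance`."""
    return sum(comb(n, k) * chance**k * (1 - chance)**(n - k)
               for k in range(correct, n + 1))

# Hypothetical: 17 of 20 novel images classified correctly.
print(binomial_p_one_sided(17, 20))
```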
Visualize Flock-Sourcing Dynamics (written-content)
The 'flock-sourcing' analysis revealed significantly improved accuracy, but the mechanism remains unclear. A more granular analysis visualizing how individual errors are corrected in the group would strengthen this novel finding.
Section: Results
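The more granular analysis suggested above could start from a per-image tally of individual errors, sorted by whether the group's majority vote overruled them. The sketch assumes simple majority voting and hypothetical data structures (per-bird labels keyed by image); the paper's actual pooling rule may differ.

```python
from collections import Counter

def flock_tally(votes, truth):
    """Count individual errors by what the flock's majority vote did with them.

    votes: dict mapping image_id -> list of per-bird labels
    truth: dict mapping image_id -> correct label
    """
    stats = Counter()
    for image_id, labels in votes.items():
        ranked = Counter(labels).most_common(2)
        tied = len(ranked) > 1 and ranked[0][1] == ranked[1][1]
        errors = sum(1 for label in labels if label != truth[image_id])
        if tied:
            stats["errors_on_tied_images"] += errors
        elif ranked[0][0] == truth[image_id]:
            stats["errors_overruled"] += errors      # group corrected them
        else:
            stats["errors_not_overruled"] += errors  # group shared the mistake
    return stats

# Hypothetical votes ("m" = malignant, "b" = benign) for three images.
votes = {"a": ["m", "m", "b"], "b": ["b", "m", "m"], "c": ["b", "m"]}
truth = {"a": "m", "b": "b", "c": "b"}
print(flock_tally(votes, truth))
```

Plotting these tallies per image, or per bird, would show whether the flock's gain comes from many scattered individual errors being outvoted or from a few images on which most birds agree.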
Coherent Synthesis of Results (written-content)
The discussion effectively synthesizes the findings from multiple experiments, providing a cohesive narrative that integrates successes and failures. This comprehensive interpretation strengthens the study's overall impact.
Section: Discussion
Frame Pigeons as a Dynamic Benchmark for AI (written-content)
The discussion could be enhanced by explicitly proposing the pigeon model as a dynamic benchmark for AI development in medical imaging. This would highlight the model's potential beyond a source of inspiration.
Section: Discussion

Section Analysis

Materials and Methods

Non-Text Elements

Fig 1. The pigeons' training environment. The operant conditioning chamber was equipped with a food pellet dispenser, and a touch-sensitive screen upon which the medical image (center) and choice buttons (blue and yellow rectangles) were presented.

First Reference in Text
The chambers (shown in Fig 1) measured 36 cm × 36 cm × 41 cm and were located in a dark room with continuous white noise played during sessions.
Description
  • Chamber Layout: The figure shows the operant conditioning chamber in which the pigeons were trained. A touch-sensitive screen presents the medical image in the center together with two choice buttons (blue and yellow rectangles) on which the bird reports its classification, and a food pellet dispenser delivers the reward.
  • Testing Environment: According to the text, the chambers measured 36 cm × 36 cm × 41 cm and were housed in a dark room, with continuous white noise played during sessions.
Fig 2. Examples of benign (left) and malignant (right) breast specimens stained with hematoxylin and eosin, at different magnifications. Pigeons were initially trained and tested with samples at 4x magnification (top row), and then were subsequently transitioned to samples at 10x magnification (center row) and 20x magnification (bottom row).

First Reference in Text
See Fig 2 for a representative sample of images displayed to the birds.
Description
  • Image Content and Organization: The figure presents a grid of 18 histopathology images, which are microscopic views of tissue samples. These are specifically from human breast tissue specimens that have been categorized as either 'benign' (non-cancerous) on the left or 'malignant' (cancerous) on the right.
  • Hematoxylin and Eosin (H&E) Staining: All specimens are stained with hematoxylin and eosin (H&E), the standard stain in histopathology. Hematoxylin stains cell nuclei purplish-blue, while eosin stains cytoplasm and connective tissue in shades of pink. This color contrast makes the tissue architecture and cellular detail visible.
  • Multiple Magnification Levels: The images are shown at three different levels of magnification, arranged in rows: 4x (low power), 10x (medium power), and 20x (high power). This progression is analogous to zooming in with a camera, moving from a wide overview of the tissue landscape (4x) to a more detailed view of individual cell groups (20x). The caption notes that this sequence matches the order in which pigeons were trained.
  • Visual Characteristics of Benign vs. Malignant Tissue: Visually, the benign samples generally show more organized and well-defined structures, such as circular ducts and lobules, with more pink-staining space between them. In contrast, the malignant samples often appear more chaotic and densely packed with dark purple-staining cells, reflecting the uncontrolled cell growth characteristic of cancer. These visual differences form the basis of the discrimination task for the pigeons.
Scientific Validity
  • ✅ The figure provides crucial insight into the experimental stimuli: Displaying examples of the actual visual stimuli is a critical component of the methods section of a visual perception study. This figure lets the reader directly assess the nature and potential difficulty of the discrimination task, which is essential for interpreting the results.
  • ✅ The use of varying magnifications represents a robust experimental design: Training pigeons across multiple magnifications (4x, 10x, 20x) is a methodological strength. It tests whether the animals can learn to identify pathological features at different spatial scales, which mirrors a key skill of human pathologists and adds complexity and relevance to the study.
  • 💡 The representativeness of the selected examples is not defined: The text describes these as 'representative' images, but without information on the selection criteria there is a potential for selection bias. Were these images chosen because they are particularly clear-cut, or do they reflect the average difficulty of the entire stimulus set? Stating the difficulty level of these examples would make the methods more transparent.
  • 💡 The images lack scale bars for absolute size reference: A scale bar is standard practice when publishing microscopy images. Nominal magnification levels are relative, and the apparent size of structures depends on the imaging and display setup. A scale bar (e.g., 100 µm) would give an absolute, objective measure of size within each image and aid reproducibility.
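Drawing a scale bar requires knowing how many micrometres one image pixel covers. For a camera mounted on a microscope, that is approximately the sensor's pixel pitch divided by the total optical magnification, adjusted for any later resizing of the image. The sketch below assumes this simple relation; the 6.45 µm pixel pitch is a hypothetical value, since the camera used for the stimuli is not reported here.

```python
def scale_bar_pixels(bar_um: float, sensor_pixel_um: float,
                     magnification: float, resample: float = 1.0) -> float:
    """Length, in image pixels, of a scale bar `bar_um` micrometres long.

    sensor_pixel_um: physical pixel pitch of the camera sensor.
    magnification:   total optical magnification onto the sensor.
    resample:        factor by which the saved image was resized
                     (0.5 means it was downsampled to half size).
    """
    um_per_pixel = sensor_pixel_um / magnification / resample
    return bar_um / um_per_pixel

# Hypothetical 6.45 um sensor pixels: a 100 um bar at each magnification.
for mag in (4, 10, 20):
    print(mag, round(scale_bar_pixels(100, 6.45, mag)))
```

Because the bar's pixel length scales with magnification, the same 100 µm bar would be five times longer at 20x than at 4x, which is exactly the size information the nominal labels alone do not convey once images are resized for display.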
Communication
  • ✅ The figure's grid layout is highly effective for comparison: The grid layout is exceptionally clear and well organized. By arranging the images by condition (Benign vs. Malignant) in columns and by magnification in rows, the figure allows easy, intuitive comparison between the categories at each level of detail.
  • ✅ The caption is highly informative and enhances the figure's self-sufficiency: The caption is comprehensive and makes the figure largely self-contained. It identifies the tissue type, staining method, image categories, and the training sequence corresponding to the magnifications shown, so readers can understand the stimuli and the experimental progression without searching the main text.
  • ✅ The labeling is clear and effective: The labels for rows ('4x', '10x', '20x') and columns ('Benign samples', 'Malignant samples') are clear, legible, and appropriately placed, which is crucial for the figure's interpretability.
  • 💡 Annotating key diagnostic features would improve clarity for a broader audience: While the images are illustrative, their educational value could be enhanced for non-experts. The key visual differences between benign and malignant tissue (e.g., organized ductal structures vs. disorganized sheets of cells) are subtle. Adding arrows or outlines to a few key examples to highlight these discriminating features would clarify the visual challenge presented to the pigeons.
Fig 3. Monochrome images with equated hue and brightness, at different levels of compression. The original images at 10x magnification were converted to grayscale, colored with a single hue, and had their overall brightness and contrast equalized as closely as possible.

First Reference in Text
Monochrome stimuli. The 10x stimuli at 0° were used, but were converted to monochrome and equated in hue and brightness to eliminate those image properties as variables (see Fig 3, top row, for representative images).
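The conversion described in the caption (grayscale, then a single hue) can be sketched per pixel. The caption does not specify the grayscale formula or the hue, so the ITU-R BT.601 luma weights and the tinting rule below (scaling one fixed RGB color by each pixel's luma fraction) are assumptions chosen for illustration; the function name is hypothetical.

```python
def to_single_hue(rgb_pixels, hue_rgb):
    """Convert RGB pixels to luma using ITU-R BT.601 weights, then tint
    each pixel with one fixed hue by scaling that hue's RGB values by
    the pixel's luma fraction (0 = black, 1 = full hue)."""
    out = []
    for r, g, b in rgb_pixels:
        y = 0.299 * r + 0.587 * g + 0.114 * b   # grayscale value, 0-255
        fraction = y / 255.0
        out.append(tuple(round(c * fraction) for c in hue_rgb))
    return out

# A pink (eosin-like) and a purple (hematoxylin-like) pixel, tinted with
# one hypothetical purplish hue: after conversion only brightness differs.
print(to_single_hue([(230, 150, 200), (90, 60, 140)], (200, 100, 200)))
```

After this step every pixel in every image shares the same hue, so any remaining difference between benign and malignant images lies in the spatial pattern of brightness, which is what the manipulation is meant to isolate.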
Description
  • Monochrome and Equalized Images: This figure displays histopathology images that were digitally manipulated to test which visual cues the pigeons use for classification. The original 10x color images were converted to grayscale and then tinted with a single uniform hue (purplish in the figure), which removes hue differences between benign and malignant samples as a cue. According to the caption, brightness and contrast were also adjusted to be as similar as possible across images.
  • Levels of Image Compression: The rows show three levels of image compression, a way of reducing file size that, when lossy, degrades image quality. The top row, labeled '1:1', shows the uncompressed baseline images. The middle ('15:1') and bottom ('27:1') rows show the same images compressed to 1/15 and 1/27 of their original file size, respectively. Compression introduces visible distortions (artifacts) such as blockiness and loss of fine detail, which are most severe in the bottom row.
  • Benign vs. Malignant Comparison: Similar to the previous figure, the images are separated into columns of 'Benign samples' (non-cancerous) and 'Malignant samples' (cancerous). This layout allows for a side-by-side comparison to see how the features that distinguish these two conditions are affected by the removal of color cues and the introduction of compression artifacts.
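The compression ratios translate directly into file sizes and bit budgets. The worked example below uses a hypothetical 1000 × 1000 pixel, 8-bit RGB image, since the stimulus dimensions and the codec are not stated here; the ratio is taken as uncompressed size divided by compressed size.

```python
def raw_size_bytes(width: int, height: int, channels: int = 3,
                   bits_per_channel: int = 8) -> float:
    """Uncompressed size of a bitmap in bytes."""
    return width * height * channels * bits_per_channel / 8

def compressed_size_bytes(raw_bytes: float, ratio: float) -> float:
    """File size after compression at `ratio`:1 (ratio = raw / compressed)."""
    return raw_bytes / ratio

BITS_PER_PIXEL = 3 * 8                    # uncompressed 8-bit RGB
raw = raw_size_bytes(1000, 1000)          # hypothetical 1000 x 1000 image
for ratio in (1, 15, 27):
    print(f"{ratio}:1 -> {compressed_size_bytes(raw, ratio):,.0f} bytes, "
          f"{BITS_PER_PIXEL / ratio:.2f} bits per pixel")
```

At 27:1 fewer than one bit per pixel remains on average, which helps explain why fine texture is among the first casualties; the ratio alone does not determine which details survive, since that depends on the codec.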
Scientific Validity
  • ✅ The image manipulation represents a robust experimental control: The systematic removal of color and normalization of brightness and contrast is a strong experimental control. It isolates the contribution of morphological and textural information to the discrimination task, providing a more rigorous test of what the pigeons are actually learning.
  • ✅ The investigation of compression artifacts adds practical relevance to the study: Testing the effect of image compression is highly relevant to digital pathology, where managing large file sizes is a practical challenge. By assessing how performance changes with compressed images, the study explores the practical utility of pigeons as 'surrogate observers' for real-world image quality issues.
  • 💡 The subjective description of image equalization lacks quantitative support: The caption describes brightness and contrast as equalized 'as closely as possible,' which is a subjective statement. For greater rigor, the authors should report quantitative data (e.g., mean luminance and pixel-intensity standard deviation) for the benign and malignant sets to show objectively how well the equalization worked.
  • 💡 The images are missing standard scale bars for absolute size reference: As with Fig 2, these microscopy images lack scale bars. While the 10x magnification is stated, an absolute scale bar (e.g., in micrometres) is standard for scientific publication. It would provide an objective measure of the size of cellular structures and help in assessing the impact of compression on features of a given size.
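The quantitative check suggested above could be as simple as summarizing each set's per-image brightness and contrast. The sketch below assumes each image is available as a flat list of 8-bit gray values; the function name and the choice of statistics are illustrative, not taken from the paper.

```python
from statistics import fmean, pstdev

def summarize_set(images):
    """images: list of images, each a flat list of 8-bit gray values.
    Returns the set's average brightness, the spread of brightness
    across images, and the average within-image contrast (pixel SD)."""
    means = [fmean(img) for img in images]
    contrasts = [pstdev(img) for img in images]
    return {
        "mean_luminance": fmean(means),
        "sd_of_means": pstdev(means),
        "mean_contrast": fmean(contrasts),
    }

# Hypothetical toy sets; real use would load the benign and malignant stimuli.
benign = [[0, 100], [100, 200]]
malignant = [[40, 140], [60, 160]]
print(summarize_set(benign))
print(summarize_set(malignant))
```

Reporting these numbers side by side for the benign and malignant sets would show directly whether any residual brightness or contrast difference could have served as a cue.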
Communication
  • ✅ The figure's organization effectively communicates the experimental variables: The grid layout is highly effective. By organizing images by diagnosis (columns) and compression level (rows), the figure allows a direct, intuitive comparison of how compression artifacts affect the visibility of features in both benign and malignant tissue.
  • ✅ The visualization of compression artifacts is clear and impactful: The figure makes the abstract concept of image compression concrete. The progressive degradation of image quality from the top row (uncompressed) to the bottom row (most heavily compressed) is immediately obvious, clearly illustrating the visual challenge being tested.
  • ✅ The labeling is clear and effective: The labels for the rows ('1:1', '15:1', '27:1') and columns ('Benign samples', 'Malignant samples') are clear and well placed, and the caption provides the context needed to interpret them.
  • 💡 The process of 'equalization' could be more quantitatively described or visualized: The caption states that brightness and contrast were 'equalized as closely as possible', a subjective description. A supplementary figure showing luminance histograms for the benign and malignant sets would demonstrate quantitatively how closely they were equalized.
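A luminance-histogram comparison of the kind suggested above can be reduced to a single overlap score. The sketch below pools 8-bit gray values across each set into a normalized histogram and computes the histogram intersection (1.0 for identical distributions, 0.0 for non-overlapping ones). The bin count and function names are illustrative assumptions.

```python
def luminance_histogram(images, bins=16):
    """Normalized histogram of 8-bit gray values pooled across a set of images."""
    counts = [0] * bins
    width = 256 / bins
    total = 0
    for img in images:
        for v in img:
            counts[min(bins - 1, int(v // width))] += 1
            total += 1
    return [c / total for c in counts]

def histogram_intersection(h1, h2):
    """Overlap of two normalized histograms: 1.0 if identical, 0.0 if disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Hypothetical toy sets; real use would pass the equalized stimulus images.
benign_hist = luminance_histogram([[0, 255], [128, 130]])
malignant_hist = luminance_histogram([[5, 250], [120, 135]])
print(round(histogram_intersection(benign_hist, malignant_hist), 3))
```

Plotting the two histograms alongside this score would let readers judge the equalization at a glance rather than relying on the phrase 'as closely as possible'.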
Fig 4. Mammograms with the absence (left) and with presence (right) of microcalcifications. Yellow circles denote where microcalcifications are located.
