Computer Vision in Biotech: Building Imaging Pipelines That Scale
Eric Garcia, PhD
February 3, 2026 · 8 min read
Biotech imaging presents unique challenges that most CV tutorials don't cover. We're currently building computer vision systems for biological research, and the lessons transfer to any domain dealing with scientific or industrial imaging.
What Makes Biotech CV Different
1. Ground Truth is Expensive
In web-scale CV, you can crowdsource labels. In biotech, labeling requires domain experts (biologists, pathologists) who cost $100+/hour. Every labeled image is precious.
This changes your entire approach:
- Active learning becomes essential—prioritize labeling the most informative samples
- Semi-supervised methods earn their keep
- Transfer learning from adjacent domains matters more
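The active-learning idea above can be sketched in a few lines: rank unlabeled samples by predictive entropy and send only the most uncertain ones to your expensive experts. This is a minimal illustration (the function name and the entropy criterion are my choices, not a prescribed library API):

```python
import numpy as np

def select_for_labeling(probs: np.ndarray, budget: int) -> np.ndarray:
    """Rank unlabeled samples by predictive entropy and return the
    indices of the `budget` most uncertain ones.

    probs: (n_samples, n_classes) softmax outputs from the current model.
    """
    eps = 1e-12  # avoid log(0)
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    return np.argsort(entropy)[::-1][:budget]
```

With a $100+/hour pathologist, sending them the three most ambiguous images instead of three random ones is often the single highest-leverage change you can make.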
2. Images Are Weird
Biotech images aren't like ImageNet photos:
- **Multi-channel**: Fluorescence microscopy can have 4+ channels, not RGB
- **High dynamic range**: 16-bit images are common
- **Strange artifacts**: Bubbles, focus issues, illumination variations
- **Variable scale**: Same structure looks different at different magnifications
Pre-trained ImageNet models need careful adaptation. Sometimes you need to train from scratch.
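One common adaptation is expanding a pretrained model's first convolution so it accepts more than three channels. A numpy sketch of the weight surgery, framework-agnostic (the function name and the mean-and-rescale initialization are illustrative assumptions, not the only valid scheme):

```python
import numpy as np

def adapt_first_conv(rgb_weights: np.ndarray, n_channels: int) -> np.ndarray:
    """Expand a pretrained first-conv weight tensor of shape
    (out, 3, kh, kw) to accept n_channels inputs: tile the RGB-mean
    kernel, rescaled so the expected activation magnitude is preserved."""
    mean_kernel = rgb_weights.mean(axis=1, keepdims=True)   # (out, 1, kh, kw)
    new_w = np.repeat(mean_kernel, n_channels, axis=1)      # (out, C, kh, kw)
    return new_w * (3.0 / n_channels)                       # keep sum over channels
```

The same idea applies directly in PyTorch or TensorFlow by replacing the first layer and copying in the adapted tensor.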
3. Interpretability Is Required
When your model flags a sample as "abnormal," a biologist needs to understand why. Black-box predictions aren't acceptable in most biotech contexts.
This means:
- Attention maps and saliency maps are standard outputs
- Simpler architectures often win over marginally better complex ones
- Uncertainty quantification is non-negotiable
4. Reproducibility Is Everything
In research contexts, you need to reproduce results exactly. This requires:
- Deterministic inference (set all random seeds, use deterministic algorithms)
- Version control for models AND preprocessing code
- Audit trails for every prediction
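The deterministic-inference requirement starts with pinning every random number generator you control. A minimal sketch (PyTorch lines shown as comments since the framework varies by pipeline):

```python
import os
import random

import numpy as np

def make_deterministic(seed: int = 0) -> None:
    """Pin every RNG we control so repeated runs produce identical outputs."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    # If using PyTorch, also:
    #   torch.manual_seed(seed)
    #   torch.use_deterministic_algorithms(True)
```

Call this once at pipeline startup, and record the seed in the provenance metadata alongside the model version.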
Pipeline Architecture
Here's the general architecture we've found works well:
```
Raw Images → Quality Check → Preprocessing → Inference → Post-processing → Results + Provenance
                   ↓                                                              ↓
             Flagged for                                                  Full audit trail
            re-acquisition                                   (model version, preprocessing params, timestamp)
```
Stage 1: Quality Check
Before running expensive inference, check for:
- Focus quality (variance of Laplacian is a quick proxy)
- Illumination issues (histogram analysis)
- Expected structure present (quick classifier)
Reject bad images early. Don't waste GPU cycles on garbage.
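The variance-of-Laplacian focus proxy is cheap enough to run on every image. A numpy-only sketch (the threshold is a hypothetical value you'd calibrate on your own instrument; a real pipeline might use `cv2.Laplacian` instead):

```python
import numpy as np

def focus_score(img: np.ndarray) -> float:
    """Variance of the Laplacian: sharp images have strong second
    derivatives, so a low score suggests the image is out of focus."""
    img = img.astype(np.float64)
    lap = (np.roll(img, 1, 0) + np.roll(img, -1, 0)
           + np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4 * img)
    return float(lap[1:-1, 1:-1].var())  # drop the wrap-around border

def passes_focus_check(img: np.ndarray, threshold: float = 0.01) -> bool:
    # Threshold is illustrative; calibrate it per microscope/objective.
    return focus_score(img) >= threshold
```

Illumination checks (histogram analysis) slot in the same way: compute a cheap scalar, compare against a calibrated bound, and flag failures for re-acquisition.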
Stage 2: Preprocessing
Standardize images for consistent model input:
- Normalize intensity distributions
- Handle multi-channel appropriately
- Resize/crop to expected input dimensions
- Optionally apply test-time augmentation (average predictions over flips/rotations) for more robust outputs
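For the intensity normalization step, percentile clipping per channel is a robust default for 16-bit multi-channel data, since hot pixels would otherwise dominate a plain min-max rescale. A sketch (percentile bounds are a common convention, not a fixed standard):

```python
import numpy as np

def normalize_channels(img: np.ndarray, low: float = 1.0,
                       high: float = 99.0) -> np.ndarray:
    """Rescale each channel of a (C, H, W) image to [0, 1] using
    percentile clipping, which is robust to hot pixels and dead pixels."""
    img = img.astype(np.float32)
    out = np.empty_like(img)
    for c in range(img.shape[0]):
        lo, hi = np.percentile(img[c], [low, high])
        out[c] = np.clip((img[c] - lo) / max(hi - lo, 1e-6), 0.0, 1.0)
    return out
```

Whatever bounds you choose, record them in the provenance metadata; "which percentiles did we clip at?" is exactly the kind of question that comes up six months later.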
Stage 3: Inference
Run your model(s). Key considerations:
- Ensemble multiple models for critical applications
- Output uncertainty estimates (MC dropout, deep ensembles)
- Generate interpretability outputs alongside predictions
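The deep-ensemble approach to uncertainty reduces to simple statistics over per-model outputs: the ensemble mean is your prediction, and cross-model disagreement is a usable uncertainty score. A sketch (the disagreement metric here is mean per-class standard deviation; other choices like mutual information are equally valid):

```python
import numpy as np

def ensemble_predict(prob_stack: np.ndarray):
    """Combine per-model softmax outputs, shape (n_models, n_samples,
    n_classes), into a mean prediction and a per-sample disagreement score."""
    mean = prob_stack.mean(axis=0)
    disagreement = prob_stack.std(axis=0).mean(axis=1)
    return mean, disagreement
```

Samples with high disagreement are natural candidates both for human review and for the active-learning labeling queue.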
Stage 4: Post-processing
Transform raw model outputs into biologically meaningful results:
- Convert segmentation masks to measurements (area, perimeter, intensity)
- Apply domain-specific rules (minimum size thresholds, morphological constraints)
- Aggregate cell-level to well-level to experiment-level
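The mask-to-measurements step, with a minimum-size rule applied, looks roughly like this. The sketch assumes the mask has already been through connected-component labeling (e.g. `scipy.ndimage.label` or `skimage.measure.label`); the `min_area` default is a hypothetical value:

```python
import numpy as np

def region_measurements(labels: np.ndarray, intensity: np.ndarray,
                        min_area: int = 20) -> list:
    """Turn a labeled segmentation mask into per-object measurements,
    dropping objects below a domain-specific minimum size."""
    results = []
    for lab in np.unique(labels):
        if lab == 0:  # 0 is background by convention
            continue
        px = labels == lab
        area = int(px.sum())
        if area < min_area:
            continue  # reject debris / segmentation noise
        results.append({"label": int(lab), "area": area,
                        "mean_intensity": float(intensity[px].mean())})
    return results
```

Cell-level records like these then aggregate naturally to well-level and experiment-level summaries with a group-by.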
Stage 5: Provenance
Every result gets metadata:
- Which model version made this prediction
- What preprocessing parameters were used
- When was this processed
- What was the raw image hash
This isn't optional overhead—it's a regulatory requirement in many contexts.
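A provenance record covering the four items above is just a small structured payload attached to every result. A sketch (field names are illustrative; adapt them to whatever your LIMS or audit system expects):

```python
import hashlib
from datetime import datetime, timezone

def provenance_record(raw_bytes: bytes, model_version: str,
                      preprocessing_params: dict) -> dict:
    """Build an audit-trail record for one prediction: what ran,
    with which settings, when, on exactly which input."""
    return {
        "model_version": model_version,
        "preprocessing": preprocessing_params,
        "processed_at": datetime.now(timezone.utc).isoformat(),
        "raw_image_sha256": hashlib.sha256(raw_bytes).hexdigest(),
    }
```

Hashing the raw bytes (rather than the preprocessed array) is the key detail: it lets you prove later exactly which acquisition a result came from.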
Practical Tips
Start with Classical Methods
Before reaching for deep learning, try classical CV approaches:
- Otsu thresholding for segmentation
- Watershed for cell separation
- Template matching for structured objects
Classical methods are interpretable, fast, and often "good enough." Use deep learning when you've proven classical methods don't work.
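To show how little code "good enough" can be, here is Otsu thresholding implemented directly on the histogram, in plain numpy (in practice you would likely just call `skimage.filters.threshold_otsu` or `cv2.threshold` with the Otsu flag):

```python
import numpy as np

def otsu_threshold(img: np.ndarray, n_bins: int = 256) -> float:
    """Return the intensity cut that maximizes between-class variance
    (Otsu's method), computed from the image histogram."""
    hist, edges = np.histogram(img.ravel(), bins=n_bins)
    hist = hist.astype(np.float64)
    centers = (edges[:-1] + edges[1:]) / 2.0
    total = hist.sum()
    sum_total = (hist * centers).sum()
    best_t, best_var = centers[0], -1.0
    w0 = 0.0   # weight of the "background" class so far
    sum0 = 0.0 # intensity mass of the background class so far
    for i in range(n_bins - 1):
        w0 += hist[i]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += hist[i] * centers[i]
        mu0 = sum0 / w0
        mu1 = (sum_total - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, centers[i]
    return float(best_t)
```

Fifteen interpretable lines that a biologist can audit, versus a segmentation network nobody can. Start here.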
Invest in Labeling Infrastructure
If you need to label data, build good tooling first:
- Integrated with your pipeline (show context, not just isolated images)
- Support for uncertain labels (biologists should be able to say "not sure")
- Quality control (inter-annotator agreement tracking)
Good labeling infrastructure pays for itself quickly.
Test on Held-Out Experiments
Standard train/test splits don't work well in biology. Images from the same experiment are correlated in ways your model will exploit.
Split by experiment, not by image. If your model works on experiment A's images, will it work on experiment B's images?
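Experiment-wise splitting is a one-function job; `sklearn.model_selection.GroupShuffleSplit` does this, and a self-contained numpy sketch of the same idea looks like (function name and defaults are my choices):

```python
import numpy as np

def split_by_experiment(experiment_ids, test_fraction: float = 0.25,
                        seed: int = 0):
    """Assign whole experiments (not individual images) to train/test,
    so correlated images from one experiment never leak across the split.
    Returns boolean (train_mask, test_mask) over the input order."""
    experiment_ids = np.asarray(experiment_ids)
    experiments = np.unique(experiment_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(experiments)
    n_test = max(1, int(round(len(experiments) * test_fraction)))
    test_experiments = experiments[:n_test]
    test_mask = np.isin(experiment_ids, test_experiments)
    return ~test_mask, test_mask
```

Expect your held-out-experiment metrics to be noticeably worse than random-split metrics; that gap is the honest estimate of how your model will behave on the next experiment.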
The Path Forward
Biotech CV is a maturing field. The tools are getting better, and transfer learning is reducing the need for massive labeled datasets. But it still requires more domain expertise than applying CV to natural images.
If you're building imaging pipelines for biotech or life sciences, [we'd love to hear about your challenges](/contact).

Eric Garcia, PhD
Founder & Principal Consultant
PhD in Machine Learning from UW. Former Spotify ML engineer. 15+ years building production ML systems.