After building ML systems at scale—from causal inference for customer lifecycle at Spotify to computer vision for biotech—I've developed opinions about what actually matters for production ML.

What Doesn't Matter (As Much As You Think)

State-of-the-Art Models

That paper with a +0.3% accuracy improvement? It probably doesn't matter for your production system. Benchmark gains are measured on fixed, curated test sets, and they rarely carry over intact to live data that is messier and keeps shifting.

In production, you need:

Models that are fast enough
Models that are stable enough
Models that fail gracefully

A "boring" model that works reliably beats a fancy model that's fragile.

Perfect Data

You will never have perfect data. Stop waiting for it.

Production data is messy, inconsistent, and changes over time. Build systems that handle imperfection rather than systems that require perfection.

The Latest Framework

Your production system probably doesn't need to be on the bleeding edge of PyTorch or TensorFlow. Stability and team familiarity matter more than features you won't use.

What Actually Matters

1. Clear Problem Definition

Most ML failures are problem definition failures, not model failures.

Before writing any code, answer:

What decision does this model inform?
What's the cost of false positives vs false negatives?
How will you know if the model is working?
What happens when the model is wrong?
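One way to make the false positive vs false negative question concrete is to put rough costs on each error and pick the decision threshold that minimizes total cost on a labeled validation set. The sketch below assumes a binary classifier that outputs scores; the names `expected_cost` and `pick_threshold` and the cost values are illustrative, not part of any library.

```python
from typing import Sequence


def expected_cost(
    y_true: Sequence[int],
    scores: Sequence[float],
    threshold: float,
    cost_fp: float,
    cost_fn: float,
) -> float:
    """Total cost of flagging every example with score >= threshold."""
    cost = 0.0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 0:
            cost += cost_fp  # acted when we should not have
        elif not predicted_positive and label == 1:
            cost += cost_fn  # missed a case we should have caught
    return cost


def pick_threshold(
    y_true: Sequence[int],
    scores: Sequence[float],
    cost_fp: float,
    cost_fn: float,
) -> float:
    """Return the candidate threshold with the lowest total cost."""
    # Candidates: every observed score, plus "never predict positive".
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: expected_cost(y_true, scores, t, cost_fp, cost_fn),
    )


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
# Misses are 10x worse than false alarms, so the threshold moves down.
print(pick_threshold(y_true, scores, cost_fp=1.0, cost_fn=10.0))
# False alarms are 10x worse than misses, so the threshold moves up.
print(pick_threshold(y_true, scores, cost_fp=10.0, cost_fn=1.0))
```

The same model ends up with very different operating points depending on the costs, which is why the cost question belongs before the modeling work rather than after it.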

2. Data Quality Monitoring

Your model is a function of your data. Monitor the inputs, not just the outputs.

Track:

Feature distributions over time
Missing value rates
Upstream data freshness
Schema changes

When model performance degrades, the cause is almost always data.
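As a minimal sketch of two of these checks, the snippet below computes a missing value rate and a population stability index (PSI) between a reference window and a current window of a single numeric feature. PSI is one common drift score among several, chosen here for illustration; the function names, bin count, and `eps` floor are assumptions, and any alert threshold (values around 0.1 to 0.25 are often quoted as rules of thumb) should be tuned on your own data.

```python
import math
from typing import Optional, Sequence


def missing_rate(values: Sequence[Optional[float]]) -> float:
    """Fraction of entries that are None."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def psi(
    reference: Sequence[float],
    current: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Population stability index using equal-width bins from the reference."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant features

    def proportions(values: Sequence[float]) -> list:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm stays finite.
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


reference = [i / 100 for i in range(100)]      # roughly uniform on [0, 1)
shifted = [0.5 + i / 200 for i in range(100)]  # squeezed into [0.5, 1)

print(missing_rate([1.0, None, 3.0, None]))
print(psi(reference, reference))
print(psi(reference, shifted))
```

Running checks like these per feature on a schedule, and alerting on changes rather than absolute values, tends to surface upstream problems before they show up in model metrics.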

3. Evaluation That Matches Reality

Offline metrics (AUC, accuracy, F1) are proxies for what you actually care about. Sometimes they're good proxies, sometimes they're not.

The only metric that matters is business impact. Connect your ML metrics to business metrics as directly as possible.

4. Graceful Degradation

What happens when your model fails? What happens when inference latency spikes? What happens when a feature isn't available?

Design fallback behavior from the start. A simple heuristic that always works beats a complex model that sometimes fails catastrophically.
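A fallback wrapper can be quite small. The sketch below assumes the model and the heuristic are plain callables that take a feature dict; `predict_with_fallback`, the `required` feature list, and the 100 ms default timeout are hypothetical choices for illustration. It returns the prediction together with a label for which path produced it, so the fallback rate can be monitored.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Any, Callable, Dict, Sequence, Tuple

_executor = ThreadPoolExecutor(max_workers=4)


def predict_with_fallback(
    model_fn: Callable[[Dict[str, Any]], float],
    fallback_fn: Callable[[Dict[str, Any]], float],
    features: Dict[str, Any],
    required: Sequence[str],
    timeout_s: float = 0.1,
) -> Tuple[float, str]:
    """Return (prediction, source), using the heuristic when the model can't answer."""
    # A required feature is unavailable: skip the model entirely.
    missing = [name for name in required if features.get(name) is None]
    if missing:
        return fallback_fn(features), "fallback:missing_features"

    future = _executor.submit(model_fn, features)
    try:
        return future.result(timeout=timeout_s), "model"
    except FutureTimeout:
        # The worker thread keeps running; we only stop waiting for it.
        return fallback_fn(features), "fallback:timeout"
    except Exception:
        # Any model error degrades to the heuristic instead of failing the request.
        return fallback_fn(features), "fallback:error"


def heuristic(features: Dict[str, Any]) -> float:
    return 0.5  # e.g. a base rate


def model(features: Dict[str, Any]) -> float:
    return 0.9 if features["visits"] > 3 else 0.1


print(predict_with_fallback(model, heuristic, {"visits": 5}, ["visits"]))
print(predict_with_fallback(model, heuristic, {"visits": None}, ["visits"]))
```

Returning the source alongside the prediction is a deliberate choice here: a rising share of fallback responses is itself a signal worth alerting on.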

5. Iteration Speed

The team that can iterate fastest usually wins. This means:

Fast experiment feedback loops
Easy deployment of new model versions
Quick rollback when things break
Minimal manual steps

Invest in infrastructure that speeds up iteration, not infrastructure that's architecturally pure.
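To illustrate what "quick rollback" can mean in practice, here is a deliberately tiny in-memory sketch: versions are registered once, and going live or rolling back only moves a pointer. `ModelRegistry` and its methods are hypothetical names, not a real library; a production setup would persist this state and tie it to the deployment tooling.

```python
from typing import Any, Dict, List


class ModelRegistry:
    """Keeps registered model versions and a history of which one was live."""

    def __init__(self) -> None:
        self._versions: Dict[str, Any] = {}
        self._history: List[str] = []  # promoted versions, most recent last

    def register(self, version: str, model: Any) -> None:
        self._versions[version] = model

    def promote(self, version: str) -> None:
        """Make a registered version the live one."""
        if version not in self._versions:
            raise KeyError(f"unknown version: {version}")
        self._history.append(version)

    def rollback(self) -> str:
        """Revert to the previously live version and return its name."""
        if len(self._history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._history.pop()
        return self._history[-1]

    @property
    def live(self) -> Any:
        return self._versions[self._history[-1]]


registry = ModelRegistry()
registry.register("v1", "model-object-1")
registry.register("v2", "model-object-2")
registry.promote("v1")
registry.promote("v2")
print(registry.rollback())  # name of the version that is live again
print(registry.live)
```

The point of the design is that rollback does no retraining and no rebuild: the old artifact is still there, so reverting is a single cheap operation.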

The Humble Approach

I've learned to be humble about ML predictions. We're fitting functions to noisy data in complex systems. Uncertainty is the norm, not the exception.

The best production ML systems:

Know what they don't know (calibrated uncertainty)
Defer to humans for edge cases
Surface their reasoning (interpretability)
Fail safely
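Two of these properties are easy to sketch for a binary classifier: checking calibration with a binned calibration error, and routing uncertain predictions to a person. The function names, the 10 bins, and the 0.2 and 0.8 cutoffs below are illustrative assumptions; the cutoffs in particular should come from the cost of errors in your own setting.

```python
from typing import List, Sequence, Tuple


def calibration_error(
    y_true: Sequence[int], probs: Sequence[float], n_bins: int = 10
) -> float:
    """Binned gap between predicted probability and observed positive rate."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for label, p in zip(y_true, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((label, p))

    total = len(probs)
    error = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_prob = sum(p for _, p in bucket) / len(bucket)
        positive_rate = sum(label for label, _ in bucket) / len(bucket)
        # Weight each bin's gap by the share of examples it holds.
        error += len(bucket) / total * abs(avg_prob - positive_rate)
    return error


def route(prob: float, low: float = 0.2, high: float = 0.8) -> str:
    """Act automatically only when the model is confident either way."""
    if prob >= high:
        return "auto_positive"
    if prob <= low:
        return "auto_negative"
    return "human_review"


# Overconfident model: predicts 0.9 but is right only half the time.
print(calibration_error([1, 0], [0.9, 0.9]))
print([route(p) for p in (0.95, 0.5, 0.05)])
```

Deferral only helps if the probabilities mean something, which is why the calibration check comes first: routing on scores from a badly calibrated model sends the wrong cases to review.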

Practical Advice

If you're building production ML:

**Start simple**: Linear models, gradient boosting, simple neural nets. Add complexity only when needed.

**Instrument everything**: You can't improve what you can't measure.

**Plan for model updates**: Your first model won't be your last. Build for change.

**Document decisions**: Future you (or your replacement) will thank you.

**Build relationships with stakeholders**: ML systems exist to serve business needs. Stay close to the people you're building for.

Working Together

At MuffinLabs, we bring this practitioner's perspective to every engagement. We've made the mistakes so you don't have to.

If you're building production ML and want an experienced partner, [let's talk](/contact).

Production ML: What Actually Matters (A Practitioner's Perspective)