3 things you need to know before your first real-world ML project

After one month of working on my first real-world ML project, my manager told me:

“Pau, we need to stop the project. This is not going anywhere.”

I felt like a failure.

However, with time I understood the 3 crucial mistakes I had made, and turned them into 3 lessons.

💎 Tip 1: Estimate the Return on Investment (ROI) of the project first

Many ML projects fail because no one (especially you, the technical guy) has thought about what “success” looks like.

You need buy-in from your manager.

You want her to see how much better the company will be once the system is working in production.

For that, you need to *estimate* the impact your system will have on key business metrics, assuming your ML model has decent performance.

Developing an ML solution is expensive, both in terms of human time and infrastructure costs.
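To make this concrete, here is a minimal back-of-the-envelope sketch of an ROI estimate in Python. Every number in it (yearly gain, build cost, running cost) is a made-up placeholder, and the function name `estimate_roi` is hypothetical; swap in figures agreed with your manager.

```python
def estimate_roi(annual_gain, build_cost, annual_run_cost, years=1.0):
    """Return ROI as a fraction: (total gain - total cost) / total cost."""
    total_gain = annual_gain * years
    # Cost covers both human time to build and infrastructure to run.
    total_cost = build_cost + annual_run_cost * years
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return (total_gain - total_cost) / total_cost


# Hypothetical yearly gains (in dollars) under three scenarios.
scenarios = {"pessimistic": 60_000, "expected": 150_000, "optimistic": 300_000}

roi_by_scenario = {
    name: estimate_roi(gain, build_cost=80_000, annual_run_cost=20_000)
    for name, gain in scenarios.items()
}

for name, roi in roi_by_scenario.items():
    print(f"{name}: ROI = {roi:+.0%}")
```

Showing a range of scenarios instead of a single number keeps the estimate honest, and gives your manager something to hold on to during a down period.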

And, like in any project, you will go through ups and downs.

If your manager cannot see “what success looks like” during a down period, she will stop the project.

💎 Tip 2: Take full control over the data labeling

ML projects usually fail because of bad data, not because of bad models.

Fixing a model is fast and inexpensive. Fixing a dataset is slow and expensive.

If the data needs to be collected and annotated (by humans), take FULL control over the process.

Tools like Snorkel are your best friend at this stage.
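The core idea behind this kind of tool is to write small labeling functions, each encoding one heuristic, and combine their votes. Below is a plain-Python sketch of that idea. It does NOT use Snorkel's actual API: the labels, the heuristics, and the `weak_label` helper are all made up for illustration, and the votes are combined with a simple majority vote.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when its rule does not apply


def lf_contains_refund(text):
    return "complaint" if "refund" in text.lower() else ABSTAIN


def lf_contains_thanks(text):
    return "praise" if "thank" in text.lower() else ABSTAIN


def lf_many_exclamations(text):
    return "complaint" if text.count("!") >= 3 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_refund, lf_contains_thanks, lf_many_exclamations]


def weak_label(text):
    """Combine labeling-function votes by majority; abstain on ties or no votes."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting heuristics: send to a human annotator
    return ranked[0][0]


for text in ["I want a refund now!!!", "Thank you so much", "Hello"]:
    print(repr(text), "->", weak_label(text))
```

Examples where the heuristics abstain or disagree are the ones worth sending to human annotators, which is one way to keep control over where the labeling budget goes.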

Remember, your final model can only be as good as your data.

💎 Tip 3: Do not work alone

ML projects are way more than just ML models.

2 people you wanna have close at all times are:

→ a data engineer, who builds and automates data pipelines.

→ a DevOps engineer, who puts your models in production.

You can’t do it alone. Believe me.

To sum up

  • Estimate the ROI of the project first.

  • Take full control over the data labeling.

  • Do not work alone.
