Machine learning (ML) and AI are still new. While the growth of open source tools and online courses has made them accessible to a large number of users, it’s still hard to get them right. The growing population of novice users accessing these powerful tools means that many projects will fail.

So what questions should you ask yourself at the beginning to avoid having things go wrong?

Does it really solve a business problem?

Or is it just an opportunity to play with some cool technology? With exciting new technologies like deep learning for computer vision, the results are so impressive that it’s easy to get caught up in the technology itself rather than in the value it provides.

What is "good enough"?

While we may know what business problem we're tackling, we must also know what results are minimally sufficient. Is it reasonable to expect that we can achieve these results? If we need to get 99.9% accuracy before any value is provided, then perhaps this is not the right problem. 

It’s often the case that we’re trying to automate some routine work that is currently being done by people. One often-overlooked benefit of applying ML to a project is that it forces us to measure performance. As part of this effort, it is essential that we get a baseline from the humans currently executing the process. We should compare against this baseline rather than against perfection.

Does the customer have reasonable expectations? Even if the resulting system is accurate enough to provide value, the customer may be expecting more. They may be expecting perfection, and they probably won’t get it.

Where is it okay to make mistakes?

The most overlooked challenge in applying ML is selecting or designing the learning objectives. It is a very rare case where your model can be perfect. It will make mistakes, but which mistakes? Answering that requires careful design, and it is a step people often skip.

Consider how mistakes are penalized during training. Typically, every training data point is treated as equal, but of course they are not. Identifying a large credit card fraud, for example, is much more important than identifying a small one, so we should weight the corresponding data points differently during training, penalizing the model more for missing a big fraud than for missing a small one.
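As a minimal sketch of this idea (with made-up probabilities and amounts), per-example losses can be weighted by the dollar amount at stake, so that missing a large fraud costs the model far more than missing a small one:

```python
import numpy as np

# Hypothetical fraud examples: the model's predicted probability of
# fraud, the true label (1 = fraud), and the dollar amount at stake.
probs   = np.array([0.1, 0.9, 0.6])   # the model misses the big fraud
labels  = np.array([1,   1,   0])
amounts = np.array([10_000.0, 50.0, 200.0])

# Per-example log loss.
eps = 1e-12
per_example = -(labels * np.log(probs + eps)
                + (1 - labels) * np.log(1 - probs + eps))

# Unweighted loss treats every example equally...
unweighted = per_example.mean()

# ...while weighting by amount makes the missed $10,000 fraud dominate.
weighted = np.average(per_example, weights=amounts)

print(unweighted, weighted)
```

Most libraries expose something like this directly (for example, a `sample_weight` argument in scikit-learn’s `fit` methods and metric functions), and the same weights should be applied when scoring held-out data, not just during training.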

These choices have enormous real-world effects but are invisible during the train/test process, as we typically use the same approach to measure (and so penalize) performance on test data as we do on training data. Getting it wrong won’t be visible until the system is deployed.

Where does the training data come from?

A classic mistake when applying ML is poor alignment of the training data with the deployment scenario. This is a bit technical, but almost all ML algorithms are predicated on the notion that their training data is drawn from the same distribution as the testing data. This can go wrong in a variety of ways.

Training data is often collected in some biased way. For example, in drug discovery research, successful experiments are published and unsuccessful ones are not. This leads to enormous bias in training that has a huge impact on the final performance of the system. This is a subtle issue that occurs frequently. We use the data we have available, but is the data right or just cheap?
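A toy simulation (with made-up distributions) shows how a biased collection process hurts deployment performance even when the learning step itself works fine. Here the classifier is a simple threshold midway between the class means; trained on a sample skewed toward "successes," it lands in the wrong place:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D problem: the true label is 1 exactly when the feature is positive.
def true_label(x):
    return (x > 0.0).astype(int)

def fit_threshold(x):
    """Minimal classifier: midpoint between the two class means."""
    y = true_label(x)
    return (x[y == 1].mean() + x[y == 0].mean()) / 2.0

def accuracy(threshold, x):
    return ((x > threshold).astype(int) == true_label(x)).mean()

# Deployment data: a standard normal.
x_test = rng.normal(0.0, 1.0, 100_000)

# Unbiased training sample, drawn from the same distribution.
x_fair = rng.normal(0.0, 1.0, 5_000)

# Biased sample: skewed toward "successes" (positive cases), like a
# literature that only publishes successful experiments.
x_biased = rng.normal(1.5, 1.0, 5_000)

acc_fair = accuracy(fit_threshold(x_fair), x_test)
acc_biased = accuracy(fit_threshold(x_biased), x_test)
print(acc_fair, acc_biased)  # the biased sample gives a worse classifier
```

The algorithm never changed; only the sampling did, and that alone is enough to drop deployment accuracy substantially.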

Are you overfitting?

Last, but not least, we have overfitting. This is a subtle and deep challenge.

Overfitting happens due to “multiple testing.” Imagine a thousand people playing roulette ten times each. Some of them will win and some will lose. Take the person who won the most money; they could have won quite a lot. Are they a great roulette player, or just lucky? In this case we know they are just lucky, but only because we understand roulette. If our pool of people is large enough, we will always find someone who performs really well.
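The roulette story can be simulated directly. A minimal sketch, assuming even-money $1 bets on red on a European wheel (red hits 18 of 37 pockets, so every player has negative expected value):

```python
import numpy as np

rng = np.random.default_rng(0)

# 1,000 players each bet $1 on red for 10 spins. Red hits 18 of the
# 37 pockets, so each spin has expected value -1/37 of the stake.
n_players, n_spins, p_red = 1_000, 10, 18 / 37

wins = rng.random((n_players, n_spins)) < p_red
winnings = np.where(wins, 1, -1).sum(axis=1)

# The best player in the pool looks impressive -- by luck alone.
print("best:", winnings.max(), "worst:", winnings.min())
```

With a pool this large, someone almost always comes out several dollars ahead despite the house edge, which is exactly the "always find someone who performs really well" effect.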

With machine learning we typically have a large pool of possible models. Training involves explicitly or implicitly “testing” those models against the training data and choosing the best one. Now, is the best model actually good, or did it just get lucky? That is the fundamental question of overfitting. The question is well studied and the answer is knowable, but arriving at it requires some expertise.
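The same effect can be reproduced with models instead of gamblers. As a sketch, consider a hypothetical pool of classifiers that guess at random, "tested" against labels that are pure coin flips:

```python
import numpy as np

rng = np.random.default_rng(0)

# Training labels are pure coin flips, so no model can truly beat 50%.
n_train, n_models = 100, 10_000
y_train = rng.integers(0, 2, n_train)

# A pool of "models" that just guess at random.
guesses = rng.integers(0, 2, (n_models, n_train))
train_acc = (guesses == y_train).mean(axis=1)

# Selecting the best model on the training data looks like success...
best = train_acc.argmax()

# ...but on fresh labels the winner is back to chance level.
y_test = rng.integers(0, 2, n_train)
test_acc = (guesses[best] == y_test).mean()
print(train_acc[best], test_acc)
```

The winning model scores well above 50% on the data used to select it, yet near 50% on fresh data: selection from a large pool manufactures apparent skill out of noise.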

Conclusion

ML and AI technologies are incredibly powerful. However, they can still be hard to apply. Many classic problems (defining success, establishing a clear value proposition) are exacerbated in this context, while some challenging new problems arise. The good news is that experience helps a lot. The bad news is that it’s not always enough: ML and AI remain fields where real expertise is often required to achieve the results and business value these technologies can deliver.