We live in an age where data is collected at an unbelievable rate. Moreover, machine learning has transformed from a purely academic and research domain into a powerful tool that industries are adopting to help drive key decisions. Because they are robust and can model a wide range of situations, machine learning models are now being used to solve real-world problems and fuel companies’ innovations. However, it is often overlooked that machine learning models are themselves fueled by data and need to be monitored to make sure they are still making predictions as expected.
Does my data make sense?
For decades, the market has seen the need for and importance of collecting data grow. Unfortunately, we began collecting data without really knowing what we were going to do with it or how new technologies would need it formatted. This has left many organizations with a unique setup for how and where their data is stored, along with their own unique set of obstacles and nuances.
Just like snowflakes, everyone’s data is unique to them. While popular narratives have painted the picture that we can just dump data into a machine learning algorithm and immediately have a high-performing model, this is often not the case. A great machine learning model is created from great data. This has led to the saying that “companies hire data scientists only to learn they should have hired data engineers” – and there is actually some truth to that.
To build that great set of data, it is key to understand one’s data and prepare it for a machine learning model. Many things have to be considered, such as:
- What format is the data stored in and does it need to be transformed?
- Does the data have many outliers that might affect the model?
- Is the data imbalanced?
- Are certain features plagued with missing values?
- Are some of the predictors correlated and confounding?
- How long will this data stay relevant?
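A few of the checks above can be sketched in a handful of lines with pandas. The tiny dataset and column names here are purely illustrative assumptions, not anyone’s real data:

```python
import pandas as pd

# Hypothetical member dataset -- values and columns are illustrative only.
df = pd.DataFrame({
    "age": [34, 51, 29, None, 46, 62, 38, 47],
    "monthly_cost": [210.0, 540.5, 180.0, 95.0, 310.0, 880.0, None, 99999.0],
    "high_risk": [0, 0, 0, 0, 0, 1, 0, 1],   # imbalanced target
})

# 1. Missing values per feature
missing = df.isna().sum()

# 2. Outliers flagged with the common 1.5 * IQR rule
q1, q3 = df["monthly_cost"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["monthly_cost"] < q1 - 1.5 * iqr) |
              (df["monthly_cost"] > q3 + 1.5 * iqr)]

# 3. Class balance of the target
balance = df["high_risk"].value_counts(normalize=True)
```

Simple summaries like these are only a starting point; a subject matter expert still has to say whether a flagged value (like that 99999.0 cost) is an error or a legitimate extreme.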
Most importantly, you have to ask: does my data make sense? Many times, it takes a subject matter expert to verify this and help identify possible nuances in the data. I cannot stress enough that this is by far the most crucial, yet often overlooked, step. Once you are on this journey and the “data monster” is fully brought into the light, it can become very daunting.
Machine learning algorithms as far as the eye can see!
While a machine learning model itself is simply an algorithm that learns latent patterns and relationships from our data, it is key to note that there are many types of algorithms, from support vector machines and decision trees to random forests, neural networks, and many more. While all of these models are very flexible, they all have strengths and weaknesses too. This is where it becomes crucial to point out that data science is just that… a science! It can be very difficult for anyone to know which model is best for them, or what level of accuracy that model will produce for that matter. It is only through carefully reviewing data, testing, exploring multiple models, and tuning that you can determine what is truly optimal for your business and needs. Here, having a technology partner that is model agnostic and has experience with a multitude of machine learning models can not only accelerate model creation, but also help identify the model that is ideal for the data and the application. This not only allows the best model to be selected, but also allows it to be carefully trained on your data, yielding a unique machine learning model specifically designed to tackle your needs and your customer/population base.
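The “explore multiple models” step can be sketched with scikit-learn. The synthetic dataset and the three candidate models below are illustrative assumptions; a real comparison would use your own data and a wider, tuned candidate pool:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for your data -- real projects load their own.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

candidates = {
    "svm": SVC(),
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "random_forest": RandomForestClassifier(random_state=42),
}

# 5-fold cross-validation gives a fairer estimate than one lucky split.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
```

Which name ends up in `best` depends entirely on the data, which is exactly the point: the winning model family cannot be known in advance.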
While getting a model built and running is a huge accomplishment, we need to be sure our models are delivering a return on investment and are not just shiny objects sitting around. What I mean by this is that it is one thing to have a model with extremely high accuracy on the data it was trained on, but how does it perform on the population you are going to apply it to? Many times, we see models that work well in theory but fail when applied to the real population. For example, say a machine learning model is trained to identify individuals at high risk for a chronic illness and predicts at-risk individuals very accurately on the data it was trained on. However, the accuracy falls greatly once it is put into production and starts seeing individuals it has never seen before. It is paramount that a model can generalize its findings to the population. Having someone to consult or assist in building machine learning models specifically for your organization, someone who has experience with situations like yours, can help you avoid pitfalls before they happen. Having this expertise will allow models to be rapidly created to help answer key questions, drive results, lessen workloads, and facilitate innovation.
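This train-versus-production gap is easy to demonstrate. In the sketch below (synthetic data, illustrative only), an unconstrained decision tree memorizes noisy training labels and then scores noticeably worse on a held-out split it has never seen:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y adds label noise, mimicking the messiness of real populations.
X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# An unconstrained tree can memorize its training data almost perfectly...
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
train_acc = model.score(X_train, y_train)   # near-perfect
test_acc = model.score(X_test, y_test)      # noticeably lower on unseen data
gap = train_acc - test_acc
```

A large `gap` is the warning sign: accuracy reported on training data says little about how the model will behave on the people it has never seen.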
I created my machine learning model. I’m done, right?
Once a model is created, many new and important questions arise, such as:
- How will I deploy the model?
- How will I feed the model data?
- How do I create a model pipeline?
- How do I monitor the model?
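One common first step toward answering the pipeline and deployment questions is to bundle preprocessing and model together and persist the fitted artifact. This is a minimal sketch using scikit-learn and joblib; the preprocessing step and model choice are assumptions for illustration:

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=5, random_state=1)

# Bundling preprocessing with the model keeps training and serving
# consistent: whoever loads the artifact cannot forget to scale inputs.
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression())]).fit(X, y)

# Persist the fitted pipeline, then reload it as a deployment service would.
path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(pipe, path)
restored = joblib.load(path)
```

A real deployment would wrap `restored.predict` behind an API and version the artifact, but the principle is the same: ship one self-contained pipeline, not a loose model plus tribal knowledge about preprocessing.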
To truly maximize the value of a machine learning model you have spent a great deal of time creating, it will need to be deployed for all to use. In addition, once a model is created, it can be quite dangerous to leave it alone. The data the model sees might start to change, and once this happens the model’s predictions might become inaccurate. An example of this can be found in healthcare cost predictions. New medications are constantly being released, and the prices of some existing medications drift over time. Factors such as these can cause a model’s healthcare cost predictions to become inaccurate, and thus the model will need to be re-trained. This is known as data drift. Having a trusted partner working with you to identify this early can prevent inaccurate predictions from driving business decisions and providing misleading information.
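A basic drift check compares the distribution a feature had at training time against what the model is seeing in production. One common approach is a two-sample Kolmogorov–Smirnov test; the cost figures below are synthetic stand-ins for logged feature values:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Synthetic medication costs: training-time vs. production after price rises.
training_costs = rng.normal(loc=100.0, scale=15.0, size=1000)
production_costs = rng.normal(loc=130.0, scale=20.0, size=1000)

# The KS test asks whether the two samples could plausibly come from the
# same distribution; a tiny p-value signals that the feature has drifted.
stat, p_value = ks_2samp(training_costs, production_costs)
drift_detected = p_value < 0.01
```

In production you would run a check like this on a schedule for each important feature, and treat a detected drift as a trigger to investigate and possibly re-train.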
In addition, the model will use compute resources, so monitoring its demand on your system can be very important depending on the number of users. One of the most crucial topics is making sure that this new tool is really being utilized. Humans are creatures of habit and do not like change. How can the model be introduced so that your company is encouraged to use it to its full potential? This is where change management is crucial. No matter how good the model is, if no one is using it, what was the point of creating it? Nothing is more devastating to see. This, once again, is where having a long-term partner can really help you and your company.
Starting your machine learning journey
Machine learning models are extremely powerful tools that have countless applications and can provide so much to companies. However, it is important to understand that you can’t just wake up one morning, decide to implement machine learning in your company, and have it done by the end of the week. The machine learning journey is a long road that will probably have some potholes along the way. This is not to deter anyone from taking the journey; it is to inform you of what to expect. More importantly, it is to let you know that when embarking on this journey, having a guide or traveling companion will always help.
When looking for a partner to help your company with machine learning, always do your research and make sure they have the expertise to help with every part of your journey. Building a long-term relationship with a partner will allow them to really understand your data, help mature your data, create unique custom models to drive your business, and ensure those models are deployed and utilized to their fullest potential. By having this partner involved from beginning to end, you can avoid many bumps and accelerate design and deployment time. Last but not least, if your guide/traveling companion has snacks, that’s a bonus.