Summary: Even in a simple development environment, machines and algorithms are still powered by human intelligence.
No-code, low-code (horizontal) machine learning platforms are useful for scaling data science in an enterprise. Still, as many organizations are now finding out, there are many ways that data science can go wrong when applied to new problems. Zillow lost hundreds of millions of dollars buying houses using a flawed data-driven home valuation model. Data-driven human resources technology, especially when based on facial recognition software, has been shown to bias hiring decisions against protected classes.
While automation is a great tool to have in your arsenal, you need to consider the challenges before adopting a horizontal ML platform. These platforms need to be flexible, configurable, and monitorable to be robust and to add value consistently over time. They need to let users control how data is weighted, and they need data visualization tools that help detect outliers and sources of noise. They also need automated monitors for model parameters and data drift that alert users to changes. In other words, we have not yet reached the point where algorithms outmatch human intelligence.
So, don’t be fooled by AI/ML/low code … you still need people. Let’s take a closer look at the reasons why.
Machines Learn from Humans
Trying to replace human data scientists, domain experts, and engineers with automation is a hit-or-miss proposition which could lead to disaster if applied to mission-critical decision-making systems. Why? Because human beings understand data in ways that automated systems still struggle with.
Humans can differentiate between data errors and data that is merely unusual (e.g., GameStop/GME trading in February), and can align unusual data patterns with real-world events (e.g., 9/11, COVID, financial crises, elections). We also understand the impact of calendar events such as holidays. Depending on the data used in ML algorithms and the data being predicted, the semantics of the data might be hard for automated learning algorithms to discover. Forcing them to uncover these relationships isn’t necessary when the relationships are already obvious to the human operator.
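As a rough illustration of that division of labor, here is a minimal Python sketch in which an algorithm flags unusual values and a human-maintained calendar explains some of them. The `KNOWN_EVENTS` calendar, the `flag_outliers` name, and the threshold of 5 are all assumptions made for this example, not part of any particular platform.

```python
from datetime import date
from statistics import median

# Hypothetical event calendar maintained by a domain expert.
KNOWN_EVENTS = {
    date(2021, 2, 2): "meme-stock volatility",
}

def flag_outliers(series, threshold=5.0):
    """Label each (day, value) pair as normal, a known event, or needing review.

    Uses a robust score: distance from the median, measured in units of the
    median absolute deviation (MAD).
    """
    values = [value for _, value in series]
    med = median(values)
    mad = median(abs(v - med) for v in values) or 1e-9  # avoid dividing by zero
    labels = {}
    for day, value in series:
        score = abs(value - med) / mad
        if score <= threshold:
            labels[day] = "normal"
        elif day in KNOWN_EVENTS:
            # Unusual, but a human has already explained it.
            labels[day] = "known event: " + KNOWN_EVENTS[day]
        else:
            # Unusual and unexplained: possibly a data error.
            labels[day] = "needs review"
    return labels
```

The algorithm only decides what is unusual. Whether an unusual point is a real event or a bad record is left to the person who maintains the calendar and reviews the remainder.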
Aside from semantics, the trickiest part of data science is differentiating between statistically good results and useful results. It’s easy to use estimation statistics to convince yourself you have good results or that a new model gives you better results than an old model, when in fact neither model is useful in solving a real-world problem. However, even with valid statistical methodologies, there is still a component to interpreting modeling results that requires human intelligence.
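One simple way to make the gap between "statistically good" and "useful" concrete is to compare a model against a naive baseline. The sketch below, in Python, assumes a time series forecast and uses "repeat the previous value" as the baseline. The function names and the choice of mean absolute error are illustrative assumptions.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def skill_vs_naive(actual, predicted):
    """Fractional error reduction relative to a 'repeat the last value' baseline.

    Positive: the model beats the baseline. Zero or negative: the model adds
    no value over the baseline, however small its raw error looks.
    """
    target = actual[1:]      # the first point has no previous value to copy
    naive = actual[:-1]      # baseline forecast for step t is the value at t-1
    model_mae = mean_absolute_error(target, predicted[1:])
    naive_mae = mean_absolute_error(target, naive)
    if naive_mae == 0:
        raise ValueError("series is constant; the naive baseline is already perfect")
    return 1.0 - model_mae / naive_mae
```

A model with an error of a few percent can still score below zero here, which is the kind of result a person has to interpret before anyone acts on the forecast.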
When developing a model, you often run into questions about which model estimation statistics to measure, how to weight them, how to evaluate them over time, and how to decide which results are significant. Then there is the whole issue of over-testing: if you test too frequently on the same data set, you eventually “learn” your test data, making your test results overly optimistic. Finally, you have to build models and figure out how to put all these statistics together into a simulation methodology whose results will be achievable in the real world. You also need to consider that a machine learning platform that has been successfully deployed on one modeling and prediction problem will not necessarily succeed when the same process is repeated on a different problem in that domain or in a different vertical.
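The over-testing effect is easy to reproduce. The following Python sketch is a toy simulation under stated assumptions: every "model" is a coin flip with no real skill, and the function name and default sizes are made up for the example.

```python
import random

def overtesting_demo(n_models=200, n_test=100, n_fresh=10_000, seed=0):
    """Return (best accuracy on a reused test set, accuracy on fresh data)."""
    rng = random.Random(seed)
    test_labels = [rng.randint(0, 1) for _ in range(n_test)]

    # Score every candidate on the SAME test set and keep the best score.
    best_on_test = 0.0
    for _ in range(n_models):
        preds = [rng.randint(0, 1) for _ in range(n_test)]
        accuracy = sum(p == y for p, y in zip(preds, test_labels)) / n_test
        best_on_test = max(best_on_test, accuracy)

    # The "winner" is still a coin flip, so score a coin flip on fresh data.
    fresh_labels = [rng.randint(0, 1) for _ in range(n_fresh)]
    fresh_preds = [rng.randint(0, 1) for _ in range(n_fresh)]
    fresh_accuracy = sum(p == y for p, y in zip(fresh_preds, fresh_labels)) / n_fresh
    return best_on_test, fresh_accuracy
```

Picking the best of many candidates on one test set inflates the reported score even though no candidate has any predictive power. Holding back data that is never used for selection is one common safeguard.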
There are so many choices that need to be made at each step of the data science research, development, and deployment process. You need experienced data scientists for designing experiments, domain experts for understanding boundary conditions and nuances of the data, and production engineers who understand how the models will be deployed in the real world.
Visualization is a Data Science Gem
In addition to weighting and modeling data, data scientists also benefit from visualizing data, a very manual process, and more of an art than a science. Plotting raw data, correlations between data and quantities being predicted, and time-series of coefficients resulting from estimations across time can yield observations that can be fed back into the model construction process.
You might notice a periodicity to data, perhaps a day-of-week effect or an anomalous behavior around holidays. You might detect extreme moves in coefficients that suggest outlier data is not being handled well by your learning algorithms. You might notice different behavior across subsets of your data, suggesting that you might separate out subsets of your data to generate more refined models. Again, self-organizing learning algorithms can be used to try to discover some of these hidden patterns in the data. But a human being might be better equipped to find these patterns, and then feed insights from them back into the model construction process.
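A day-of-week effect like the one described above can often be checked with a simple grouping before any plotting. This Python sketch assumes the data is a list of (date, value) pairs. The `mean_by_weekday` name is hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

WEEKDAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def mean_by_weekday(series):
    """Average the values of (date, value) pairs separately for each weekday."""
    buckets = defaultdict(list)
    for day, value in series:
        buckets[WEEKDAY_NAMES[day.weekday()]].append(value)
    return {name: mean(values) for name, values in buckets.items()}

# Usage: four weeks of synthetic data with a bump on Mondays.
start = date(2021, 3, 1)
days = [start + timedelta(days=i) for i in range(28)]
sample = [(d, 150.0 if d.weekday() == 0 else 100.0) for d in days]
weekday_means = mean_by_weekday(sample)
```

A table like `weekday_means` tells you that Mondays differ. Deciding whether that difference deserves its own feature or its own model is still a human judgment.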
Horizontal ML Platforms Need Monitoring
Another important role people play in the deployment of ML-based AI systems is model monitoring. Depending on the kind of model being used, what it is predicting, and how those predictions are being used in production, different aspects of the model need to be monitored so that deviations in behavior are tracked and problems can be anticipated before they lead to degradation in real-world performance.
If models are being retrained on a regular basis using more recent data, it is important to track the consistency of the new data entering the training process with the data previously used. If production tools are being updated with new models trained on more recent data, it is important to verify that the new models are as similar to old models as one might expect, where expectation is model- and task-dependent.
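Both checks can start out very simple. The Python sketch below shows one possible form: a standardized mean shift for incoming data and a relative-change check for model coefficients. The function names, the 25% tolerance, and the choice of statistics are assumptions for illustration. As the text notes, what counts as "too different" depends on the model and the task.

```python
from statistics import mean, stdev

def mean_shift(reference, new_batch):
    """Shift of the new batch mean, in units of the reference standard deviation."""
    return abs(mean(new_batch) - mean(reference)) / stdev(reference)

def changed_coefficients(old, new, tolerance=0.25):
    """Names of coefficients that are missing or moved by more than `tolerance`."""
    flagged = []
    for name, old_value in old.items():
        new_value = new.get(name)
        if new_value is None:
            flagged.append(name)  # coefficient dropped from the new model
            continue
        scale = max(abs(old_value), 1e-12)  # guard against a zero old value
        if abs(new_value - old_value) / scale > tolerance:
            flagged.append(name)
    return flagged
```

In practice, the output of checks like these would feed an alert that a person reviews, rather than triggering an automatic rollback or retraining.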
There are clearly enormous benefits to applying automation to a broad set of problems across many industries, but human intelligence is still intrinsic to these developments. You can automate human behavior to a degree and, in controlled environments, replicate the power and performance of people's work with no-code, low-code ML-based AI systems. But, in a world where machines are still heavily reliant on humans, never forget the power of people.