Our evaluation of Online ML, Active Learning and Reinforcement Learning

A description and evaluation of three machine learning techniques that all have some online/interactive component.

Enefit IT
5 min read · Apr 28, 2021


Written by: Kristjan Eljand | Technology Scout

Our digital technology portfolio consists of techniques and concepts that can be implemented by software engineers or data scientists. Each technology is assessed from the perspective of business value (horizontal axis on the chart below) and technological maturity (vertical axis). In the first article, I described our view on business process mining, digital twins and knowledge graphs. This time, I’ll focus on three closely related machine learning techniques: active learning, online learning, and reinforcement learning.

Figure 2. Digital Technology Portfolio of E-Lab. Each point on the chart represents one technology. The expected business value is represented on the horizontal axis and the technological maturity on the vertical axis. Green — technologies that are worth testing and demonstrating. Dark blue — wait until the tech becomes more mature or until the expected business value increases.

Online Machine Learning

Online machine learning (online ML) is a technique in which the model is updated as soon as new data comes in, as opposed to batch learning (e.g. once per day) on historical data.

Online ML is used in environments where the data distribution changes rapidly and only the newest data represents the environment correctly. In addition, online ML could be used in environments that have limited data storage capabilities. For example, imagine an IoT device that cannot store or send the data for some reason, but we still want to build a machine learning model on top of it.

Online incremental learning is achieved with algorithms like Stochastic Gradient Descent (SGD) that can update the model using only a part of the available data at each iteration (a minimal sketch follows the list below). The applications include:

  1. Edge ML on devices with limited storage capabilities.
  2. Product recommendation systems that adapt to user preferences on the fly.
  3. Learning to make decisions with reinforcement learning.
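
To make the idea concrete, here is a minimal sketch of incremental learning with scikit-learn’s SGDRegressor. The stream_of_batches() generator and its synthetic data are illustrative assumptions standing in for whatever actually delivers new observations (e.g. an IoT device), not part of any real system.

```python
# A minimal sketch of online (incremental) learning with scikit-learn's
# SGDRegressor. stream_of_batches() is a hypothetical stand-in for a real
# data source; the true weights and noise below only make the example
# self-contained.
import numpy as np
from sklearn.linear_model import SGDRegressor

def stream_of_batches(n_batches=200, batch_size=10, n_features=3):
    rng = np.random.default_rng(42)
    true_w = np.array([2.0, -1.0, 0.5])
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, n_features))
        y = X @ true_w + rng.normal(scale=0.1, size=batch_size)
        yield X, y  # each batch is seen once and then thrown away

model = SGDRegressor(learning_rate="constant", eta0=0.01)

for X_batch, y_batch in stream_of_batches():
    # partial_fit updates the model with only the current batch,
    # so no historical data needs to be stored.
    model.partial_fit(X_batch, y_batch)

print(model.coef_)  # should gradually approach the underlying weights
```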

Possible problems with online ML:

  • Cold start problem — we can only deploy the algorithm in a live production environment after it has reached some quality threshold.
  • Control mechanisms — it is difficult to build mechanisms that ensure the algorithm is learning in the “right” direction.
  • Threat of manipulation — online ML models could be intentionally manipulated by third parties (recall Microsoft’s bot Tay).

Evaluation

We evaluate that online machine learning has mediocre business value in the energy sector because most machine learning challenges can be solved in offline mode and there is also a high risk of model drift (the model learns patterns that it shouldn’t). Similarly, the maturity of the technology is below average.

Active learning

Active learning is a way of doing machine learning in which the learning algorithm and the user interact during the training process. In other words, it’s human-in-the-loop machine learning.

Active learning is useful in situations where we don’t have enough labelled data or when we suspect that the labels are not fully adequate. Consider the following examples (a minimal sketch follows the list):

  • Predictive maintenance: a pre-trained ML model provides a prediction of a device fault -> a human expert is asked to confirm or disprove it -> the ML model is retrained with the expert’s feedback included in the training data.
  • Recommendation system: a pre-trained recommendation model provides a list of personalized product recommendations on the website -> the user is asked to evaluate the quality of the recommendations (1–5 stars) -> the model is retrained.
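
As a concrete illustration, below is a minimal pool-based active learning sketch with uncertainty sampling. The ask_expert() oracle, the synthetic data and the query budget are illustrative assumptions standing in for the human-in-the-loop step described above.

```python
# A minimal sketch of pool-based active learning with uncertainty sampling.
# ask_expert() is a hypothetical stand-in for the human-in-the-loop step
# (e.g. an engineer confirming or disproving a predicted device fault).
import numpy as np
from sklearn.linear_model import LogisticRegression

def ask_expert(x):
    # Hypothetical oracle: in practice this is a UI where the expert labels x.
    return int(x.sum() > 0)

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(500, 4))                # large unlabelled pool
X_train = np.vstack([-np.ones((5, 4)), np.ones((5, 4))])  # tiny labelled seed set
y_train = np.array([0] * 5 + [1] * 5)

model = LogisticRegression().fit(X_train, y_train)

for _ in range(20):                               # 20 query rounds
    proba = model.predict_proba(X_pool)[:, 1]
    idx = int(np.argmin(np.abs(proba - 0.5)))     # most uncertain example
    X_train = np.vstack([X_train, X_pool[idx]])
    y_train = np.append(y_train, ask_expert(X_pool[idx]))
    X_pool = np.delete(X_pool, idx, axis=0)
    model.fit(X_train, y_train)                   # retrain with the new label
```

The design choice here is to query the example the model is least sure about; in practice the query strategy and the retraining schedule are where most of the engineering effort goes.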

Applications:

An active learning approach can be used in almost every machine learning system. It adds a decent amount of complexity but might be crucial for the success of some projects.

Evaluation

We evaluate that finding valuable business cases for active learning in the energy industry is difficult. This is due to a lack of use cases that require online human-in-the-loop training. That said, using active learning in production environments carries lower risk than pure online learning, because a human expert provides continuous feedback to the model and is able to detect model drift. The maturity of the technology is above average.

Reinforcement learning

Reinforcement learning (RL) is about learning how to act in an environment. In other words, RL tries to answer the question: what is the next best action to achieve the goal? For example: what is the next best chess move? Should I turn the wheel of the car right or left? How should a robotic arm move to achieve the result? Reinforcement learning is one of the three basic machine learning paradigms, alongside supervised learning and unsupervised learning.

RL shines in environments that are too complex to be solved with traditional combinatorial optimization, or when the decision-making process needs to be very fast and we can’t afford to run a long optimization process. For example, deciding how to steer a car in near real time.

One of the core features of RL is that it doesn’t need labeled data for learning. Instead, RL algorithms learn right from wrong by interacting with the environment, receiving some sort of reward from it, and updating themselves.
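
For intuition, here is a minimal sketch of this trial-and-error loop using tabular Q-learning. The toy corridor environment, its reward and all hyperparameters are illustrative assumptions rather than anything from our evaluation.

```python
# A minimal sketch of tabular Q-learning on a toy 1-D corridor environment.
# The agent gets a reward of +1 only when it reaches the rightmost state.
import numpy as np

n_states, n_actions = 6, 2          # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.95, 0.1
rng = np.random.default_rng(1)

for episode in range(500):
    s = 0                            # always start at the left end
    while s != n_states - 1:         # the goal state terminates the episode
        # Epsilon-greedy: mostly exploit current Q-values, sometimes explore.
        a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0   # reward only at the goal
        # Q-learning update: learn from the reward signal, no labels needed.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))  # learned policy: should prefer "right" in every state
```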

One of the main weaknesses of state-of-the-art RL is sample inefficiency — the algorithm needs to run millions of trials to achieve good performance. This problem might be solved with advancements in model-based RL and Offline RL — both areas are being actively researched.

The possible applications include:

  1. Large-scale stochastic optimization problems.
  2. Optimizing the pricing of rental products.
  3. Automated planning solutions.
  4. Using RL to optimize the operation of HVAC systems (example by Google).
  5. Using RL for autonomous cars.
  6. RL as the brain (decision-making system) of robots.

Evaluation

We believe that Reinforcement Learning has very high value potential in the energy sector — the technique has a unique promise of learning how to make decisions that result in a high long-term reward. The maturity of the technique is mediocre — some algorithms like DQN are well tested and have tools available, but RL’s “online” nature and sample inefficiency are still big obstacles to overcome.

Summary

Online machine learning, Active learning and Reinforcement learning are similar concepts because they all include a real-time or interactive component in their training cycle. We believe that active learning has the highest technological maturity (because there is a human in the loop who can “guide” the learning algorithm in the right direction) and Reinforcement learning has the highest value potential (due to its unique ability to learn optimal long-term decisions). Online learning has the interesting promise of building a continuously up-to-date machine learning model without storing the data (think IoT), but its usability in a production setting is far from mature.

P.S. Don’t forget to take a look at our Digital Technology Portfolio.
