From Machine Learning to Machine TEACHING


Imagine the day when chain store executives can focus on market positioning, on the experiences to offer customers to stand out from the competition, on how to motivate store staff to pursue qualitative and quantitative goals, and more. The daily decisions about which products to deliver to the stores, in what quantity and at what precise moment, or what prices to set, will instead be made by an intelligent system. Science fiction?

The era of highly complex Autonomous Systems, capable of optimizing management even in the presence of anomalous events, is closer than you think, as evidenced by Project Bonsai for autonomous systems (Microsoft AI), the result of the 2018 acquisition of the startup of the same name and currently in public testing.


From Supervised to Reinforcement Learning

In a complex business system, the traditional Machine Learning (ML) approach would have industry experts analyze thousands of different situations and define, for each of them, the decisions considered optimal. These examples, processed by an ML system, make it possible to create a "brain" (a model of reality with which to make decisions in future situations). However, this approach cannot work in practice, not only because of the enormous effort of annotating real cases, which could moreover be biased by ingrained habits, prejudices and personal interests, but also because the market changes too rapidly.

A better strategy is the one I talked about in aKite - Reinforcement Learning in Retail. The objectives are defined, and the system is left to find the most suitable algorithm and to optimize it by continuously exploring different alternatives and retaining the most effective ones. It is a sort of Darwinian evolution, based on algorithms resulting from decades of scientific research.

Reinforcement Learning has the advantage of letting people decide on medium-term strategy (such as maximizing margin or market share, or the desired service level) while the daily decisions are made by algorithms that automatically adapt to changes in the market.
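To make the idea concrete, here is a minimal sketch of the explore-and-reinforce loop, written as an epsilon-greedy bandit that learns which promotion converts best. It is only an illustration under stated assumptions: the function name, the conversion rates and the 10% exploration share are all hypothetical, and a real retail system would be far richer.

```python
import random

def run_promotion_bandit(conversion_rates, rounds=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy choice among promotions (hypothetical toy example).

    conversion_rates are the true success probabilities, unknown to the
    learner and used here only to simulate customer responses.
    Returns (estimated rates, number of times each promotion was offered).
    """
    rng = random.Random(seed)
    n = len(conversion_rates)
    counts = [0] * n
    estimates = [0.0] * n
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: try a random promotion
        else:
            arm = max(range(n), key=lambda i: estimates[i])  # exploit the best so far
        # Simulated outcome: 1.0 if the customer accepts the offer, else 0.0.
        reward = 1.0 if rng.random() < conversion_rates[arm] else 0.0
        counts[arm] += 1
        # Incremental mean: move the estimate toward the observed reward.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

estimates, counts = run_promotion_bandit([0.02, 0.05, 0.30])
```

The people in charge only define the objective (here, the conversion reward); the loop decides day by day which promotion to offer and shifts toward the one that works.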

This approach is suitable for relatively simple problems, such as choosing the best promotion to suggest to each customer, but unfortunately not for very complex areas subject to conflicting interests, such as supply chains. Let's see why.


Complex autonomous systems

Even in Reinforcement Learning (RL), the answer to complexity is Deep Neural Networks (DNNs), algorithms inspired by the way our brain works that have made it possible, for example, to surpass human parity in image and speech recognition. The problem is the very high number of trials and errors needed before reaching satisfactory results, so high that scientists often try to improve their algorithms by making them play video games non-stop.

As I said a couple of years ago in the article cited above: "... RL is inspired by human behaviour, in which babies grow up exploring the surrounding world, gradually developing greater skill in a bottom-up way, which is gradually joined, through a top-down process, by a mental model of the deep mechanisms of the world in which they are immersed. The still mysterious fusion between these two approaches is very powerful and is what RL tries to replicate ..."

In companies, time is money and mistakes can be fatal. The novelty therefore lies in the possibility of instilling in the DNNs some simple rules, constraints and limits that must not be exceeded. This has the further advantage of adding "explainability" to algorithms that are inherently impenetrable (black boxes), improving safety and compliance with the laws and regulations that are emerging, especially in Europe.
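One common way to impose such limits is to filter out forbidden actions before the learned model chooses among them. The sketch below assumes a hypothetical store-replenishment case: the function names, the two rules (shelf capacity and budget) and the scores are invented for illustration and are not taken from any specific product.

```python
def allowed_actions(order_options, stock, shelf_capacity, budget, unit_cost):
    """Keep only the order quantities that respect the expert-defined limits."""
    return [
        q for q in order_options
        if stock + q <= shelf_capacity and q * unit_cost <= budget
    ]

def choose_order(scores, order_options, **limits):
    """Pick the highest-scoring quantity among those the rules allow.

    scores maps each quantity to the value a learned model assigns to it
    (here just a dictionary standing in for a neural network's output).
    """
    candidates = allowed_actions(order_options, **limits)
    if not candidates:
        return 0  # assumed fallback: order nothing if every option breaks a rule
    return max(candidates, key=lambda q: scores[q])

scores = {0: 0.1, 10: 0.4, 20: 0.7, 50: 0.9}
order = choose_order(scores, [0, 10, 20, 50],
                     stock=35, shelf_capacity=60, budget=100, unit_cost=4)
```

Because the rules are explicit and separate from the model, it is easy to explain why the option the model scored highest (50 units) was rejected: it would overflow the shelf.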

In Microsoft's Project Bonsai, this top-down teaching is done by business experts, not necessarily programmers, through simple instructions in a language called Inkling. It also has other benefits, such as putting some control back into the hands of the business and greatly speeding up the learning process. It is like leading a robot, instructed to find the highest peak through blind exploration of the surrounding area, directly to the mountain range deemed most promising, sparing it the effort of exploring an entire continent.
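The peak-finding analogy can be sketched in a few lines. This is a toy illustration in Python, not Inkling and not how Bonsai works internally: the terrain function, the peak positions and the budget of 11 evaluations are all made up. It only shows why an expert hint about where to look matters when the number of trials is limited.

```python
import math

def height(x):
    """Hypothetical terrain: a broad hill near x=200 and a narrow, taller peak at x=830."""
    return (100 * math.exp(-((x - 830) / 20) ** 2)
            + 60 * math.exp(-((x - 200) / 50) ** 2))

def search_peak(lo, hi, budget):
    """Evaluate `budget` evenly spaced points in [lo, hi]; return the best (x, height)."""
    step = (hi - lo) / (budget - 1)
    points = [lo + i * step for i in range(budget)]
    best = max(points, key=height)
    return best, height(best)

# Same budget of 11 trials in both cases.
blind = search_peak(0, 1000, budget=11)     # explore the whole "continent"
guided = search_peak(750, 850, budget=11)   # explore only the range the expert suggests
```

With the same number of trials, the blind search stops on the lower hill because its coarse steps jump over the narrow peak, while the guided search lands on it.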

Other steps forward are the possibility of running these algorithms in parallel with human decision-making, so that they can learn by "watching" what experts do before "driving" themselves, and the possibility of evaluating the effectiveness of different decision-making strategies across a large number of parallel virtual worlds.



This is an example of how advances in Artificial Intelligence can benefit both corporate and environmental sustainability. Producing and delivering, through countless daily micro-decisions, only the goods that are actually likely to be sold will have a beneficial impact on the planet.