Batch vs. Online Learning

Machine learning algorithms can be classified as batch or online methods according to whether they can learn incrementally as new data arrives.

Batch Learning

Batch learning methods cannot learn incrementally. They typically build a model from the full training set, and the trained model is then placed into production. This approach is also called offline learning. If we want a batch learning algorithm to learn from new data as it arrives, we must train a new model from scratch on the full training set plus the new data. If the amount of data is huge, training on the full data set may require a lot of computing resources (e.g., CPU, memory, storage, and disk I/O).
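To make the retraining cost concrete, here is a minimal sketch of the batch workflow in plain Python. The toy data, the simple least-squares model, and the function name fit_batch are assumptions chosen for illustration, not part of any particular library or of my own projects.

```python
def fit_batch(xs, ys):
    """Fit y = w * x + b by ordinary least squares on the FULL data set."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


# Hypothetical toy training set that follows y = 2x + 1.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]
model = fit_batch(train_x, train_y)

# New data arrives. A batch method cannot update the old model in place:
# the old data must be kept, merged with the new data, and the model
# is rebuilt from scratch on everything.
new_x = [4.0, 5.0]
new_y = [9.0, 11.0]
train_x = train_x + new_x
train_y = train_y + new_y
model = fit_batch(train_x, train_y)
```

Note that every refit touches all stored examples, so both the storage and the training time grow with the total amount of data collected.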

If our system does not need to adapt to rapidly changing data, the batch learning approach may be good enough. If we do not need to update our model very often, we can benefit from its advantages: the whole process of training, evaluation, and testing is simple and straightforward, and it often leads to better results than online methods. I have developed batch learning algorithms for knowledge graph embedding projects. In future work, I also plan to develop an online method for these projects so that they can adapt to dynamically changing knowledge graphs.

Online Learning

An online learning algorithm trains a model incrementally from a stream of incoming data. Generally, online methods are fast and cheap: they execute with constant (or at least sub-linear) time and space complexity, so they usually do not require a lot of computing resources. Online algorithms achieve this because they do not need the full data set to train a model. Once they have learned from new data, that data can typically be discarded. I have developed an online method for predicting users' future actions. These actions happen very frequently, with very short time gaps between them, so online learning algorithms are more appropriate than batch learning methods for this task.
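As a rough illustration of incremental training, the sketch below updates a linear model one example at a time with stochastic gradient descent. The class OnlineLinearModel, the generator stream, the learning rate, and the toy data are all hypothetical and chosen for illustration only. This is not the method I used for the user action prediction task.

```python
class OnlineLinearModel:
    """Linear model y = w * x + b trained one example at a time with SGD."""

    def __init__(self, learning_rate=0.05):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def learn_one(self, x, y):
        # One gradient step on the squared error of a single example.
        # Time and memory per update are constant, however much data
        # has already been seen.
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error


def stream(n_samples):
    """Hypothetical data stream that follows y = 2x + 1."""
    for i in range(n_samples):
        x = float(i % 4)
        yield x, 2.0 * x + 1.0


model = OnlineLinearModel()
for x, y in stream(2000):
    model.learn_one(x, y)
    # The example (x, y) is not stored anywhere after this update.
```

The model only keeps its two parameters between updates, so memory use stays fixed while the stream keeps flowing. The learning rate controls how quickly the model adapts to new data, and in practice it needs tuning.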