Workshop methods for machine learning
Machine learning (ML) software is a software system with one or more components that learn from data. Building it entails engineering a pipeline for collecting and pre-processing data, training an ML model, and deploying the trained model to perform inference, as well as engineering the surrounding software system that sends new input data to the model and receives its predictions.
- Brainstorm: Since engineering ML solutions is highly experimental (trying out algorithms and tuning them), it is recommended to hold brainstorms to get input from others on these experiments. What should we try next? How can we improve our model score?
- Business case exploration: It might be necessary to compare a scenario with ML to a scenario without ML. Is the expected benefit worth the effort and investment of starting an ML project?
- Code review: The most important code, the calls into ML libraries, behaves largely as a black box. To get the most out of your code reviews, it is recommended to choose a talk-through or pair-programming approach. See the blog of David Tan for examples of recommended coding practices.
- Decomposition: Decomposing ML components is known to be a difficult problem. Still, you must be able to break your entire ML solution into smaller components. ML engineering commonly uses pipelines for this: the workflow is split into smaller steps, and each component in the pipeline is responsible for one step.
- Gap analysis: Gap analysis is another way to compare a scenario with ML to a scenario without ML. Whereas the business case exploration is an economic comparison, the gap analysis helps you understand the design goals of your solution.
- IT architecture sketching: Training ML models often requires cloud infrastructure to provide enough processing power. In some projects you also need to sketch the IT architecture required to deploy, run, and maintain the ML solution.
- Multi-criteria decision making: An ML project may involve choices about tools, libraries, and algorithms. It is a good idea to make these decisions visible by comparing the alternatives against explicit criteria.
- Prototyping: Creating a working prototype of your ML solution is the main goal of the ML ICT project. For an ML prototype this involves specific techniques such as data quality checking, data preparation, feature engineering, cross-validation, and hyperparameter tuning (for example with grid search).
- Requirements prioritization: Both data requirements and user requirements need to be prioritized. ML projects are typically run in an agile way, so this prioritization is done through sprint backlog planning. It is good practice to first develop a working product with a “dumb” model and a minimal dataset, and then gradually increase the model's complexity and improve the quality of the input data.
- Root cause analysis: If your ML solution is meant to solve a problem in the current situation, you first need to determine the root cause of that problem before you can design a proper solution. Root cause analysis can also be applied to problems that arise during the ML project itself.
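As a minimal sketch of the pipeline idea from the Decomposition item, assuming a toy regression problem, each step below is a plain Python function that takes the previous step's output and returns its own. The step names (`collect`, `preprocess`, `train`), the inline records, and the one-parameter model are hypothetical stand-ins; a real project would typically use the pipeline abstraction of its ML framework instead.

```python
from typing import Any, Callable, List

# A pipeline step takes the previous step's output and returns its own.
Step = Callable[[Any], Any]


def run_pipeline(steps: List[Step], data: Any = None) -> Any:
    """Run each step in order, feeding each result into the next step."""
    for step in steps:
        data = step(data)
    return data


# --- Hypothetical steps, for illustration only ---

def collect(_: Any) -> List[dict]:
    # Stand-in for reading raw records from a real data source.
    return [
        {"x": 1.0, "y": 2.0},
        {"x": None, "y": 3.0},  # record with a missing feature
        {"x": 2.0, "y": 4.0},
        {"x": 3.0, "y": 6.0},
    ]


def preprocess(rows: List[dict]) -> List[tuple]:
    # Drop records with a missing feature and keep (x, y) pairs.
    return [(r["x"], r["y"]) for r in rows if r["x"] is not None]


def train(pairs: List[tuple]) -> Callable[[float], float]:
    # Fit y = a * x by least squares (a line through the origin).
    a = sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)
    return lambda x: a * x  # the trained "model" used for inference


model = run_pipeline([collect, preprocess, train])
print(model(4.0))  # inference on new input; prints 8.0
```

Because each step is an ordinary function, it can be tested and replaced in isolation, which is the practical payoff of decomposing the workflow.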
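For the Multi-criteria decision making item, one common way to make a comparison explicit is a weighted-sum decision matrix: score each alternative per criterion, multiply by the criterion's weight, and rank by the total. The sketch below assumes this weighted-sum method; the alternatives, criteria, weights, and 1-to-5 scores are all hypothetical placeholders that a team would fill in during the workshop.

```python
from typing import Dict, List, Tuple

# Hypothetical criteria weights; they sum to 1.0.
WEIGHTS: Dict[str, float] = {"accuracy": 0.5, "ease_of_use": 0.3, "cost": 0.2}

# Hypothetical 1-5 scores, higher is better (for "cost", higher means cheaper).
SCORES: Dict[str, Dict[str, int]] = {
    "alternative_a": {"accuracy": 4, "ease_of_use": 3, "cost": 5},
    "alternative_b": {"accuracy": 5, "ease_of_use": 2, "cost": 3},
}


def weighted_score(scores: Dict[str, int], weights: Dict[str, float]) -> float:
    """Sum of weight * score over all criteria."""
    return sum(weights[c] * scores[c] for c in weights)


def rank(
    alternatives: Dict[str, Dict[str, int]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (alternative, total) pairs, best first."""
    totals = [(name, weighted_score(s, weights)) for name, s in alternatives.items()]
    return sorted(totals, key=lambda pair: pair[1], reverse=True)


for name, total in rank(SCORES, WEIGHTS):
    print(f"{name}: {total:.2f}")
```

Writing the weights down forces the team to agree on what matters before comparing the options, which is what makes the decision visible.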
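To illustrate two of the techniques named under Prototyping, the sketch below runs a grid search over one hyperparameter and scores each candidate with k-fold cross-validation. It assumes a toy one-dimensional classification problem and a hand-written k-nearest-neighbours classifier so that it needs only the Python standard library; the data, the noisy label, and the grid of values are hypothetical. In practice a library such as scikit-learn provides these pieces ready-made (for example `GridSearchCV`).

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple

Point = Tuple[float, int]  # (feature, label)


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def knn_predict(train: Sequence[Point], x: float, n_neighbors: int) -> int:
    """Majority label among the n_neighbors training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:n_neighbors]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)


def cross_val_accuracy(data: Sequence[Point], n_neighbors: int, k: int = 5) -> float:
    """Mean accuracy over k folds; each fold is held out once for testing."""
    fold_scores = []
    for fold in k_fold_indices(len(data), k):
        held_out = set(fold)
        train = [p for i, p in enumerate(data) if i not in held_out]
        correct = sum(
            knn_predict(train, data[i][0], n_neighbors) == data[i][1] for i in fold
        )
        fold_scores.append(correct / len(fold))
    return mean(fold_scores)


def grid_search(data: Sequence[Point], grid: Sequence[int], k: int = 5):
    """Try every value in the grid; return the best one and all scores."""
    results = {n: cross_val_accuracy(data, n, k) for n in grid}
    best = max(results, key=results.get)  # the first value wins on ties
    return best, results


# Hypothetical data: label is 1 when x >= 10, with one deliberately noisy label.
data = [(float(x), int(x >= 10)) for x in range(20)]
data[3] = (3.0, 1)
random.Random(0).shuffle(data)  # shuffle so contiguous folds mix the classes

best, results = grid_search(data, grid=[1, 3, 5])
print("accuracy per n_neighbors:", results)
print("best n_neighbors:", best)
```

The held-out fold is never used to fit the model that scores it, so the hyperparameter is chosen on estimated generalization rather than on training accuracy.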
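The “dumb” model recommended under Requirements prioritization can be as simple as a majority-class baseline: it ignores the inputs and always predicts the most frequent training label. A minimal sketch follows, assuming a classification task; the class name, the `fit`/`predict` method names, and the spam/ham data are hypothetical. scikit-learn's `DummyClassifier` with `strategy="most_frequent"` offers a comparable ready-made baseline.

```python
from collections import Counter
from typing import Hashable, List, Sequence


class MajorityClassBaseline:
    """Deliberately 'dumb' model: always predicts the most frequent training label."""

    def fit(self, inputs: Sequence, labels: Sequence[Hashable]) -> "MajorityClassBaseline":
        # The inputs are ignored on purpose; only the label distribution matters.
        self.prediction = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, inputs: Sequence) -> List[Hashable]:
        return [self.prediction for _ in inputs]


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical minimal dataset: message texts with spam/ham labels.
train_x = ["win a prize", "meeting at 10", "lunch?", "see attached report"]
train_y = ["spam", "ham", "ham", "ham"]
test_x = ["free money", "call me", "agenda for today"]
test_y = ["spam", "ham", "ham"]

baseline = MajorityClassBaseline().fit(train_x, train_y)
print(accuracy(test_y, baseline.predict(test_x)))
```

Once this end-to-end product works, any real model has to beat the baseline's score to justify its added complexity.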