A Technical Debt Model in Machine Learning Systems


Machine Learning (ML) is a type of artificial intelligence (AI) that allows software applications to become more accurate at predicting outcomes without being explicitly programmed. Machine learning algorithms use historical data as input to predict new output values.

Technical debt describes what happens when development teams take conscious steps to expedite the delivery of a feature or project that then needs to be fixed through refactoring. In other words, the result is prioritizing fast delivery over perfect code.

This article introduces a simple yet powerful technical debt model for machine learning systems. The model is easy to remember, easy to extend, and provides a dependable basis for building reliable and maintainable machine learning systems. That, in a nutshell, is the value proposition of this article.


The model of an ML system is quite simple and generic: the system acts on input to produce output, based on machine learning theory and practice. For technical debt, the time dimension is a major consideration, since unmanaged debt becomes a debilitating drag on development velocity.

[Figure: Generic Machine Learning System Model]

From this, we can derive the following expectations for system behavior over time:

  • We expect the system to produce similar output when presented with similar input – no surprises on the way out
  • We expect to be able to modify the system gradually so that it accepts different inputs and produces new outputs as domain knowledge evolves – incremental refinement
  • We don’t expect to be overwhelmed by the complexity or incomprehensibility of the system in our quest to map input to output – we expect a maintainable system

We say a system is laden with technical debt when any or all of these expectations fail over time as we seek return on investment (ROI) by deploying the system into production. The accumulated suboptimal decisions begin to slow down every change to the system. This calls for systematic, disciplined management of the accumulated technical debt.
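The first expectation, similar output for similar input, can be turned into a check that runs over time. The following is only a minimal sketch: `max_output_shift`, `toy_model`, and the perturbation size `epsilon` are hypothetical names and values chosen for illustration, and a real system would need a notion of "similar" that fits its own inputs.

```python
from typing import Callable, Iterable


def max_output_shift(
    predict: Callable[[float], float],
    inputs: Iterable[float],
    epsilon: float = 1e-3,
) -> float:
    """Largest change in the prediction when each input is nudged by epsilon."""
    return max(abs(predict(x + epsilon) - predict(x)) for x in inputs)


def toy_model(x: float) -> float:
    """Stand-in for a trained model (an assumption made for this sketch)."""
    return 2.0 * x + 1.0


# A small shift means "no surprises on the way out" for these sample inputs.
shift = max_output_shift(toy_model, [0.0, 1.0, 5.0])
print(f"largest output shift: {shift:.4f}")
```

Tracking a number like this from release to release is one way to notice when the expectation starts to fail.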

Python Technical Debt: Global Interpreter Lock (GIL)

What could be a very good example of Technical Debt in the world of Machine Learning?

Look no further than Python. The reference interpreter, CPython, has a global interpreter lock (GIL): a mutex that protects access to Python objects and prevents multiple threads from executing Python bytecode at the same time. The lock is needed mainly because CPython’s memory management is not thread-safe. It hampers concurrency design in Python, because threads spend time waiting for the global lock instead of running in parallel. The situation is arguably serious enough to have helped motivate Microsoft to bring Python’s former BDFL (Benevolent Dictator For Life), Guido van Rossum, out of retirement: he stepped down as BDFL in 2018 and retired in 2019, but changed his mind and went back to work at Microsoft in 2020.

Why is the Python GIL technical debt? Because the decision was made consciously, which brings us to the next section. This technical debt threatens Python’s future, since processors have become multi-core because power density limits how fast a single core can run. With the GIL in place, Python threads cannot use multiple cores efficiently for CPU-bound work.
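The effect can be observed with a small experiment. The sketch below runs the same CPU-bound, pure-Python loop first sequentially and then on a thread pool. The function names and the job sizes are illustrative assumptions; on a standard GIL-enabled CPython build the threaded run typically takes about as long as the sequential one, but the exact timings depend on the machine and the interpreter build, so none are stated here.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_down(n: int) -> int:
    """Pure-Python CPU-bound loop; the running thread holds the GIL."""
    steps = 0
    while n > 0:
        n -= 1
        steps += 1
    return steps


def run_sequential(jobs):
    """Run every job one after another on the calling thread."""
    return [count_down(n) for n in jobs]


def run_threaded(jobs):
    """Run every job on its own worker thread."""
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        return list(pool.map(count_down, jobs))


def timed(fn, jobs):
    """Return (result, elapsed seconds) for fn(jobs)."""
    start = time.perf_counter()
    result = fn(jobs)
    return result, time.perf_counter() - start


jobs = [200_000] * 4  # four equal CPU-bound jobs (arbitrary size)
seq_result, seq_seconds = timed(run_sequential, jobs)
thr_result, thr_seconds = timed(run_threaded, jobs)
print(f"sequential: {seq_seconds:.3f}s  threaded: {thr_seconds:.3f}s")
```

Both runs compute the same results; only the wall-clock comparison is of interest.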

Technical Debt Register

It is very important to recognize that assuming technical debt should be a fully conscious act. Here is a sample list of items that we do not consider tech debt:

  • Design errors that were not made deliberately as part of a trade-off to speed time to market (with impact on scalability, resiliency, user experience, etc.)
  • Coding errors/flaws
  • Inadequate documentation
  • Not knowing the right way(s) to do something
  • Unit test failures, UNLESS leaving them in place was a conscious decision to TEMPORARILY accelerate time to market

Therefore, it is non-negotiable to require that all Technical Debt be consciously maintained in a Technical Debt Register. Anything that is not present in the register is not taken into consideration for the planning and budgeting exercises.

To sum up: negligence debt is not technical debt. Negligence debt (not keeping up with maintenance, software patches, etc.) is an act of omission, whereas technical debt is an act of commission: something we knowingly do in order to get to market faster, with a plan to deal with it later.
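What a register entry might contain can be sketched in a few lines. The example below is an assumption made for illustration, not a prescribed format: the `DebtItem` and `DebtRegister` classes and all of their fields are hypothetical. It encodes two ideas from this section: debt is only recorded together with a plan to repay it, and only outstanding items in the register count toward the budget.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class DebtItem:
    """One consciously assumed shortcut (all fields are illustrative)."""
    title: str
    decision: str          # the shortcut that was knowingly taken
    reason: str            # why it sped up delivery
    repayment_plan: str    # how we intend to deal with it later
    estimated_days: float  # rough cost of repaying the debt
    recorded_on: date
    repaid: bool = False


@dataclass
class DebtRegister:
    items: List[DebtItem] = field(default_factory=list)

    def record(self, item: DebtItem) -> None:
        # Debt without a plan is negligence, so the register rejects it.
        if not item.repayment_plan.strip():
            raise ValueError("technical debt must be recorded with a repayment plan")
        self.items.append(item)

    def outstanding(self) -> List[DebtItem]:
        return [item for item in self.items if not item.repaid]

    def budget_days(self) -> float:
        """Total estimated effort for planning and budgeting."""
        return sum(item.estimated_days for item in self.outstanding())


register = DebtRegister()
register.record(DebtItem(
    title="Hard-coded feature list",
    decision="Feature names are fixed in code instead of configuration",
    reason="Avoided building a configuration layer before launch",
    repayment_plan="Move the feature list to configuration",
    estimated_days=5.0,
    recorded_on=date(2023, 1, 15),
))
register.record(DebtItem(
    title="Manual model deployment",
    decision="Models are copied to production by hand",
    reason="No time to automate before the first release",
    repayment_plan="Add an automated deployment job",
    estimated_days=3.0,
    recorded_on=date(2023, 2, 1),
    repaid=True,
))
print(register.budget_days())
```

The items and dates are invented sample data; a spreadsheet or an issue-tracker label could serve the same purpose as long as every entry is recorded deliberately.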

The practice of keeping a register of technical debt is neither new nor unusual. There is already an established practice of keeping a record of architectural decisions – it is known as:

Architectural Decision Records

An architectural decision (AD) is a software design choice that addresses an architecturally significant functional or non-functional requirement. An architecturally significant requirement (ASR) is a requirement that has a measurable effect on the architecture and quality of a software system. An Architectural Decision Record (ADR) captures a single AD and its rationale, much as one would when writing personal notes or meeting minutes; the set of ADRs created and maintained in a project constitutes its decision log. This all falls under the theme of Architectural Knowledge Management (AKM).

Technical debt is an architectural decision (just like the Python GIL), so it is no surprise that best practices for system development and maintenance call for a dedicated technical debt register.
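To make the connection concrete, a technical debt item can be written up in the shape of an ADR. The sketch below assumes the commonly used title, status, context, decision, and consequences layout; the `render_adr` helper and the sample record are hypothetical and only illustrate the idea.

```python
def render_adr(
    number: int,
    title: str,
    status: str,
    context: str,
    decision: str,
    consequences: str,
) -> str:
    """Render one decision as a short plain-text record."""
    lines = [
        f"ADR {number:04d}: {title}",
        f"Status: {status}",
        f"Context: {context}",
        f"Decision: {decision}",
        f"Consequences: {consequences}",
    ]
    return "\n".join(lines)


# Invented sample record for a consciously assumed piece of technical debt.
adr_text = render_adr(
    number=12,
    title="Ship the model without automated retraining",
    status="Accepted (technical debt)",
    context="The launch date leaves no time to build a retraining pipeline.",
    decision="Retrain manually until the pipeline exists.",
    consequences="Model quality may drift; repayment is planned after launch.",
)
print(adr_text)
```

Keeping such records next to the code gives the debt register the same visibility as the rest of the decision log.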

