Machine Learning and Technical Debt
In dance there is a leader and follower. The leader is responsible for guiding the couple and transitioning to different dance steps and, in improvised dances, for choosing the dance steps to perform. (Wikipedia)
In traditional software development, source code has always been the leader and data the follower – transitioning systems behavior and, when faced with unexpected events, choosing which exception steps to perform.
Machine Learning (ML) challenges the traditional roles of source code and data and turns them on their metaphorical heads. It is training data rather than source code that shapes ML application behavior. In much the same way that a sculptor creates a mold around an original object, the ML developer creates a functioning ML machine around their training data set. The training data set may be constructed by the developer, but the training (computational analysis and resulting modifications to the untrained machine) is executed without developer intervention. (Machine Learning and Medical Devices Connecting practice to policy: pg 4)
The training data set has stepped in and replaced source code as “the lead” in what can only be characterized as a wholly new development dance.
This is not news. ML practitioners are expert in this new choreography and there is no shortage of weighty initiatives underway to better govern and manage ML-centered development lifecycles, DevOps, and quality metrics. …but what about technical debt?
Measuring software technical debt and balancing the tradeoffs between living with that debt versus going back and refactoring code continues to be a major challenge for development organizations.
To be effective, training data sets must present a balanced (complete) range of actions and outcomes in order to ensure that the resulting trained engine can reliably predict future outcomes. How can an ML engine predict heart attacks for the general population if the training data does not include women?
While there are plenty of Good Machine Learning Practices (GMLP) to guard against bias or blind spots during the initial development process, there is a paucity of options to detect and measure a subsequent behavioral shift that pushes your earlier balanced data set off center. If our heart attack training data accurately reflected population age range initially as between 20 and 30 years, how many years would need to pass before the aging population’s health could no longer be modeled by their 20-30 year selves? Predictions would start out perfectly accurate and maybe there would not be too much of an issue as the population pushed up to 25 – 35, but what about when the community is between 65 and 75? At some point, the gap between the original data set used for training and the current conditions in the field would be too great.
In the aging example above, the “aging” is literally tied to aging; a relatively slow process (unless you’ve already reached old-age but that’s another topic for Medium or FB or something 😊) – what happens if the outside shift is sudden and perhaps completely unexpected – like a pandemic.
How do you determine the cost of refactoring a training data set or the cost of sticking with it as it is?
I first wrote about this problem in April 2020 COVID-19 and Machine Learning Technical Debt. I had trained a model using a decade’s worth of data from a yoga studio to predict which of their current students would be most likely to cancel their auto-pay monthly subscription. The immediate issue was that the pandemic forced radical changes in both the studio’s ability to deliver services and in yoga students’ ability to consume them. This was obvious – everything was on hold. The big question that was left to be answered (and is still unanswered) is would studios or students EVER behave as if the pandemic had never occurred? Would the residual trauma/experience of the pandemic cause lasting behavioral change that would weaken – or perhaps invalidate – the “completeness” of my original 10 years of yoga studio data?
If you look at some of today’s headlines, it’s clear that yoga studio loyalty isn’t even the tip of the tip of the iceberg. Our training data sets will obsolete in direct proportion to how quickly we advance!
Workplace behavior: Microsoft Study: Hybrid Work May Lead to Digital Overload
Ironically, it is likely that ML applications will trigger changes in behavioral norms that will in turn obsolete the very same training data sets that made those first ML solutions work!
UP NEXT: What are the implications for data cleansing? Real-time data?
Why can't all this just be solved through continuous learning?