Stakeholder bias and why it matters
Diversity in stakeholder feedback may be nowhere near as critical to AI/ML system quality as training dataset diversity is, but that does not explain (or excuse) why it is almost universally ignored.
Ironically, a development organization’s (or regulatory agency’s, or standards body’s, …) well-intentioned effort to capture and reflect stakeholder priorities can backfire in spectacular fashion when the stakeholder community is diverse and different roles hold materially different views.
Whether the domain is health care, financial services, or security, organizations rely upon stakeholder feedback to guide their general development in much the same way that an untrained AI/ML engine relies upon training (historical) datasets to shape its responses to future input. While diversity is measured along entirely different axes (ethnic, racial, gender, etc. for datasets vs. user, practitioner, regulator, etc. for stakeholders), the insidious, silent risks stemming from unmanaged bias can be just as damaging.
The following analysis of 127 comments on the FDA’s “Proposed Regulatory Framework for Modifications to AI/ML-Based SaMD” illustrates how:
· stakeholder roles/communities can be segmented, and
· their interest in, and prioritization of, features and risk do indeed vary by role.
You can review the raw comments themselves here.
The FDA has published a summary of how it is accommodating this feedback in the just-published Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) Action Plan.
I FEEL YOUR PAIN: Anyone who has built any kind of system (AI/ML or not) knows how hard it is to gather meaningful and timely feedback on your work. Any constructive feedback is great feedback and so, given the paucity of good replies, there is a temptation to shy away from any subsequent analysis that might marginalize or diminish that feedback. Still, you do so at your own peril.
DISCLAIMER: The FDA must be praised for their open and transparent process and for publishing the feedback they have received. Nothing written here should be interpreted in any way as critical of their efforts. In fact, the only reason I can use their data at all is because of their aggressively inclusive and collaborative approach to their work.
MORE INFORMATION: Much of this analysis is repurposed from an Appendix attached to CHI’s AI Task Force’s white paper “Machine Learning and Medical Devices: Connecting practice to policy (and back again)” by Sebastian Holst of Qi-fense with contributions by CHI’s Morgan Reed and Brian Scarpelli.
TAKE-AWAY: There are a lot of words here 😊. Even if you don’t need convincing, be sure that you can satisfactorily answer the following questions:
Are you properly weighting the priorities of your stakeholder community?
How well do you understand the various stakeholder roles and account for their inherent bias in your calculations?
Are the most important (vulnerable, influential, …) stakeholders underrepresented? How do you know?
Stakeholder bias in responses to FDA’s Proposed Regulatory Framework
In April of 2019, FDA published the “Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) - Discussion Paper and Request for Feedback.” This paper described the FDA’s foundation for a potential approach to premarket review for artificial intelligence and machine learning-driven software modifications. The discussion paper asked for stakeholder feedback, both generally and specifically in response to eighteen questions it raised on this topic. The paper inspired significant discussion and other activities in this area, and generated hundreds of comments from a wide array of stakeholders through the public docket. The comments are viewable here.
Feedback Collection Background
While there were no constraints placed on the kinds of feedback or questions that could be submitted, the FDA included questions that covered the most important (or perhaps most controversial) elements of the proposed total product lifecycle (TPLC) framework.
Questions included in Proposed Regulatory Framework were divided into subtopics.
· How complete is the classification of AI/ML SaMD modifications, and will it be effective and helpful?
· Is the GMLP complete? How can the FDA help manufacturers incorporate new requirements into their existing quality management systems (QMS) and practices?
· What feedback do stakeholders have on the definitions and implementation details surrounding SPS and ACP? These are entirely new elements of the proposed certification process.
· How can the process of premarket review (review prior to an initial SaMD launch) be better defined and managed?
· How can “real-world” data be captured, analyzed, secured, and weighted throughout this entire process?
· What should the ACP include and how can it be consistently and effectively assessed across manufacturers and SaMDs?
These questions bring to the fore just how potentially disruptive Machine Learning may be in the short-term – and why it is in everyone’s interest to shorten the ML transition into the mainstream.
That being the case, why did 64% of respondents fail to answer even one of the FDA’s questions?
64% of the public responses did not directly reference a single question included in the Framework Proposal.
Looking at the respondents’ own questions and their interest (or lack of interest) in the FDA’s questions offers insight into how stakeholders outside of the FDA perceive these issues and which of them may be perceived as more (or less) important or controversial.
Respondent industries and corresponding stakeholder community roles
Figure 1 maps the self-identified Industry Categories of 127 respondents to generic Stakeholder Community roles.
Figure 1: Respondent Industry Categories and Stakeholder Roles
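The segmentation behind Figure 1 can be sketched as a simple lookup from self-identified industry category to stakeholder role. The category strings and role assignments below are illustrative assumptions, not the actual docket values or the mapping used in the analysis:

```python
from collections import Counter

# Hypothetical mapping from self-identified industry categories to generic
# stakeholder roles (role names are from this analysis; category strings
# are invented for illustration).
ROLE_MAP = {
    "Medical device manufacturer": "Innovator",
    "Software developer": "Innovator",
    "Hospital / health system": "Practitioner",
    "Patient advocacy group": "Consumer",
    "Standards development organization": "Supranational body",
    "Component / service vendor": "Supplier",
}

def to_role(industry_category: str) -> str:
    """Return the stakeholder role for a self-identified industry category."""
    return ROLE_MAP.get(industry_category, "Unclassified")

def role_counts(respondent_categories) -> Counter:
    """Tally respondents by stakeholder role."""
    return Counter(to_role(c) for c in respondent_categories)
```

Anything that does not match a known category falls into an explicit "Unclassified" bucket rather than being silently dropped, so unmapped respondents remain visible in the tallies.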
Perhaps it is not surprising to learn that the primary stakeholders have the loudest voice (at least by sheer volume), but, given the importance of vendor-neutral, independent “Supranational bodies” in shaping regulations, should they?
The questions embedded inside the FDA’s regulatory framework proposal are calibrated to address the FDA’s priorities, but are those priorities and their relative weighting shared? Figure 2 illustrates the percentage of responses that included specific topics. These topics are grouped into “framework-specific” (that are unique to the proposed regulatory framework) and “mainstream activities” (that are general issues already described relating to the mainstreaming of any disruptive technology).
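The "percentage of responses that included specific topics" behind Figure 2 can be approximated with simple keyword tagging. This is a minimal sketch under assumed keywords; the actual analysis presumably used manual coding of each comment:

```python
from collections import defaultdict

def topic_percentages(responses, topic_keywords):
    """Percent of responses that mention each topic at least once.

    `responses` is a list of comment texts; `topic_keywords` maps a topic
    label to signal keywords (both illustrative, not the real coding scheme).
    """
    hits = defaultdict(int)
    for text in responses:
        lowered = text.lower()
        for topic, keywords in topic_keywords.items():
            if any(k in lowered for k in keywords):
                hits[topic] += 1
    n = len(responses)
    return {t: round(100 * hits[t] / n, 1) for t in topic_keywords}
```

Because a single comment can touch several topics, the percentages are computed per topic and will not sum to 100.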
64% of respondents did not answer any of the 18 questions included in the proposal. Closer inspection of respondents’ comments suggests a difference in emphasis and, perhaps, priority.
Comparing respondents that did and did not answer the FDA-specific questions:
1. Respondents that answered were much more likely to comment on the ML SaMD modification categories, the recertification criteria and process, and the description of the TPLC.
2. They also consistently raised issues across the mainstream activities of Quality, Risk, Ecosystem (collaboration across roles), and Frameworks (reconciliation with other frameworks).
3. Respondents that did not answer the FDA-specific questions were significantly more likely to focus on software Quality and Risk issues.
4. Regardless of whether the FDA-specific questions were addressed, there was general concern around the definition and treatment of “Locked” models.
Figure 2: Topic interest of respondents
Respondent priorities by topic
Does a respondent’s stakeholder role as innovator or standards body (versus regulatory agency or consumer) also influence their priorities? If yes, should the dominance of one stakeholder role over all others be factored-in or weighted when considering responses?
Figure 3: percentage of responses across topics by Ecosystem Stakeholder role.
Figure 3 maps the percentage of topics included in responses by Stakeholder role (only three roles had enough responses to be statistically meaningful).
1. Quality, Risk, FDA SaMD modifications and recertification processes received the greatest attention.
2. Generally, Innovators, Consumers, Practitioners, and Suppliers responded more consistently with one another than did Supranational organizations.
3. Taken as a group, comments relating to Ecosystem (cross-role collaboration), Frameworks (cross-framework reconciliation), and Dictionary (defining common terms and definitions across domains) were a strong, consistent area of concern.
FDA-specific question response
While only 36% of respondents addressed the 18 embedded questions directly, those responses were extensive and, obviously, important to assess.
Figure 4: Count of responses that included commentary for each FDA-embedded question. The questions are segmented by topic. All Respondents are shown alongside the three highest reporting Ecosystem Stakeholder roles.
1. Respondents gave the greatest amount of attention to the questions relating to Good Machine Learning Practices.
2. The Algorithm Change Protocol (ACP) received substantially less attention from Innovators than any other subtopic; this gap was not evident in either of the other two Stakeholder roles.
3. The high innovator response volume depressed the relative importance of the ACP subtopic.
Given the close relationship between Supranational Organizations and Government Regulators already discussed and the consensus around the importance of framework and regulatory consistency, should the (apparent) lack of interest from Innovators be discounted?
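One way to test whether sheer Innovator volume is skewing the picture is to compare a raw pooled percentage against an equal-role weighting, where each role's mention rate counts once regardless of how many respondents it has. A minimal sketch, with invented numbers purely for illustration:

```python
def pooled_share(mentions_by_role, size_by_role):
    """Raw pooled percentage: large roles dominate the result."""
    total = sum(size_by_role.values())
    return 100 * sum(mentions_by_role.values()) / total

def equal_weight_share(mentions_by_role, size_by_role):
    """Average of per-role mention rates: every role counts equally."""
    rates = [mentions_by_role[r] / size_by_role[r] for r in size_by_role]
    return 100 * sum(rates) / len(rates)

# Hypothetical ACP example: 80 Innovators of whom 8 mention ACP,
# vs. 10 Supranational (5 mentions) and 10 Regulator (4 mentions).
mentions = {"Innovator": 8, "Supranational": 5, "Regulator": 4}
sizes = {"Innovator": 80, "Supranational": 10, "Regulator": 10}
```

With these invented figures the pooled share is 17%, while the equal-role share is roughly 33%: the same raw comments, read with different weighting assumptions, tell materially different stories about how much the community cares about ACP.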