I would like to propose GovOps: a term describing a special flavor of DataOps fit for Federal Governance needs. Together with machine learning operations (MLOps), GovOps provides a solid foundation for the evolution of enriched ML applications that improve collaboration, transparency, and operational efficiency, and thereby deliver more valuable citizen services.
Artificial Intelligence applications’ predictive accuracy depends on the quality of the training and test data used during model evaluation. The “Garbage-In Garbage-Out” bane of AI, and of virtually every data science-driven solution, can be largely addressed via effective DataOps. DataOps orchestrates the trifecta of people, process, and technology to work together, with sufficient feedback loops for continuous improvement and automation of the repeatable “data operations”.
In the Federal Data Strategy context, creating the domain of GovOps (plus MLOps) can cover the very specific, intricate, and overtly complex needs of strategic and tactical governing, and the underlying evolved decision-making processes. This is specifically applicable to “Action 8: Improve Data and Model Resources for AI Research and Development” in the context of “Enhance access to high-quality and fully traceable federal data, models, and computing resources to increase the value of such resources for AI R&D”.
A key component of the GovOps stack should be a Data Catalog application with machine learning capabilities. The built-in ML engine should automate data tagging and metadata management as part of a scalable data curation process. Every action taken by the users – accepting or rejecting a suggested tag – is used to improve future recommendations, in addition to the 100+ parameters used to determine the initial tag recommendation for an attribute in the data asset. Just imagine the advantages on a petabyte-scale data asset with structured and unstructured data! This bridges the terminology gap that typically occurs between technical terms and the needs of the data stewards and the business users. It increases data visibility and streamlines data identification, at both the inter- and intra-agency level, for advanced analytics applications downstream in the data lifecycle, such as Machine Learning and self-service embedded analytics.
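The accept/reject feedback loop described above can be illustrated with a minimal sketch. The class names, the keyword-based scoring, and the simple weight updates are all hypothetical simplifications; a production catalog would blend steward feedback with its many other signals (data profiles, lineage, ontologies) rather than a single weight table.

```python
from dataclasses import dataclass, field

@dataclass
class TagRecommender:
    """Toy feedback-driven tag recommender (illustrative only).

    Scores candidate tags for a data attribute and adjusts those
    scores as data stewards accept or reject suggestions.
    """
    # learned scores per (attribute, tag) pair; a real catalog would
    # combine this with 100+ other parameters for the initial suggestion
    weights: dict = field(default_factory=dict)

    def suggest(self, attribute: str, candidates: list) -> str:
        # pick the highest-scoring candidate tag for this attribute
        scores = {t: self.weights.get((attribute, t), 0.0) for t in candidates}
        return max(scores, key=scores.get)

    def feedback(self, attribute: str, tag: str, accepted: bool) -> None:
        # steward acceptance (or rejection) nudges future recommendations
        delta = 1.0 if accepted else -1.0
        key = (attribute, tag)
        self.weights[key] = self.weights.get(key, 0.0) + delta

# example: after stewards confirm "ssn" columns are PII,
# the recommender prefers that tag for similar attributes
rec = TagRecommender()
rec.feedback("ssn", "PII", accepted=True)
rec.feedback("ssn", "Finance", accepted=False)
print(rec.suggest("ssn", ["Finance", "PII"]))  # → PII
```

The key design point is that curation effort compounds: every steward decision becomes training signal, which is what makes tagging scale to petabyte-sized assets.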
Additionally, an MLOps framework should be defined to curate AI applications’ exploration process in a collaborative manner. It should account for and track the common and differentiating factors, and define a culture of composable and reusable ML solutions with increased value. The reasons a model (or models) is chosen as the “final” solution are equally important to the whole process. A “registry” of different solution explorations has significant advantages, not only for long-term solution development but also as a repository of lessons learned and a training resource. There is an “art to the science” in ML solution development (as in any data science project), where multiple data sources and technologies are orchestrated into a cohesive self-service solution. MLOps and the associated registry keep track of the ML engineer’s paints, brushes, and creations – that is, the data, tools, and models evaluated.
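A registry entry of the kind described above might capture, for each exploration, the data, tools, metrics, and the decision rationale. The sketch below is a minimal in-memory illustration with hypothetical field names; real deployments would use a persistent registry service (for example, a tool such as MLflow) rather than this toy class.

```python
import time
from dataclasses import dataclass, asdict

@dataclass
class RegistryEntry:
    name: str         # solution exploration being tracked
    version: int      # auto-incremented per solution name
    datasets: list    # data sources used ("paints")
    tools: list       # libraries/frameworks evaluated ("brushes")
    metrics: dict     # evaluation results
    decision: str     # why this model was (or was not) chosen as "final"
    timestamp: float

class ModelRegistry:
    """Toy in-memory registry of ML solution explorations."""

    def __init__(self):
        self._entries = []

    def register(self, name, datasets, tools, metrics, decision):
        # each re-run of the same exploration gets the next version
        version = 1 + sum(e.name == name for e in self._entries)
        entry = RegistryEntry(name, version, datasets, tools,
                              metrics, decision, time.time())
        self._entries.append(entry)
        return entry

    def history(self, name):
        # the lessons-learned trail for one solution exploration
        return [asdict(e) for e in self._entries if e.name == name]

# example: two iterations of the same exploration, with rationale recorded
reg = ModelRegistry()
reg.register("benefits-triage", ["claims-2019"], ["scikit-learn"],
             {"f1": 0.71}, "baseline; recall too low for production")
reg.register("benefits-triage", ["claims-2019", "appeals-2020"],
             ["xgboost"], {"f1": 0.83}, "chosen: meets recall target")
print(len(reg.history("benefits-triage")))  # → 2
```

Recording the `decision` string alongside the metrics is what turns the registry from a model store into the repository of lessons learned the text argues for.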
In summary, machine learning enables machine learning (and more): GovOps and MLOps together enable effective knowledge sharing, workforce development, and growth.
*This was shared during the Federal Data Strategy Public Forum held on Monday, November 2, 2020, co-sponsored by the Data Coalition and the Data Foundation.