The Data Dictionary: Understanding the Latest Buzzwords in Data Science

Data Protection • Data Storage • AI / ML • IT Modernization  |  January 11, 2023

Information technology has always had a complex lexicon that mixes technology terms, acronyms, and industry jargon. When added to the already robust language of business or government, a technology conversation can quickly sound like a foreign language to the uninitiated.

As a result, we’re already seeing government leaders, such as Congressman Don Beyer, turn to external resources to gain a better understanding of key concepts in data science.

However, the language barrier often serves as a hindrance. Technologists can easily speak with one another, but the message often gets lost when they communicate with others who don’t have a deep AI background. As cybersecurity, cloud computing, artificial intelligence and other technologies continue to transform our world, it is imperative that government leaders understand the technical terms thrown their way.

The ability to grasp concepts and allocate a budget for needed projects is essential in all aspects of an organization. To help clarify some of the confusion, we’ve created a dictionary of words and concepts every business leader must know – regardless of their technical expertise.

Data fabric: The foundation of any data-centric problem, the data fabric comprises all of the different types of data that make up an environment. Understanding the makeup of the data fabric is a crucial step in defining the complexity of your data problems. Organizations must connect different data sources from varying technologies to create a cohesive, connected and thriving environment for applications and analytics.

DataOps: A set of practices, processes and technologies that manage how data is collected, used and analyzed. A Chief Data Officer (CDO) adds necessary structure and guard rails to the chaos, limiting the unnecessary use of data and ensuring proper data disposal.

Data lineage: Think of this as a data chain of custody. Knowing the data lineage means knowing who has had access to the data, what changes have potentially been made, and where the data came from. Keeping the lineage of data can help ensure trust and serve as an indicator of data quality.
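For readers who want to see the chain-of-custody idea in concrete form, here is a minimal Python sketch. It is only an illustration: the class names (LineageEvent, Dataset), the actors, and the file names are all hypothetical, and real lineage tools capture far more detail.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class LineageEvent:
    """One link in the chain of custody (hypothetical structure)."""
    actor: str    # who accessed or changed the data
    action: str   # what was done to it
    source: str   # where the data came from at this step
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


@dataclass
class Dataset:
    name: str
    lineage: list = field(default_factory=list)

    def record(self, actor: str, action: str, source: str) -> None:
        """Append a new event to the end of the lineage."""
        self.lineage.append(LineageEvent(actor, action, source))

    def sources(self) -> list:
        """Return the distinct sources in the order they first appeared."""
        seen = []
        for event in self.lineage:
            if event.source not in seen:
                seen.append(event.source)
        return seen


# Illustrative usage with made-up actors and files.
sales = Dataset("quarterly_sales")
sales.record("ingest-service", "ingested", "crm_export.csv")
sales.record("analyst_a", "removed duplicates", "crm_export.csv")
sales.record("analyst_b", "joined region codes", "region_lookup.csv")

for event in sales.lineage:
    print(event.actor, "|", event.action, "|", event.source)
```

Each recorded event answers the three lineage questions at once: who touched the data, what they changed, and which source was involved.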

Digital twins (or data twins): The playground of the data world where organizations can simulate real-world scenarios they may encounter. Digital twins create a digital version of a physical item (think a fighter jet or any equipment/facility with continuously monitoring sensors) that can then be tested to see how it performs in different situations or how modifications may affect its overall performance.

Metadata: Additional data that can be computed or inferred about the data. It summarizes the data and increases its usability. Metadata can help organizations manage the volume of data, providing a way to search and systematically organize existing content. Leveraging available metadata can help in problem-solving and other areas where information about how the data is collected offers benefits.
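As a small illustration of metadata that is computed from the data itself, the Python sketch below summarizes a tiny table. The describe function and the sample sensor readings are hypothetical, invented only for this example.

```python
import csv
import io


def describe(csv_text: str) -> dict:
    """Compute simple metadata about a CSV table (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = list(reader.fieldnames or [])
    return {
        "row_count": len(rows),
        "columns": columns,
        # Count empty or absent cells per column.
        "missing_values": {
            col: sum(1 for row in rows if not row[col]) for col in columns
        },
    }


# Made-up sample data: three sensor readings, one of them missing.
SAMPLE = """sensor_id,reading,unit
a1,20.5,C
a2,,C
a3,19.8,C
"""

summary = describe(SAMPLE)
print(summary)
```

None of the values in the summary appear in the table itself. They are facts about the table, which is what makes them useful for searching and organizing large volumes of content.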

Data architecture: Data fabric was already defined as the interconnected data sources in an organization’s ecosystem. Data architecture is how the content available in, or continuously generated from, that fabric can be put together to help solve problems using a data-driven methodology.

AI/ML ethics: Artificial Intelligence (AI) and Machine Learning (ML) can provide organizations with new insights into their operations and data usage. Each organization must determine how it intends to use these tools and ensure it does not overstep moral or ethical boundaries. For example, an organization could use these tools to improve cybersecurity but also to spy on employees, access potentially sensitive data, or use data in other unethical ways. Before using AI and ML, organizational leaders must create a policy surrounding their ethical use.

AI/ML applications: These tools continue to extract new insights from existing data. Leading application types include large language models, entity resolution, object detection, knowledge graphs, graph analytics, geospatial analytics, and time-series analysis, among many others. Leveraging these applications can shift what is possible with existing data.

How to Use These Terms

This list serves as just a starting point for better understanding data use. Knowing these terms and having an intelligent conversation around their capabilities is imperative to success.

Data has long been described as the currency of the future. With the proper data accessed correctly, organizations can find new insights to improve almost every aspect of their operation – both strategic (long-term) and tactical (near real-time). However, these insights cannot exist without stakeholder buy-in and technology teams having honest conversations with those in charge of the agency’s vision and mission.

Interested in learning more? Tune into Dr. Nayak’s breakout session: “DataOps for Digital Twins” as part of the Data Team Summit on January 25, 2023.

Dr. Pragyansmita Nayak