
How to build a data-centric framework that also supports AI and ML models
Many organizations claim to be data-driven or data-centric, using the terms loosely and interchangeably. But the two are not the same and have particular applications.
Data-centricity has a lot to do with architecture, which this article covers in more detail. But by establishing a data-centric framework, you are also preparing your organization to take advantage of emerging technologies that rely on good data, such as artificial intelligence and machine learning.
To be truly data-centric, organizations must start with a holistic data framework.
Being data-driven is not the same as being data-centric, but it is a critical step on the road toward data-centricity. Being data-driven involves making strategic decisions based on your data. Data-centric organizations take that thinking a step further, as they see data as a core, independent asset.
In these organizations, technologies and systems are built around the data the organization maintains and amasses over time. That data is tightly and thoughtfully managed, and data security is of the utmost importance.
The crucial first step in establishing your data framework is to understand your organization’s overall business objectives and align the framework to them. That alignment helps teams across the enterprise interpret data consistently and use it to make business decisions.
For example:
How is the cost of your data infrastructure tied back into business operations? Are there opportunities to conduct a cost-benefit analysis to determine how your data infrastructure aligns with your business goals?
To answer these questions, look to data scientists and architects to provide the infrastructure to measure success and create the landscape for measuring the data strategy.
For data-centric organizations, legal and regulatory guidance helps ensure data is secure and accessible, especially if your organization is subject to the European Union’s General Data Protection Regulation (GDPR).
Without oversight, a suite of technologies and a strategy for managing data can turn into the Wild West. That’s where a data governance framework comes in: it’s a set of formalized rules and processes for how data is collected, stored, used, and disposed of.
Data governance is essential for technology companies and large organizations that maintain vast amounts of data over several years. These businesses should establish a data governance framework that outlines which types of data enter their systems and at which stage of the data lifecycle.
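As a rough illustration of what "formalized rules" can look like once they are written down, the sketch below encodes a few governance rules as data and checks them in code. Everything in it is an assumption made for illustration: the data types, lifecycle stage names, allowed uses, and retention periods are hypothetical and are not taken from GDPR or any other regulation.

```python
from dataclasses import dataclass
from datetime import date, timedelta


# Hypothetical rule set: the categories, stages, uses, and retention periods
# below are illustrative assumptions, not values from any regulation.
@dataclass(frozen=True)
class GovernanceRule:
    data_type: str           # kind of data the rule covers
    entry_stage: str         # lifecycle stage at which this data enters the system
    allowed_uses: frozenset  # purposes the data may be used for
    retention_days: int      # how long a record is kept before disposal


RULES = {
    "customer_email": GovernanceRule(
        "customer_email", "collection", frozenset({"billing", "support"}), 730
    ),
    "web_clickstream": GovernanceRule(
        "web_clickstream", "collection", frozenset({"analytics"}), 90
    ),
}


def is_use_allowed(data_type, purpose):
    """Return True only if a rule exists for the data type and lists this purpose."""
    rule = RULES.get(data_type)
    return rule is not None and purpose in rule.allowed_uses


def disposal_date(data_type, collected_on):
    """Return the date on which a record should be disposed of under its rule."""
    return collected_on + timedelta(days=RULES[data_type].retention_days)


print(is_use_allowed("web_clickstream", "billing"))
print(disposal_date("web_clickstream", date(2024, 1, 1)))
```

Keeping the rules in one structure, rather than scattered across applications, is one way to make collection, use, and disposal decisions auditable. A real framework would also cover ownership, access, and approval processes.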
Here are some common components of data governance documentation:
Once the business goals, data strategy and data governance are set, you need the technology to support those functions and to see how those technologies can produce the necessary data to measure success.
Data science involves looking at all systems that can produce the data needed to measure against organizational goals. It allows those extracting the data to evaluate past performance and, through data modeling and predictive analytics, to forecast performance and even anticipate trends.
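To make the idea of forecasting performance concrete, here is a minimal sketch that fits a straight-line trend to past values and projects it one period ahead. It assumes the simplest possible model, an ordinary least-squares line over evenly spaced periods, and the `monthly_signups` figures are made up. Real predictive analytics would usually involve richer models and validation.

```python
def fit_line(ys):
    """Least-squares fit of y = slope * x + intercept, with x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(ys, steps_ahead=1):
    """Project the fitted trend line steps_ahead periods past the last observation."""
    slope, intercept = fit_line(ys)
    return slope * (len(ys) - 1 + steps_ahead) + intercept


# Made-up historical values for illustration only.
monthly_signups = [100, 110, 120, 130]
print(forecast(monthly_signups))
```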
Data science can help:
Without the right systems and tools, though, data science is impossible, so a formal data architecture is essential.
Data architecture:
The process of identifying and designing technologies that can effectively manage data and enable modeling. Like data scientists, data architects will use business goals as the “north star.” They work to ensure the right systems are in place to produce the information that aligns with a company’s goals.
While these two disciplines are often separate, they complement each other. Because of that, it’s becoming more common for the data science and architecture teams to have crossover duties.
With the models and infrastructure in place, now it’s time to present your data in a digestible way. As organizations start their data journey, they often see data as a smattering of indiscernible numbers.
They need to determine which reporting tools and functionalities are available. Is the data presented in the correct format? Should the data visualization be interactive?
It’s fair to say that most stakeholders within an organization are not data scientists or statisticians, so looking at data points on a spreadsheet is not always helpful.
Data visualization plots existing data and models and presents them in an easier-to-understand format through charts, graphs, maps and other illustrative formats.
To make data more relevant for different audiences, data scientists and architects can create custom dashboards based on each viewer's role in the organization and level of access.
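One way to picture dashboards tailored by role and access level is a simple filter over the available metrics. The sketch below is a hypothetical illustration: the role names, numeric access levels, and metric names are all assumptions, and a real reporting tool would handle this through its own permission model.

```python
# Hypothetical metrics, each tagged with the minimum access level needed to view it.
METRICS = [
    {"name": "monthly_revenue", "min_access": 3},
    {"name": "support_tickets_open", "min_access": 1},
    {"name": "churn_forecast", "min_access": 2},
]

# Hypothetical mapping from organizational role to access level.
ROLE_ACCESS = {"executive": 3, "analyst": 2, "support_agent": 1}


def dashboard_for(role):
    """Return the names of the metrics a given role is allowed to see."""
    level = ROLE_ACCESS.get(role, 0)  # unknown roles see nothing
    return [m["name"] for m in METRICS if m["min_access"] <= level]


for role in ROLE_ACCESS:
    print(role, dashboard_for(role))
```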
The effectiveness of your artificial intelligence and machine learning efforts depends on the effectiveness of your data. When setting up your data-centric AI/ML practice, ensure that exploratory data analysis (EDA) is part of the process. Through this, you can look for correlations between variables, understand your data’s “quirks,” and identify any biases or quality issues that might affect your models. Data scientists can help with the normalization of data labeling and structure, feature scaling, and missing values.
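The EDA checks mentioned above (missing values, correlations between variables, and feature scaling) can be sketched in a few lines. This example uses only the Python standard library and a tiny made-up dataset. In practice a dataframe library is the more typical choice, and dropping incomplete rows is only one of several ways to handle missing values.

```python
import math

# Made-up records for illustration; None marks a missing value.
rows = [
    {"age": 25, "income": 40000},
    {"age": 35, "income": 60000},
    {"age": 45, "income": None},
    {"age": 55, "income": 100000},
]


def missing_counts(records):
    """Count missing (None) values per column."""
    counts = {}
    for record in records:
        for key, value in record.items():
            counts[key] = counts.get(key, 0) + (1 if value is None else 0)
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def min_max_scale(values):
    """Rescale values to the range [0, 1]; constant columns map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Simplest handling of missing data: keep only complete rows (an assumption).
complete = [r for r in rows if None not in r.values()]
ages = [r["age"] for r in complete]
incomes = [r["income"] for r in complete]

print(missing_counts(rows))
print(pearson(ages, incomes))
print(min_max_scale(ages))
```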
EDA also prompts you to consider the ethical side of your data. Are there privacy concerns or potential biases that could make your AI or ML models unfair or exclusive? It's important to address these ethical implications.
By embracing a data-centric mindset and following these steps, you're setting yourself up for success in AI and ML. Your models will be better positioned to make accurate predictions, provide valuable insights, and drive real transformation.
When it is accurate and properly managed, data can help organizations grow more quickly and systematically. That’s why 92% of companies that invest in data see a return on the investment.
When data is your most powerful resource, it’s time to adopt a data-centric approach.