The Right Type of Data Centralization for Your Organization

Data Centralization

By Justin Richie

Data Science Director

Data centralization is a critical component of all analytics initiatives. But choosing the right partner, building a strong team and correctly implementing data centralization are daunting tasks for any organization.

Here’s a somber statistic from a Gartner study: 85 percent of analytics projects fail. Why? The success of these projects hinges as much on the organization’s culture as on technical components. There are many options for centralizing your data and a seemingly infinite number of partners to address those needs. How can your organization find the right strategy? Take a deliberate approach: address cultural issues first, then technical challenges.

When you address the cultural barriers to data centralization, you’ll quickly identify bottlenecks. If teams operate in silos, the hardest issue to address is that they often depend on manual, mission-critical processes. Your first objective for data centralization should be to eliminate manual processes that impede organizational efficiency. Once you have identified the cultural elements, the technical challenges come next.

Data centralization currently takes two common forms: data warehouses and data lakes. Data lakes are a data centralization mechanism in which you load data in its raw form, and the tables don’t have strict associations. Their most significant benefit is that they don’t require pre-processing, and organizations can still use their existing processes to analyze data. The biggest drawback of data lakes is that organizations can fall into a cycle of “analysis paralysis”: you have too much data, and you’re not always analyzing the right elements.
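
As a minimal sketch of the "load it raw" idea, the Python snippet below treats a list of JSON strings as a stand-in for a data lake. The records and the helper name read_field are hypothetical and not tied to any particular product. The point is that nothing is validated on the way in, and structure is applied only when someone reads the data.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (no shared schema).
raw_lake = [
    '{"customer": "A-100", "amount": 25.0, "channel": "web"}',
    '{"customer": "B-200", "total": "40", "notes": "phone order"}',
    '{"sensor_id": 7, "reading": 21.5}',
]


def read_field(lake, field):
    """Apply structure at read time: collect `field` from records that have it."""
    values = []
    for line in lake:
        record = json.loads(line)
        if field in record:
            values.append(record[field])
    return values


amounts = read_field(raw_lake, "amount")
customers = read_field(raw_lake, "customer")
```

Only one of the three records carries an amount field, which hints at how easy it is to store everything and still miss the elements you meant to analyze.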

Data warehouses are relational databases where every table is connected with strict logic, and every row is uniquely identified and connects to rows in other tables. Their most significant benefit is that querying data is much easier, and everyone shares a consistent understanding of what each table contains. The biggest drawback of this data centralization type is that the logic is rigid: if you want to change the schema, it takes considerable work.
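
To make the "strict logic" concrete, here is a small sketch using Python's built-in sqlite3 module as a stand-in for a warehouse. The customers and orders tables are hypothetical examples, and a production warehouse would use a different engine. It shows unique rows via primary keys, a table relationship via a foreign key, and the schema rejecting a row that breaks the rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(customer_id),"
    " amount REAL NOT NULL)"
)

conn.execute("INSERT INTO customers VALUES (1, 'Acme')")
conn.execute("INSERT INTO orders VALUES (10, 1, 25.0)")

# An order pointing at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (11, 99, 40.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Because the relationship is declared up front, the join is straightforward.
rows = conn.execute(
    "SELECT c.name, o.amount FROM orders o JOIN customers c USING (customer_id)"
).fetchall()
```

The same declarations that make querying predictable are what make later schema changes costly, since every dependent table and query has to follow.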

Many leaders think they have to choose one or the other, but you can leverage both in parallel. Data lakes are great for storing vast amounts of data without pre-processing. Data warehouses are ideal for predefined logic that powers a business application or analytics program. Using both can be great long-term, but it's best to start with one and evolve into a more mature system that can benefit from both.

Big data was once billed as the killer of the relational database and the data warehouse, but now these two worlds co-exist rather than compete. Technologies like Hadoop are excellent for petabytes of data, but if no one can query or understand the data, it’s worthless. That’s why big data software vendors now let analysts write SQL directly against these big data or NoSQL environments. SQL is the one consistent element across all data centralization methods, and it is unlikely to ever go away. Therefore, it’s best to invest in this pivotal technology.
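
The idea of putting SQL on top of raw data can be sketched on a very small scale. The snippet below is an illustration only: it uses Python's built-in sqlite3 module rather than a real big data engine, and the event records and the events table are hypothetical. It projects semi-structured records onto the columns analysts care about and then answers a question in plain SQL.

```python
import json
import sqlite3

# Hypothetical raw, semi-structured events as they might sit in a lake.
raw_events = [
    '{"region": "east", "amount": 25.0}',
    '{"region": "west", "amount": 40.0}',
    '{"region": "east", "amount": 10.0, "coupon": "SPRING"}',
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (region TEXT, amount REAL)")

# Project each raw record onto the columns analysts want to query;
# extra fields such as "coupon" are simply left behind in the raw data.
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(e["region"], e["amount"]) for e in map(json.loads, raw_events)],
)

# The analyst-facing part is ordinary SQL.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM events GROUP BY region ORDER BY region"
).fetchall()
```

The query itself would look much the same against a far larger environment, which is part of why SQL skills carry over so well between platforms.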

In closing, it's important to have a sound data centralization strategy because the stakes are higher than ever, given increasing competition and the pressure to grow revenue. Data centralization methods are convoluted, but if an organization looks pragmatically at technology as a way to solve problems and unlock insights first, the chances of success for these initiatives increase dramatically. If you structure your analytics programs to adapt to your organization's changing culture, you'll experience higher levels of adoption throughout the company. W. Edwards Deming said it perfectly: “Without data, you’re just another person with an opinion.”

Published on 09.26.19