How to develop a data strategy

Robert Balaban
Senior Data Engineer

Every company today is interested in processing data as part of its operations. There's simply so much data out there to be harnessed, after all. However, no organization should start working with data out of a simple sense of obligation — rather, companies should launch purposeful, impactful data strategies.

While data strategies have existed in various forms for decades, they have evolved considerably in recent years. In the past, a data strategy involved inflexible daily reports that were difficult to change or act on, but today's options are more interactive. Now, leaders can react to reports in real time and make better business decisions affecting both defensive and offensive strategies.

What factors define data strategies today?

Today's companies face a different landscape than the one that existed even a few years ago. The twin factors shaping data strategies today are backend systems that are becoming more flexible and self-service friendly, and frontend systems that are increasingly powerful.

Equipped with both, teams can expand the scope of their data strategies to encompass input that may not have been considered data in the past.

Data strategy overview: what do businesses need to know?

A data strategy should be designed as a continuous process. It should set out the guidelines by which companies collect, process and analyze data.

The first step is identifying the relevant data and then collecting it. This is where data engineers work closely with business decision-makers to pinpoint the key components.

After collection, the data is analyzed, processed and stored. Once normalized and cataloged, the data is shared with downstream groups, data scientists, ML engineers and others. 

Once shared, data can be stored for archival purposes, building the company's data moat. Data that is collected but turns out not to be significant, such as routine application logs, can often be destroyed.

Who's in charge of the data strategy?

A C-level leader should set the direction of the company's data strategy. The question is, what will this executive's overall mission be?

Companies previously appointed chief data officers to take the reins of all data management activities. However, because data strategy is inextricable from a business's overall tech efforts, it has become more common to go with a chief digital officer (CDO) instead.

The CDO's role involves bringing input that is not digital in nature into digital space so that it becomes data and can be analyzed for insights. Every interaction between a company and its audience, partners and the world at large is potential fuel for a modern data strategy under the direction of the CDO.

Individual teams, such as marketing and product, can pitch ideas on what data points and key performance indicators (KPIs) they would like to gather and track. This in turn will feed the CDO's decisions as they update the strategy.

How do businesses of different types build their data capacity?

There are a few different approaches to building out data processing capacity. They all reflect the fact that the necessary skills for working with data are highly specialized, calling for dedicated personnel. Generally, a company's size will determine its preferred style:

What's the business value of a data strategy?

What's the purpose of a company's data strategy? While no two businesses are the same, there are a few consistent points of value that come from taking a more systematic and considered approach to every step of data usage.

Data strategy objective no. 1: gathering data to make informed decisions 

Systematic collection and analysis of data can guide strategic decision-making and operational efficiency. By understanding trends, patterns and correlations within data, organizations can make more informed, evidence-based decisions, reducing guesswork and enhancing effectiveness. Three elements of a data-driven business strategy include:

  1. Usage analytics

This process involves interpreting data generated from user interactions with applications or services. The analysis helps in understanding how users engage with digital products, identifying which features or offerings are most popular and determining where improvements can be made. It's crucial for optimizing the user experience and increasing overall application performance.

The result is a tighter feedback loop. Analytics provide feedback faster than surveys or user interviews can, and real-time statistics enable companies to pivot more quickly.
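As a rough illustration, the sketch below computes two common usage-analytics signals from a raw event log with pandas. The events.csv file and its user_id, feature and timestamp columns are hypothetical stand-ins for whatever your applications actually emit.

```python
# Minimal usage-analytics sketch over a hypothetical event export.
import pandas as pd

events = pd.read_csv("events.csv", parse_dates=["timestamp"])

# Share of active users who touched each feature: a quick popularity signal.
active_users = events["user_id"].nunique()
feature_reach = (
    events.groupby("feature")["user_id"].nunique() / active_users
).sort_values(ascending=False)

# Daily active users, useful as a near-real-time feedback loop on releases.
dau = events.set_index("timestamp").resample("D")["user_id"].nunique()

print(feature_reach.head(10))
print(dau.tail(7))
```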

  2. Customer segmentation and behavior analysis

Companies can divide their audiences into distinct groups based on shared characteristics or behaviors. This segmentation allows for more precisely targeted marketing and product development, ensuring that they address the needs and preferences of different customer groups. This in turn leads to increased satisfaction and loyalty.
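One way to form such groups, sketched below, is clustering on behavioral features with scikit-learn. The customers.csv file and its columns are illustrative, and k-means is only one of several suitable techniques.

```python
# Minimal behavior-based segmentation sketch with k-means clustering.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

customers = pd.read_csv("customers.csv")
feature_cols = ["order_count", "avg_basket_value", "days_since_last_order"]

# Scale the features so no single metric dominates the distance calculation.
scaled = StandardScaler().fit_transform(customers[feature_cols])

# Group customers into four segments; the right number depends on your data.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
customers["segment"] = kmeans.fit_predict(scaled)

# Inspect each segment's average behavior to give it a business-friendly name.
print(customers.groupby("segment")[feature_cols].mean())
```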

  3. A/B testing 

This method is based on comparing two versions of a webpage or app against each other to determine which performs better. A/B testing provides empirical evidence to guide decision-making as organizations optimize their digital presence.
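As a rough sketch, one common way to read an A/B test result is a two-proportion z-test on conversion counts. The numbers below are invented, and statsmodels is just one of several libraries that can run this comparison.

```python
# Minimal A/B evaluation sketch: did variant B convert better than variant A?
from statsmodels.stats.proportion import proportions_ztest

conversions = [430, 481]      # successes observed for variant A, variant B
visitors = [10_000, 10_000]   # sample size per variant

z_stat, p_value = proportions_ztest(conversions, visitors)

print(f"A: {conversions[0] / visitors[0]:.2%}, B: {conversions[1] / visitors[1]:.2%}")
print(f"p-value: {p_value:.4f}")
# A small p-value (e.g. below 0.05) suggests the difference is unlikely to be noise.
```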

Data strategy objective no. 2: identifying and building a data moat 

Right now, the competitive edge provided by technology like artificial intelligence (AI), machine learning (ML) and large language models (LLMs) is highly dependent on data that the models are trained on.

Companies with access to superior data can differentiate themselves in the market by offering more precise, reliable and contextually relevant solutions. A data moat is a strategic accumulation of unique, valuable data that competitors cannot easily replicate, providing fuel for these models.

Each company has a specific set of data which gives it a competitive edge, and building a useful data moat means identifying the contents of that data set.

Examples of data moats include:

How it works: modern data moat creation

As AI, ML and LLMs continue to advance, preparing company systems to leverage them becomes crucial. This is more pressing than ever today, when these technologies have become widely available.

One way to implement a data moat is through a retrieval-augmented generation (RAG) framework. This type of framework brings in outside knowledge sources to strengthen the answers generated by a company's LLM.

Retrieval models typically use keyword-based or semantic search techniques to identify the most relevant pieces of information for a given query. These models excel at finding accurate and specific information, but they can't generate creative or novel content.

Following data retrieval, a generative model can generate a concise and coherent response based on the discovered information. By combining these two approaches, retrieval-augmented generation overcomes the two systems' individual limitations. 

Implementing RAG has two main benefits: the model has access to the most current, reliable facts, and users have access to the model's sources, so its claims can be checked for accuracy and ultimately trusted. 
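To make the retrieval-then-generation flow concrete, here is a deliberately minimal sketch. The embed() and generate_answer() functions are toy placeholders for a real embedding model and LLM call; only the ranking of documents by similarity and the grounding of the prompt in retrieved context are the point.

```python
# Minimal RAG sketch: retrieve relevant documents, then ground the generation in them.
import numpy as np

documents = [
    "Refunds are processed within 5 business days.",
    "Enterprise contracts include a dedicated support engineer.",
    "Our API rate limit is 1,000 requests per minute.",
]

def embed(text: str) -> np.ndarray:
    """Toy stand-in: hash words into a fixed-size vector. Replace with a real embedding model."""
    vec = np.zeros(64)
    for word in text.lower().split():
        vec[hash(word) % 64] += 1.0
    return vec

def generate_answer(prompt: str) -> str:
    """Placeholder for an LLM call; here it simply echoes the grounded prompt."""
    return prompt

def answer(query: str, top_k: int = 2) -> str:
    doc_vectors = np.array([embed(d) for d in documents])
    query_vector = embed(query)

    # Semantic search: rank documents by cosine similarity to the query.
    scores = doc_vectors @ query_vector / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
    )
    context = "\n".join(documents[i] for i in scores.argsort()[::-1][:top_k])

    # Ground the generative model in the retrieved context so its claims are checkable.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return generate_answer(prompt)

print(answer("How long do refunds take?"))
```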

Data strategy objective no. 3: democratizing data and removing data silos

By democratizing data access and breaking down data silos, organizations let information flow freely across departments, enabling more cohesive and informed decision-making.

The result is a clear overview of product and overall company health, with free access to a broader sample of information when examining multiple data points.

What technology powers a data strategy?

Given the volume and complex formats of today's non-relational data, as well as the speed with which it's created, companies have to look beyond legacy technology to enable their data strategies. Cloud computing solutions are need-to-have rather than nice-to-have technology, with their ability to deliver flexible and affordable scale.

Specific tools best suited to various roles include:

Data pipelines (moving data) — Apache Kafka / RabbitMQ

These tools facilitate the rapid and reliable transfer of data between systems. Essential in real-time data sharing, they support high-volume data movement, which is crucial for modern, data-driven architectures.
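As a rough sketch of what this looks like in practice, the snippet below publishes and consumes a JSON event, assuming the kafka-python client, a broker reachable at localhost:9092 and a hypothetical user-events topic.

```python
# Minimal Kafka sketch: publish an event and read it back from the topic.
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
# Publish an event; downstream systems consume it without coupling to the producer.
producer.send("user-events", {"user_id": 42, "action": "checkout"})
producer.flush()

consumer = KafkaConsumer(
    "user-events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)
for message in consumer:
    print(message.value)  # e.g. {'user_id': 42, 'action': 'checkout'}
```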

Data transformation — Apache Spark / Apache Flink / DBT

Focused on converting raw data into actionable insights, these tools process, clean and organize data for effective analysis. 
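A minimal PySpark sketch of this step is shown below: raw events are deduplicated, cleaned and aggregated into a daily table. The paths and column names are illustrative.

```python
# Minimal transformation sketch: clean raw events and aggregate them for analysis.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("clean-events").getOrCreate()

raw = spark.read.json("s3://raw-bucket/events/")

cleaned = (
    raw.dropDuplicates(["event_id"])              # remove replayed events
       .filter(F.col("user_id").isNotNull())      # drop records missing a user
       .withColumn("event_date", F.to_date("timestamp"))
)

daily = cleaned.groupBy("event_date", "feature").agg(
    F.countDistinct("user_id").alias("unique_users"),
    F.count("*").alias("events"),
)

daily.write.mode("overwrite").parquet("s3://curated-bucket/daily_feature_usage/")
```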

Data storage — AWS S3 / AWS Redshift / Snowflake / Google BigQuery

The tools in this category are designed for storing large volumes of data in a way that is both scalable and accessible. They support the management and retrieval of data across distributed systems, ensuring data integrity and security. These storage solutions are adaptable to various types of data, from structured to unstructured, and are optimized for high performance levels in big data and cloud environments.
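As a small illustration of the storage layer, the sketch below lands a curated extract in Amazon S3 with boto3 and lists what is available to downstream consumers. The bucket and key names are hypothetical, and AWS credentials are assumed to be configured in the environment.

```python
# Minimal storage sketch: land a dataset in object storage and list its partitions.
import boto3

s3 = boto3.client("s3")

# Upload a local extract into a partitioned, date-based prefix.
s3.upload_file(
    Filename="daily_feature_usage.parquet",
    Bucket="company-data-lake",
    Key="curated/feature_usage/date=2024-05-01/data.parquet",
)

# Downstream consumers list and read only the partitions they need.
response = s3.list_objects_v2(Bucket="company-data-lake", Prefix="curated/feature_usage/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```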

Data visualization — Tableau / Looker / PowerBI

Aimed at transforming complex datasets into understandable visual formats, these tools facilitate the creation of interactive charts and dashboards, making data analysis accessible and insightful.

The importance of data security and disaster recovery

Organizations must ensure that their data pipelines are secure and protected against accidental data exposure, intentional attacks and the damage that can come from natural disasters. Data security and privacy are inextricable from the overall data strategy, representing a major objective of any given approach. There must be hard rules about who has access to data and how they can use it.

1) Disaster recovery:

Today's data protection and disaster recovery strategies have evolved along with data strategies in general. Businesses must have contingencies to handle modern-day threats, such as the risk of their data being locked or lost in a ransomware attack. For more, read our eBook, The Main Challenges in Addressing Ransomware.

2) Data security:

Security decision-making should follow a top-down approach: decisions are made at the highest level and then communicated to the rest of the team. This results in clear, well-organized processes that leave little room for confusion. Beyond this foundation, there's no one-stop solution for building a data security strategy. Considerations include:

To keep up with privacy and security regulations, leaders should use the following approaches:

Obfuscation and anonymization techniques
Obfuscation involves altering data so that its original value is masked. Anonymization removes or modifies personal identifiers to prevent the identification of individuals. Both techniques are crucial for protecting sensitive information and ensuring compliance with privacy regulations, especially in scenarios where data needs to be shared or analyzed without compromising individual privacy. 
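A minimal sketch of both ideas follows, assuming a salted-hash pseudonymization scheme and illustrative field names. Real deployments should manage the salt as a secret and follow the applicable regulation's definition of anonymized data.

```python
# Minimal sketch: pseudonymize identifiers and mask emails before data is shared.
import hashlib

SALT = "rotate-and-store-this-secret-outside-the-code"  # illustrative only

def pseudonymize(value: str) -> str:
    """Replace an identifier with a stable, non-reversible token."""
    return hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()[:16]

def mask_email(email: str) -> str:
    """Keep only enough of the address to stay readable in reports."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"

record = {"user_id": "u-1842", "email": "jane.doe@example.com", "plan": "pro"}
shared = {
    "user_id": pseudonymize(record["user_id"]),
    "email": mask_email(record["email"]),
    "plan": record["plan"],  # non-identifying fields pass through unchanged
}
print(shared)
```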

Data lineage, access patterns and tracing
These methods are concerned with tracking the origin, movement, characteristics and quality of data throughout its lifecycle. Data lineage provides visibility into the data’s journey and transformation, helping personnel understand its structure and dependencies. Access patterns and tracing involve analyzing how data is accessed and used over time, which is essential for ensuring data integrity and enhancing security by identifying unusual or unauthorized access patterns.
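As a rough sketch of access tracing, the snippet below wraps dataset reads so that every access is written to an audit log with who asked and why, giving security teams something concrete to review for unusual patterns. The dataset names and logging sink are illustrative.

```python
# Minimal access-tracing sketch: record who reads a sensitive dataset, and why.
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("data_access_audit")

def read_dataset(name: str, user: str, purpose: str):
    # Emit a structured audit event before serving the data.
    audit_log.info(json.dumps({
        "dataset": name,
        "user": user,
        "purpose": purpose,
        "accessed_at": datetime.now(timezone.utc).isoformat(),
    }))
    # ...the actual read from the warehouse or lake goes here...
    return []

rows = read_dataset("customers_pii", user="analyst_42", purpose="churn-report")
```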

What's next for strategic data use?

Data strategies have evolved significantly in the past few decades, and they're not done. As technology and business practices change, companies' approaches to their data will do likewise.

You may feel your business is not making optimal use of its data. In that case, you can take a few approaches: building out your data pipeline's capacity, implementing data strategy best practices, contracting with third-party teams and more.

An engagement with an industry-pacing partner like Transcenda can introduce new levels of performance into your data strategy. Whether you opt for consulting, hands-on co-development of new solutions or something in between, working side-by-side with Transcenda's professionals helps you embrace optimal practices and upskill your team while enhancing your systems.

To push your data strategy forward now and get ready for the future, you can engage with the experts from Transcenda. Contact us now.

Robert Balaban is a Senior Data Engineer at Transcenda. With a strong background in data pipeline design, optimization and data-driven decision-making, he focuses on transforming raw information into actionable insights.
