What is data enrichment?
To make data-driven decisions, you need data you can rely on. Data enrichment is the process of cleaning raw data and combining it with information from disparate internal and external sources to create one accurate, error-free data set, with proper tags and codes, that serves as a single source of truth.
The challenges of enriching marketing data
Like other steps in the data engineering process, enriching marketing data poses its own challenges, including:
- Large number of variables: data structure changes significantly over time; for example, survey data might have a different number of columns each month, and the type of data in each column can also change
- Increasing number of data providers, each with unique formats and metrics
- Velocity of data: 24/7 social media, streaming data, and other sources make it difficult to keep up
- Complexity of data: from weighted surveys to inconsistent brand names, marketing data requires expert attention for accurate enrichment
A two-step approach to data enrichment
1. Examine and correct the data structure and format
Review and, if necessary, fix the following:
- Data structure
- Data format
- File names
- Labels (digital data, survey data, etc.)
This step also includes validation, a critical operation that helps ensure consistency within the data, eliminate duplicate data, and identify missing information.
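As an illustration only, the validation described in step 1 could be sketched in Python roughly as follows. The column names, types, and sample rows are hypothetical and are not taken from any real Kantar data set or platform; the sketch simply flags rows with missing values, wrong types, or duplicate IDs.

```python
# Hypothetical expected layout for a monthly survey file (an assumption for illustration).
EXPECTED_COLUMNS = {
    "respondent_id": int,
    "state": str,
    "brand": str,
    "score": (int, float),
}


def validate_rows(rows):
    """Split records into clean rows and a list of (row_index, issue) pairs."""
    clean, issues, seen_ids = [], [], set()
    for index, row in enumerate(rows):
        # Identify missing information: absent keys, None, or empty strings.
        missing = [c for c in EXPECTED_COLUMNS if row.get(c) in (None, "")]
        if missing:
            issues.append((index, "missing: " + ", ".join(missing)))
            continue
        # Check the data format: every value must have the expected type.
        wrong_type = [
            c for c, expected in EXPECTED_COLUMNS.items()
            if not isinstance(row[c], expected)
        ]
        if wrong_type:
            issues.append((index, "wrong type: " + ", ".join(wrong_type)))
            continue
        # Eliminate duplicate data: keep only the first row per respondent.
        if row["respondent_id"] in seen_ids:
            issues.append((index, "duplicate respondent_id"))
            continue
        seen_ids.add(row["respondent_id"])
        # Keep only the expected columns so later steps see a stable structure.
        clean.append({c: row[c] for c in EXPECTED_COLUMNS})
    return clean, issues


# Made-up sample data.
rows = [
    {"respondent_id": 1, "state": "NY", "brand": "Acme", "score": 4.5},
    {"respondent_id": 1, "state": "NY", "brand": "Acme", "score": 4.5},
    {"respondent_id": 2, "state": "", "brand": "Acme", "score": 3.0},
    {"respondent_id": 3, "state": "CA", "brand": "Acme", "score": "high"},
]
clean, issues = validate_rows(rows)
for index, issue in issues:
    print(f"row {index}: {issue}")
```

Rejected rows are reported rather than silently dropped, so an analyst can decide whether to fix the source file or re-ingest it.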
2. Examine and correct the data itself
A closer look at the data should include numerous automatic and manual checks, such as:
- Comparative benchmarks: determine if you have all of the data you expected, and look for variations from historic data. For example, are you seeing normal seasonal increases and decreases? Are trends changing direction unexpectedly?
- Anomaly and outlier detection: look for data that seems out of place and determine if it’s accurate, or if it’s an error due to missing data, wrong tags, or some other issue
- Fuzzy matching algorithms: help keep information consistent across data sets by identifying possible matches for records, including recommending matches that an exact lookup would most likely miss
- Recoding: ensure that language and formats are consistent throughout the data (e.g., “NewYork” vs. “New York” vs. “NY”)
- Search for missing values: for example, do you have data for all 50 states?
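Two of the checks above, outlier detection against historic data and the search for missing values, can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the three-standard-deviation threshold, the function names, and the sample figures are all illustrative choices, and a production check would usually also account for seasonality.

```python
from statistics import mean, stdev


def flag_outliers(history, new_values, threshold=3.0):
    """Return new values more than `threshold` standard deviations from the historic mean."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        # A flat history means any different value is unexpected.
        return [v for v in new_values if v != mu]
    return [v for v in new_values if abs(v - mu) / sigma > threshold]


def missing_categories(observed, expected):
    """Return the expected categories that never appear in the observed data."""
    return sorted(set(expected) - set(observed))


# Made-up weekly totals from earlier periods, and the newly delivered values.
history = [100, 104, 98, 102, 96, 100]
new_values = [101, 99, 140]
print(flag_outliers(history, new_values))

# A four-state subset stands in for the full list of 50 states.
expected_states = ["CA", "NY", "TX", "WA"]
observed_states = ["NY", "CA", "NY", "TX"]
print(missing_categories(observed_states, expected_states))
```

A flagged value is not necessarily wrong; it still needs the manual review described above to decide whether it is accurate or the result of missing data or wrong tags.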
Once these steps are in place, the enrichment process is mostly automatic (including automatic re-ingestion of data if needed). Enrichment then typically takes a few hours, rather than the days or even weeks it can take without them.
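The fuzzy matching and recoding checks from step 2 lend themselves to a short illustration. The sketch below uses `difflib.get_close_matches` from Python's standard library as one possible similarity measure; the canonical names, the alias table, and the 0.8 cutoff are assumptions chosen for the example, not a description of any particular platform.

```python
from difflib import get_close_matches

# Hypothetical canonical labels and known abbreviations (assumptions for illustration).
CANONICAL = ["New York", "California", "Texas"]
ALIASES = {"ny": "New York", "ca": "California", "tx": "Texas"}


def recode(value, cutoff=0.8):
    """Map a raw label to its canonical form, or return None if no confident match exists."""
    key = value.strip().lower()
    # Known abbreviations are too short for fuzzy matching, so use an explicit table.
    if key in ALIASES:
        return ALIASES[key]
    lookup = {name.lower(): name for name in CANONICAL}
    # An exact (case-insensitive) lookup comes first.
    if key in lookup:
        return lookup[key]
    # Fuzzy matching catches variants that the exact lookup misses.
    matches = get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


for raw in ["NewYork", "New York", " NY ", "Californa", "Ohio"]:
    print(repr(raw), "->", recode(raw))
```

Returning `None` for labels without a confident match keeps uncertain records out of the automatic path, so they can be routed to a manual check instead.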
The risks of not doing data enrichment
Skipping data enrichment typically results in errors and missing data throughout your data set, which can then lead to major flaws in your analysis. If you don’t enrich your data, you will also spend significantly more time going back to all of your source files each time you need to answer a question.
Best practices in enrichment
Machine Learning and Artificial Intelligence
- Use continuous learning algorithms to automate the process and significantly reduce time to insight
- Let the algorithms keep improving over time
- Allow for checks early in the process to avoid problems later
- Make the process less error-prone and more repeatable
No-Code Environment
- Let clients drag and drop components on screen to configure enrichment; no knowledge of code is needed, and enrichment schemes can be reused with other data sets
- Provide intuitive visualisations of what is happening to the data
- Put the power in the hands of data analysts instead of developers
- Reduce the number of people needed, and get the right people more involved
Kantar: The Marketing Data Experts
Kantar has earned the trust of some of the world’s largest companies because of our experience and expertise with marketing data. We generate marketing data, enrich it (our data and data from others) in a no-code environment, and focus on helping organisations use this data to make smarter, faster decisions. Kantar’s specialists have a thorough understanding of the complexity and nuances of marketing data sets, and routinely handle every step of the data insights process, including data engineering with Olympus, our platform with built-in artificial intelligence and machine learning.