What is data ingestion?
As a marketer, you may have access to many types of data from hundreds of data providers. Data ingestion is the process of pulling all of this data into one place (usually on a recurring schedule) and making sure it is ready to be enriched, harmonised, and analysed.
The challenges of ingesting marketing data
Marketing data can be difficult to handle for a variety of reasons, including:
- Variety and volume – marketing data is generated at high velocity and in high volumes (covering everything from sales data to media spend data), and can involve multistep calculations, dependent questions, and weighted survey data
- Dynamic nature – this data often changes quickly, and has significant variations over time (for example, as you add new survey questions)
- Changing data formats – survey providers, syndicated data sources, and internal data assets often update formats
- Analyst preferences – your data analysts may want the data organised in a specific way to aid analysis and hypothesis testing, which needs to be addressed during ingestion
- Depth – specifically in survey data, respondents can provide 20-plus minutes of inputs, which adds complexity from a data structure perspective
The four steps of data ingestion
1. Understanding what you need to ingest
How the data is ingested depends on certain factors, including:
- Format (flat files, database tables, JSON, etc.)
- Location (SFTP, cloud storage, database, source API, etc.)
- Type of data (sales, surveys, etc.)
- Data schema (structured, semi-structured, unstructured)
Once you know the format, location, type, and schema, you can start to ingest the data. Prioritise the metrics used to make decisions or understand the business, rather than bringing in every metric.
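As a rough illustration, the factors listed above could be captured in a small source description before any ingestion is configured. This is a minimal sketch, not a description of any particular platform: the names SourceSpec, Schema, select_metrics, and the example metric names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Schema(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


@dataclass(frozen=True)
class SourceSpec:
    """Hypothetical description of one data source to ingest."""

    name: str
    data_format: str  # e.g. "csv", "json", "db_table"
    location: str     # e.g. "sftp", "cloud_storage", "database", "api"
    data_type: str    # e.g. "sales", "survey"
    schema: Schema
    priority_metrics: tuple[str, ...] = ()  # metrics that drive decisions


def select_metrics(spec: SourceSpec, available: list[str]) -> list[str]:
    """Keep only the prioritised metrics that the source actually provides."""
    wanted = set(spec.priority_metrics)
    return [metric for metric in available if metric in wanted]


# Illustrative source: a weekly sales file delivered over SFTP.
spec = SourceSpec(
    name="weekly_sales",
    data_format="csv",
    location="sftp",
    data_type="sales",
    schema=Schema.STRUCTURED,
    priority_metrics=("units_sold", "revenue"),
)
selected = select_metrics(
    spec, ["store_id", "units_sold", "revenue", "shelf_position"]
)
```

Here only units_sold and revenue are carried forward, which mirrors the advice to bring in decision-relevant metrics rather than every available column.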
2. Ingestion platform connects to data sources and ingests the data
Leading platforms are data agnostic and can ingest data in any format provided by the source. They do this through configurable “connectors” (also called “adapters”). Some data ingestion platforms have hundreds of these ready-to-use connectors; the number depends on the type of data they ingest, how old the file formats are, and other factors.
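One common way to structure connectors is a registry that maps each format to a function returning records in a shared shape. The sketch below assumes that design; the names CONNECTORS, register, and ingest are hypothetical and do not come from any specific ingestion platform. It reads from in-memory text to stay self-contained, where a real connector would also handle the source location.

```python
import csv
import io
import json
from typing import Callable, Dict, List

Record = Dict[str, object]
Connector = Callable[[str], List[Record]]

# Hypothetical registry: format name -> connector function.
CONNECTORS: Dict[str, Connector] = {}


def register(fmt: str) -> Callable[[Connector], Connector]:
    """Decorator that registers a connector for the given format."""
    def decorator(func: Connector) -> Connector:
        CONNECTORS[fmt] = func
        return func
    return decorator


@register("csv")
def read_csv(text: str) -> List[Record]:
    # The first line is treated as the header row.
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


@register("json")
def read_json(text: str) -> List[Record]:
    # Accept either a single object or a list of objects.
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


def ingest(fmt: str, text: str) -> List[Record]:
    """Look up the connector for fmt and return its records."""
    try:
        connector = CONNECTORS[fmt]
    except KeyError:
        raise ValueError(f"no connector registered for format {fmt!r}") from None
    return connector(text)


csv_rows = ingest("csv", "channel,spend\ntv,100\n")
json_rows = ingest("json", '{"channel": "tv", "spend": 100}')
```

Supporting a new format then means registering one more function, without touching the code that calls ingest. Note that the CSV connector returns every value as a string, so type conversion is left to a later step.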
3. All data goes into the data sink
Once the connectors are set up, ingesting the data typically takes only a few hours, depending on the size of the data sets. The data is usually pulled into a cloud platform, depending on regulatory requirements and other factors, and is then ready for data enrichment. As new data sources are added (or existing sources change), you may need to update the ingestion process.
4. Automation, validation, and machine learning (to save time and streamline the process)
- Automation: runs ingestion at specified times, and provides alerts if files are not ingested
- Validation: performs basic data validation, including checking syntax and eliminating blank files; validation is not a quality check, simply an opportunity to catch potential issues
- Machine learning: automatically identifies what type of data you’re ingesting, as well as some of the operations needed for enrichment
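The automation and validation points above can be sketched as two small checks: one that flags expected files that never arrived, and one that catches blank files and basic syntax problems in a CSV. This is an illustrative sketch under assumed names (validate_csv, missing_files, and the file and column names are hypothetical), and, as noted above, it is not a data quality check.

```python
import csv
import io
from typing import List


def validate_csv(name: str, text: str, required_columns: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    if not text.strip():
        return [f"{name}: file is blank"]

    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    problems = []

    missing = [col for col in required_columns if col not in header]
    if missing:
        problems.append(f"{name}: missing columns {missing}")

    for row in reader:
        if not row:
            continue  # ignore empty lines
        if len(row) != len(header):
            problems.append(
                f"{name}: line {reader.line_num} has {len(row)} fields, "
                f"expected {len(header)}"
            )
    return problems


def missing_files(expected: List[str], received: List[str]) -> List[str]:
    """Return expected file names that were not received, for alerting."""
    got = set(received)
    return [file_name for file_name in expected if file_name not in got]


blank = validate_csv("spend.csv", "", ["channel"])
clean = validate_csv("spend.csv", "channel,spend\ntv,100\n", ["channel", "spend"])
broken = validate_csv("spend.csv", "channel,spend\ntv\n", ["channel", "spend", "week"])
not_received = missing_files(["sales.csv", "spend.csv"], ["sales.csv"])
```

In a scheduled run, a non-empty result from either function would trigger an alert rather than silently passing the file on to enrichment.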
Kantar: The Marketing Data Experts
While other companies provide a basic technology solution, Kantar offers leading technology backed by decades of marketing expertise and human understanding. We have a unique understanding of marketing data because we generate marketing data, build connectors for it, and specialise in helping organisations use this data to make smarter, faster decisions. Kantar’s specialists have a thorough understanding of the complexity and nuances of marketing data sets, and routinely handle every step of the data insights process, including data engineering with Olympus, our platform with built-in artificial intelligence and machine learning.