Ingestion Strategies

Understanding the Source and Lifecycle of Your Data

Introduction

Before choosing the right ingestion strategy, it’s crucial to understand two things: how data is extracted from its source and the lifecycle of the target entity.

How is Data Extracted?

Deliveries differ along two independent dimensions: whether each instance appears once or several times (snapshot or log), and whether the delivery contains everything or only what changed (full or partial). Understanding both is key to selecting an effective ingestion strategy:

  • Snapshot: Like a photo, a snapshot captures one instance of each item. For example, a customer will appear only once in each data delivery.

  • Log: Here, we’re dealing with a history of changes. If a customer’s details change multiple times, they’ll show up multiple times in the log.

  • Full Delivery: Every instance, whether it’s new, changed, or unchanged, is sent in each delivery. Think of it like receiving all your books each time you place an order, whether you’ve read them or not.

  • Partial Delivery: Only instances that have changed since the last delivery are sent. Think of it like receiving only the books you’ve just ordered, rather than your whole library.
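The snapshot/log distinction can be checked mechanically. The sketch below is only an illustration, not part of Daana: the function name delivery_shape, the customer_id key, and the sample rows are all assumptions made for the example.

```python
from collections import Counter


def delivery_shape(rows, key="customer_id"):
    """Return 'log' if any key appears more than once, otherwise 'snapshot'."""
    counts = Counter(row[key] for row in rows)
    return "log" if any(n > 1 for n in counts.values()) else "snapshot"


# Hypothetical deliveries: each customer once vs. one customer twice.
snapshot_delivery = [
    {"customer_id": 1, "city": "Oslo"},
    {"customer_id": 2, "city": "Bergen"},
]
log_delivery = [
    {"customer_id": 1, "city": "Oslo"},
    {"customer_id": 1, "city": "Bergen"},
]

print(delivery_shape(snapshot_delivery))
print(delivery_shape(log_delivery))
```

Whether a delivery is full or partial usually cannot be inferred from the rows alone; that is something to confirm with the team that owns the source.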

What is the Lifecycle of the Data?

Understanding whether the data is transactional or time-variant also impacts your ingestion strategy:

  • Transactional Data: This data is like a one-off event, such as an ATM transaction, where the data appears only once and never changes.

  • Time-Variant Data: Think of this as a living entity. A customer, for example, might update their address or change their income over time. You’ll want to track these changes throughout its lifecycle.

Ingestion Strategies Explained

Full Ingestion Strategy

Prerequisite

In a full ingestion strategy, the source must provide a complete snapshot of the data, with one row per instance. That means the source data contains every instance, whether it’s changed or not.

Usage

This strategy works best for time-variant entities where tracking changes over time is essential. You’ll want to capture a customer’s history—when their name, address, or income changed—so you can have a timeline of events.

Example: Customer Entity

For a Customer Entity, this allows you to see how a customer’s profile evolves—what their name was last year, or where they lived a few months ago.
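This page does not describe how Daana detects changes internally. One common way to turn successive full snapshots into a timeline is to compare each row with the latest stored version and record a new version only when something differs. The sketch below assumes that approach; apply_full_snapshot, the attrs and valid_from fields, and the sample data are hypothetical.

```python
def apply_full_snapshot(history, snapshot, load_date, key="customer_id"):
    """Append a version for every instance that is new or has changed.

    `history` is a list of versions ordered oldest to newest.
    """
    latest = {}
    for version in history:
        latest[version[key]] = version  # later versions overwrite earlier ones

    for row in snapshot:
        attrs = {k: v for k, v in row.items() if k != key}
        previous = latest.get(row[key])
        if previous is None or previous["attrs"] != attrs:
            history.append({key: row[key], "attrs": attrs, "valid_from": load_date})
    return history


history = []
# First delivery: both customers are new, so both get a version.
apply_full_snapshot(
    history,
    [{"customer_id": 1, "city": "Oslo"}, {"customer_id": 2, "city": "Bergen"}],
    load_date="2024-01-01",
)
# Second delivery: customer 1 is unchanged, customer 2 has moved.
apply_full_snapshot(
    history,
    [{"customer_id": 1, "city": "Oslo"}, {"customer_id": 2, "city": "Trondheim"}],
    load_date="2024-02-01",
)
```

After the second delivery the history holds three versions: one for customer 1 and two for customer 2, which is what lets you ask where a customer lived a few months ago.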

Total Ingestion Strategy

Prerequisite

In a total ingestion strategy, the source includes a history of all changes for each instance. So, the delivery might contain multiple rows for a single customer—one row for each change. However, to make sense of this data, you need a stable business or technical timestamp that indicates when each row was accurate.

Usage

This strategy is for time-variant entities, but with a twist: you also rely on an Effective Timestamp to track when the data was valid, which is crucial for managing history over time. The Effective Timestamp should be mapped to a business or technical date/timestamp that is stable across data deliveries. If a row is sent again, it must carry the same date/timestamp as before, because Daana uses that value to drive the history.

Example: Customer Entity

For the Customer Entity, you can track how a customer’s attributes (like name or address) have changed and pinpoint exactly when each change occurred.
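To see why the Effective Timestamp must be stable, consider a minimal sketch of building a history from a log delivery. This is an illustration under assumptions, not Daana's implementation: build_history, the effective_ts and valid_to field names, and the ISO date strings are hypothetical.

```python
def build_history(rows, key="customer_id", ts="effective_ts"):
    """Order rows per instance by effective timestamp and derive validity ranges."""
    # A re-sent row has the same (key, timestamp) pair, so it collapses here.
    unique = {(row[key], row[ts]): row for row in rows}
    ordered = sorted(unique.values(), key=lambda row: (row[key], row[ts]))

    history = []
    for i, row in enumerate(ordered):
        following = ordered[i + 1] if i + 1 < len(ordered) else None
        same_instance = following is not None and following[key] == row[key]
        # A version stays valid until the next version of the same instance starts.
        valid_to = following[ts] if same_instance else None
        history.append({**row, "valid_to": valid_to})
    return history


delivery = [
    {"customer_id": 1, "city": "Oslo", "effective_ts": "2024-01-01"},
    {"customer_id": 1, "city": "Bergen", "effective_ts": "2024-03-15"},
    {"customer_id": 1, "city": "Oslo", "effective_ts": "2024-01-01"},  # re-sent row
]
customer_history = build_history(delivery)
```

Because the re-sent row carries the same timestamp, it does not create a third version. If the timestamp shifted on every delivery, each resend would look like a new change.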

Incremental Ingestion Strategy

Prerequisite

With an incremental ingestion strategy, the source delivers only a partial set of data: only instances that have changed since the last delivery are sent. However, if an instance has been deleted from the source, the deletion must also be included in the delivery so that Daana can perform a soft delete, marking the instance as no longer active in the data warehouse.

Usage

This strategy is used for time-variant entities when you want to track changes without reloading everything. It’s a leaner approach that saves processing time by only ingesting what’s changed.

Example: Customer Entity

For the Customer Entity, if a customer’s address changed, you would only receive an update for that particular instance, allowing you to track the change without having to process all customer records.
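The sketch below illustrates the idea of applying a partial delivery, including a soft delete. How a source signals a deletion is not specified on this page, so the is_deleted flag, the active field, and the apply_increment function are assumptions made for the example.

```python
def apply_increment(state, delivery, key="customer_id", deleted_flag="is_deleted"):
    """Apply changed and deleted instances; leave everything else untouched."""
    for row in delivery:
        instance_id = row[key]
        if row.get(deleted_flag):
            # Soft delete: keep the record, but mark it as no longer active.
            if instance_id in state:
                state[instance_id]["active"] = False
        else:
            attrs = {k: v for k, v in row.items() if k not in (key, deleted_flag)}
            state[instance_id] = {**attrs, "active": True}
    return state


state = {
    1: {"city": "Oslo", "active": True},
    2: {"city": "Bergen", "active": True},
    3: {"city": "Stavanger", "active": True},
}
delivery = [
    {"customer_id": 1, "city": "Trondheim"},   # changed address
    {"customer_id": 2, "is_deleted": True},    # deleted in the source
]
apply_increment(state, delivery)
```

Customer 3 is absent from the delivery and therefore stays exactly as it was, which is why a deletion has to be sent explicitly: absence alone only means "unchanged".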

Transactional Ingestion Strategy

Prerequisite

In the transactional ingestion strategy, each instance of data is delivered only once, with no future changes. Transactions are one-time events that don’t need to be updated later.

Usage

This strategy works best for transactional entities, where you’re capturing events like ATM withdrawals or purchases. You care about recording the event, and that’s it—no need to follow up with changes.

Example: ATM Transactions

For ATM Transactions, this strategy would capture every transaction a machine processes, ensuring each one is recorded without the need for future updates.
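Because a transaction never changes, loading it amounts to an append. The sketch below shows that insert-once behavior; append_transactions and the transaction_id key are hypothetical, and ignoring a repeated id (rather than rejecting the delivery) is an assumption chosen to keep reloads harmless.

```python
def append_transactions(store, delivery, key="transaction_id"):
    """Record each transaction once; an already stored event is never overwritten."""
    for row in delivery:
        store.setdefault(row[key], row)
    return store


store = {}
append_transactions(store, [{"transaction_id": "t1", "amount": 200}])
# A later delivery repeats t1 by mistake and adds a new event, t2.
append_transactions(
    store,
    [{"transaction_id": "t1", "amount": 999}, {"transaction_id": "t2", "amount": 50}],
)
```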

By choosing the right ingestion strategy, you ensure that Daana handles your data effectively, whether it’s transactional or time-variant, full or incremental. The key is understanding both your data’s lifecycle and how it’s delivered.
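As a compact summary, the prerequisites above can be read as a small decision function. This is one interpretation of this page, not an official Daana decision table: choose_strategy and its parameter values are hypothetical, and mapping every log delivery to the total strategy is an assumption.

```python
def choose_strategy(lifecycle, shape=None, complete=None):
    """Map a data lifecycle and delivery type to one of the four strategies.

    lifecycle: 'transactional' or 'time-variant'
    shape:     'snapshot' or 'log' (only needed for time-variant data)
    complete:  True for a full delivery, False for a partial one
    """
    if lifecycle == "transactional":
        return "transactional"
    if lifecycle != "time-variant":
        raise ValueError(f"unknown lifecycle: {lifecycle!r}")
    if shape == "log":
        return "total"  # requires a stable effective timestamp
    if shape == "snapshot":
        return "full" if complete else "incremental"
    raise ValueError(f"unknown delivery shape: {shape!r}")


print(choose_strategy("time-variant", shape="snapshot", complete=True))
print(choose_strategy("transactional"))
```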
