Seeds in DBT – Bidatatech

What Are Seeds in dbt?

Seeds in dbt are CSV files that you can load into your database as tables. They allow you to manage static data that doesn’t change often but is essential for your models, like lookup tables, reference data, or mapping tables. Seeds are stored in the data directory of your dbt project.

Why Use Seeds?

Version Control: Since seeds are just CSV files, they can be version-controlled like the rest of your dbt project.
Simplicity: They provide an easy way to manage static data directly within your dbt project without needing to rely on external databases or sources.
Reusability: Once seeded, these tables can be referenced in other dbt models, making them reusable across your project.

Example of Using Seeds in dbt

1. Create a Seed File

Let’s say you have a list of countries and their associated country codes that you want to use in multiple models.

Create a CSV file named country_codes.csv and place it in the data directory of your dbt project.

country_codes.csv

country_code,country_name
US,United States
CA,Canada
UK,United Kingdom
FR,France
DE,Germany

2. Seed the Data into the Database

To load this data into your database, you need to run the following command:

dbt seed

This command will read the CSV file and create a table in your database with the same name as the CSV file, in this case, country_codes.

3. Use the Seed in a dbt Model

Now that the seed has been loaded into your database, you can use it like any other table in your dbt models.

For example, you could create a model that joins the country_codes seed with your sales table to enrich your sales data with country names.

models/enriched_sales.sql

WITH sales_data AS (
SELECT
s.sale_id,
s.amount,
s.sale_date,
c.country_name,
s.product_id,
s.customer_id
FROM {{ ref(‘sales’) }} s
LEFT JOIN {{ ref(‘country_codes’) }} c
ON s.country_code = c.country_code
)

SELECT * FROM sales_data;

4. Run the Model

To create the enriched_sales model, run:

dbt run

This command will generate a new table or view (depending on your configuration) that includes the sales data enriched with country names.

Advanced Use Cases for Seeds

Lookup Tables: Store mappings like state abbreviations to state names, product category codes to category names, etc.
Feature Flags: Manage feature flags or configuration settings that control logic in your models.
Testing Data: Provide controlled data sets for testing your models.

Updating Seeds

If the data in your seed file changes, simply update the CSV file and run dbt seed again. This will reload the data into the database, overwriting the previous seed data.

Best Practices

Limit the Size: Since seeds are meant for static data, keep them relatively small. Large datasets should be managed using other methods.
Documentation: Document your seeds like you would with models so that their purpose and usage are clear.
Version Control: Always commit your seed files to your version control system to track changes over time.

Summary

Seeds in dbt provide a convenient way to manage and version-control static data within your project. By using seeds, you can enrich your models with consistent, reliable data directly from CSV files, all within the same workflow as your other dbt models.

admin

0 Comments

Add Your Comment

Shopping cart

admin

Leave a Reply Cancel reply

Useful Links

Our Company

Get Contact