What Are Seeds in dbt?
Seeds in dbt are CSV files that you can load into your database as tables. They allow you to manage static data that doesn’t change often but is essential for your models, like lookup tables, reference data, or mapping tables. Seeds are stored in the data
directory of your dbt project.
Why Use Seeds?
- Version Control: Since seeds are just CSV files, they can be version-controlled like the rest of your dbt project.
- Simplicity: They provide an easy way to manage static data directly within your dbt project without needing to rely on external databases or sources.
- Reusability: Once seeded, these tables can be referenced in other dbt models, making them reusable across your project.
Example of Using Seeds in dbt
1. Create a Seed File
Let’s say you have a list of countries and their associated country codes that you want to use in multiple models.
Create a CSV file named country_codes.csv
and place it in the data
directory of your dbt project.
country_codes.csv
country_code,country_name
US,United States
CA,Canada
UK,United Kingdom
FR,France
DE,Germany
2. Seed the Data into the Database
To load this data into your database, you need to run the following command:
dbt seed
This command will read the CSV file and create a table in your database with the same name as the CSV file, in this case, country_codes
.
3. Use the Seed in a dbt Model
Now that the seed has been loaded into your database, you can use it like any other table in your dbt models.
For example, you could create a model that joins the country_codes
seed with your sales
table to enrich your sales data with country names.
models/enriched_sales.sql
WITH sales_data AS (
SELECT
s.sale_id,
s.amount,
s.sale_date,
c.country_name,
s.product_id,
s.customer_id
FROM {{ ref(‘sales’) }} s
LEFT JOIN {{ ref(‘country_codes’) }} c
ON s.country_code = c.country_code
)
SELECT * FROM sales_data;
4. Run the Model
To create the enriched_sales
model, run:
dbt run
This command will generate a new table or view (depending on your configuration) that includes the sales data enriched with country names.
Advanced Use Cases for Seeds
- Lookup Tables: Store mappings like state abbreviations to state names, product category codes to category names, etc.
- Feature Flags: Manage feature flags or configuration settings that control logic in your models.
- Testing Data: Provide controlled data sets for testing your models.
Updating Seeds
If the data in your seed file changes, simply update the CSV file and run dbt seed
again. This will reload the data into the database, overwriting the previous seed data.
Best Practices
- Limit the Size: Since seeds are meant for static data, keep them relatively small. Large datasets should be managed using other methods.
- Documentation: Document your seeds like you would with models so that their purpose and usage are clear.
- Version Control: Always commit your seed files to your version control system to track changes over time.
Summary
Seeds in dbt provide a convenient way to manage and version-control static data within your project. By using seeds, you can enrich your models with consistent, reliable data directly from CSV files, all within the same workflow as your other dbt models.
0 Comments