freshness
version: 2
sources:
- name: <source_name>
freshness:
warn_after:
count: <positive_integer>
period: minute | hour | day
error_after:
count: <positive_integer>
period: minute | hour | day
filter: <boolean_sql_expression>
loaded_at_field: <column_name_or_expression>
tables:
- name: <table_name>
freshness:
warn_after:
count: <positive_integer>
period: minute | hour | day
error_after:
count: <positive_integer>
period: minute | hour | day
filter: <boolean_sql_expression>
loaded_at_field: <column_name_or_expression>
...
Definition
A freshness block is used to define the acceptable amount of time between the most recent record, and now, for a table to be considered "fresh".
In the freshness
block, one or both of warn_after
and error_after
can be provided. If neither is provided, then dbt will not calculate freshness snapshots for the tables in this source.
loaded_at_field
A column name (or expression) that returns a timestamp indicating freshness.
If using a date field, you may have to cast it to a timestamp:
loaded_at_field: "completed_date::timestamp"
Or, depending on your SQL variant:
loaded_at_field: "CAST(completed_date AS TIMESTAMP)"
If using a non-UTC timestamp, cast it to UTC first:
loaded_at_field: "convert_timezone('Australia/Sydney', 'UTC', created_at_local)"
count
(Required)
A positive integer for the number of periods where a data source is still considered "fresh".
period
(Required)
The time period used in the freshness calculation. One of minute
, hour
or day
filter
(optional)
Add a where clause to the query run by dbt source freshness
in order to limit data scanned.
This filter only applies to dbt's source freshness queries - it will not impact other uses of the source table.
This is particularly useful if:
- You are using BigQuery and your source tables are partitioned tables
- You are using Snowflake, Databricks, or Spark with large tables, and this results in a performance benefit
Examples
Complete example
version: 2
sources:
- name: jaffle_shop
database: raw
freshness: # default freshness
warn_after: {count: 12, period: hour}
error_after: {count: 24, period: hour}
loaded_at_field: _etl_loaded_at
tables:
- name: customers # this will use the freshness defined above
- name: orders
freshness: # make this a little more strict
warn_after: {count: 6, period: hour}
error_after: {count: 12, period: hour}
# Apply a where clause in the freshness query
filter: datediff('day', _etl_loaded_at, current_timestamp) < 2
- name: product_skus
freshness: # do not check freshness for this table
When running dbt source freshness
, the following query will be run:
- Compiled SQL
- Jinja SQL
select
max(_etl_loaded_at) as max_loaded_at,
convert_timezone('UTC', current_timestamp()) as snapshotted_at
from raw.jaffle_shop.orders
where datediff('day', _etl_loaded_at, current_timestamp) < 2
select
max({{ loaded_at_field }}) as max_loaded_at,
{{ current_timestamp() }} as snapshotted_at
from {{ source }}
{% if filter %}
where {{ filter }}
{% endif %}