Dataset Structure for Template Code Skills
This document describes the general structure of a dataset setup for Template Code Skills and the underlying SQL table to start using them out of the box.
Data Assumptions
Dataset Properties
- Query row limit: LIMIT added to each SQL query. A null query row limit will default to a 100 row limit
- Dataset start / end date: not automatically generated. Required to hit outside of time range guardrails. If not set, the analysis will hit a no data guardrail in these cases instead
- Data Interval: Lowest level time granularity
- Source: A dataset can only be created from a single table or view Metrics
- Assumes metrics are their own columns in the table.
- Assumed to be a numeric type
- Assumed to have an aggregation function
- Can include null values, but will show up as NaN in the analysis
Dimensions
- Generally a string type in the DB and generally categorical variables.
- Assumes dimensions are their own columns in the table.
- Can include Null values, will show up as None in the analysis
Date Dimensions
- Can't include null values.
- Assumed to be discrete values, not time ranges.
- Assumes date dimensions are in YYYY-MM-DD format
- Requires columns to be in the following format, max\time{time\granularity}, up from the lowest time granularity available (set by the Data Interval).
- Supported time granularities
- date, week, month, quarter, year
- For example, if the dataset is daily data and you'd like to aggregate across weeks, months, quarters, and years, you'd have the following columns:
- max_time_date, max_time_week, max_time_month, max_time_quarter, max_time_year
- Supported time granularities
- Dimension Properties
- Should be hidden dimensions so the LLM doesn't use them explicitly during interpretation.
- Requires Data Type set to date
- Requires the SQL and SQL Sort Expression set
- SQL is used to show the output values of the date at that granularity, like W15 2024, Jan 2025, Q1 2023, 2021
- SQL Sort Expression is used to order the dates in ascending order.
max\time_date (Assumed to be a date type)
- SQL: <your_date_field>
- SQL Sort Expression: <your_date_field>
max\time_week
- SQL: 'W' || EXTRACT('week' FROM <your_date_field>) || ' ' || EXTRACT('year' FROM <your_date_field>)
- SQL Sort Expression: DATE_TRUNC('week', CAST(<your_date_field> AS DATE))
max\time_month
- SQL: LPAD(EXTRACT('month' FROM <your_date_field>)::VARCHAR, 2, '0') || '/' || EXTRACT('year' FROM <your_date_field>)
- SQL Sort Expression: DATE_TRUNC('month', CAST(<your_date_field> AS DATE))
max\time_quarter
- SQL: 'Q' || EXTRACT('quarter' FROM <your_date_field>) || ' ' || EXTRACT('year' FROM <your_date_field>)
- SQL Sort Expression: DATE_TRUNC('quarter', CAST(<your_date_field> AS DATE))
max\time_year
- SQL: EXTRACT('year' FROM <your_date_field>)
- SQL Sort Expression: EXTRACT('year' FROM <your_date_field>)
Updated 4 days ago