dbt engineers build and maintain the SQL transformation layer that turns raw data warehouse tables into clean, documented, tested analytical models — writing modular SELECT-only SQL files that dbt compiles into production DDL and DML, defining schema tests that enforce data quality before models are promoted downstream, and organizing models into staging, intermediate, and mart layers that make the entire data transformation graph navigable and auditable. At remote-first technology companies, they serve as the analytics engineers who own the semantic layer between raw ingested data and the BI tools, dashboards, and analytical queries that business users and data scientists depend on — applying software engineering disciplines (version control, code review, CI/CD, documentation) to the SQL transformations that were previously fragile, untested scripts maintained in spreadsheets or ad-hoc query tools.
What dbt engineers do
dbt engineers write models — creating .sql files in the models/ directory containing SELECT statements that dbt compiles and executes as CREATE TABLE AS SELECT or CREATE VIEW AS SELECT in the target warehouse, with model names becoming the table or view names; configure materializations — setting {{ config(materialized='table') }} for models where full refresh is acceptable, 'view' for lightweight non-persisted models, 'incremental' for append-or-upsert patterns that process only new rows, and 'ephemeral' for reusable CTEs that exist only in-query; implement incremental models — using {% if is_incremental() %} WHERE created_at > (SELECT MAX(created_at) FROM {{ this }}) {% endif %} to restrict incremental runs to new data, and unique_key='id' with incremental_strategy='merge' for upsert behavior on Snowflake and BigQuery; reference models — using {{ ref('stg_orders') }} to reference another model by name, which dbt uses to build the DAG and compile the correct fully-qualified table reference for the target environment; source raw tables — using {{ source('raw', 'orders') }} to reference raw source tables defined in _sources.yml files that declare source schemas and enable source freshness checks; write schema tests — defining name: id with tests: [unique, not_null] in schema.yml files for built-in generic tests, and using dbt-utils or custom tests for accepted_values, relationships (foreign key integrity), and expression_is_true checks; write data quality tests — using dbt-expectations for expect_column_values_to_be_between, expect_table_row_count_to_be_between, and distribution-based tests that go beyond the built-in four tests; document models — writing description: "One row per order, with order status cleaned and payment method normalized" in schema.yml for model-level documentation, column-level descriptions, and doc() blocks for reusable documentation chunks rendered in dbt docs; generate and serve documentation — running dbt docs generate and dbt docs serve for the interactive lineage graph and data dictionary that shows every model's dependencies, column documentation, and test results; configure environments — using profiles.yml with target environments (dev, staging, prod) that compile models to different schemas, enabling developer-specific schemas (dbt_alice_orders) that prevent development runs from polluting production tables; implement macros — writing Jinja2 {% macro %} functions for reusable SQL logic (date spine generation, common fiscal calendar calculations) that execute in any model that calls them; use dbt packages — installing dbt-utils, dbt-expectations, dbt-date, and dbt-audit-helper from packages.yml for pre-built macros and test libraries; and run dbt in CI — executing dbt build --select state:modified+ in GitHub Actions to build and test only models changed in the PR and their downstream dependencies.
Key skills for dbt engineers
- Models: SQL SELECT files; ref(); source(); materializations (table/view/incremental/ephemeral)
- Incremental: is_incremental(); unique_key; incremental_strategy (merge/append/insert_overwrite)
- Testing: schema.yml tests; unique; not_null; accepted_values; relationships; dbt-expectations
- Sources: _sources.yml; source(); source freshness; source_status; external sources
- Documentation: schema.yml descriptions; doc() blocks; dbt docs generate; column-level docs
- Macros: Jinja2; {% macro %}; dbt_utils; generate_surrogate_key; date_spine; star()
- Packages: packages.yml; hub.getdbt.com; dbt-utils; dbt-expectations; dbt-date; codegen
- Configuration: dbt_project.yml; config block; +materialized; +schema; +tags; selectors
- CI/CD: state:modified+; slim CI; dbt build; deferred resolution; environment variables
- Deployment: dbt Cloud; Airflow + dbt; Dagster + dbt; dbt Core; jobs; schedules
Salary expectations for remote dbt engineers
Remote dbt engineers earn $100,000–$165,000 total compensation. Base salaries range from $84,000–$135,000, with equity at technology companies where analytical data model reliability, transformation pipeline performance, and the clarity of the semantic layer that data scientists and BI developers depend on directly determine the speed and quality of data-driven decision making across the organization. dbt engineers with incremental model design for billion-row fact tables using warehouse-native merge strategies, custom macro library development for domain-specific SQL patterns reused across dozens of models, dbt Cloud environment management for multi-team data organizations with hundreds of models, and demonstrated test coverage improvements where critical data quality checks prevented incorrect metrics from reaching production dashboards command the strongest premiums. Those with dbt combined with deep Snowflake, BigQuery, or Databricks warehouse expertise earn toward the top of the range.
Career progression for dbt engineers
The path from dbt engineer leads to senior analytics engineer (broader scope across the full analytics stack from source system integration through BI layer), data platform engineer (owning the dbt project architecture, testing standards, and CI/CD pipeline for a large data team), or data product manager (applying their understanding of the data model's business semantics to data product strategy). Some dbt engineers specialize into semantic layer architecture, extending dbt's metrics layer or MetricFlow to define business metrics — revenue, retention, conversion — as reusable metric definitions that any BI tool can query consistently. Others transition into data governance, using dbt's lineage graph and documentation layer as the foundation for data catalog, data ownership, and impact analysis programs. dbt engineers who contribute to the dbt ecosystem — writing packages on the dbt Hub, contributing to dbt Core's open-source codebase, or publishing the dbt best practices that become the community standard — become recognized data engineering thought leaders.
Remote work considerations for dbt engineers
Building dbt projects for distributed data engineering and analytics teams requires model naming conventions, materialization decision criteria, and test coverage standards that prevent distributed engineers from creating view models that downstream BI tools query directly at runtime (breaking performance SLAs when the view query changes), writing models without any test coverage that fail silently when upstream sources change, or building a flat model namespace with hundreds of undifferentiated models where ownership and layer responsibility are unclear. dbt engineers at remote companies establish the three-layer model architecture — staging models (1:1 with source tables, light cleaning and renaming only), intermediate models (business-logic joins and transformations), and mart models (final analytics-ready entities with business metric definitions) — and document the rule that BI tools may only reference mart models — because distributed engineers who build flat model collections where reporting queries join staging models create brittle pipelines where source schema changes break live dashboards directly; enforce materialization standards — documenting that staging models should be views (cheap, always fresh), intermediate models should be tables if queried by multiple downstream models, and mart models should be tables for query performance — because distributed engineers who materialize all models as views create unbounded runtime costs when BI tools hit complex chains of view-on-view query compilation; establish the test minimum — every model must have unique and not_null tests on its primary key, and every model joined to another must have a relationships test verifying FK integrity — because distributed engineers who build models without FK tests create silent fan-out duplicates when upstream models introduce duplicates that the join amplifies; and configure developer schema isolation — using dbt_project.yml schema configuration so each developer's dbt run materializes into dbt_<username>_<schema> — because distributed engineers running dbt run without schema isolation overwrite shared staging or production tables in development.
Top industries hiring remote dbt engineers
- Analytics engineering and business intelligence organizations where dbt is the standard SQL transformation framework for the modern data stack — companies using Snowflake, BigQuery, or Databricks hire dbt engineers as the analytics engineering discipline that separates raw data from analysis-ready data
- E-commerce and SaaS companies with Fivetran or Airbyte ingestion pipelines where dbt transforms raw connector data (Stripe, Salesforce, Shopify) into consistent dimensional models for revenue analytics, customer lifetime value, and funnel reporting
- Fintech and financial services data teams where dbt models define the regulatory reporting datasets, daily reconciliation outputs, and risk metric computations that require complete lineage documentation and test coverage for audit purposes
- Healthcare and life sciences organizations where dbt's documented lineage enables HIPAA-compliant data transformation documentation — every table's row-level provenance is traceable through the dbt lineage graph to its source system
- Product analytics companies where dbt models define user behavior event funnels, cohort retention tables, and A/B test result aggregations that serve as the foundation for product decision data consumed by experimentation platforms and dashboards
Interview preparation for dbt engineer roles
Expect model writing questions: write a staging model for an orders source table that renames order_id → order_key, created_at → order_created_at, and casts amount to float — what the SQL SELECT, {{ source() }} reference, and config block look like. Materialization questions ask when you'd choose incremental over table materialization and what the is_incremental() filter pattern looks like for a daily events table — the trade-offs between full refresh cost and incremental logic complexity. Test questions ask how you'd add tests ensuring an order's user_id always references an existing user in the users model — what the relationships test in schema.yml looks like. Ref chain questions ask how dbt knows to build model A before model B — what {{ ref('model_a') }} does to the DAG and how dbt determines build order. Documentation questions ask how you'd document all columns in a model and how you'd reuse a long description across multiple models that share a concept — what schema.yml column descriptions and doc() blocks look like. CI questions ask how you'd configure dbt in CI to only build and test models that changed in the PR — what state:modified+ and --defer flags do. Incremental strategy questions ask how you'd handle late-arriving data in an incremental model where some records arrive 3 days after their event date — what the lookback_window pattern and full_refresh flag address. Be ready to compare dbt Core with dbt Cloud and when each is appropriate.
Tools and technologies for dbt engineers
Core: dbt Core; dbt Cloud; dbt CLI; profiles.yml; dbt_project.yml. Models: .sql files; ref(); source(); this; config(); materializations; CTE patterns; staging/intermediate/mart layers. Configuration: dbt_project.yml; config block; +materialized; +schema; +tags; +enabled; selector.yml. Testing: schema.yml tests; unique; not_null; accepted_values; relationships; generic tests; singular tests. Packages: packages.yml; hub.getdbt.com; dbt-utils; dbt-expectations; dbt-date; dbt-audit-helper; codegen; elementary. Macros: Jinja2; {% macro %}; {% if %}; {% for %}; {% set %}; adapter.dispatch; run_query. Sources: _sources.yml; source(); source freshness; loaded_at_field; warn_after; error_after. Documentation: schema.yml descriptions; doc() blocks; dbt docs generate; dbt docs serve; lineage graph. Snapshots: dbt snapshot; strategy (timestamp/check); updated_at; unique_key; SCD Type 2. Seeds: dbt seed; CSV files; column_types; quote_columns. CI/CD: state:modified+; --defer; --state; slim CI; dbt build; GitHub Actions; CircleCI. Metrics: MetricFlow; dbt metrics; semantic layer; measure; dimension; entity. Cloud warehouses: Snowflake; BigQuery; Databricks; Redshift; DuckDB; Postgres. Orchestration: dbt Cloud jobs; Airflow + dbt-airflow; Dagster + dagster-dbt; Prefect + dbt. Alternatives: SQLMesh (dbt-compatible, Git-native); SQLGlot (transpilation); Coalesce (cloud dbt UI); Transform (metrics layer); Lightdash (dbt-native BI).
Global remote opportunities for dbt engineers
dbt engineer expertise is in strong and sustained global demand, with dbt's position as the dominant SQL transformation framework in the modern data stack — with over 30,000 companies using dbt (as reported by dbt Labs), 1 million dbt Cloud users, and inclusion as the de facto standard analytics engineering tool alongside Fivetran, Snowflake, and Looker in the canonical modern data stack architecture — creating consistent demand for engineers who understand both dbt's model-test-document workflow and the cloud data warehouse semantics that dbt targets. US-based dbt engineers are in demand at data-driven SaaS companies building analytics platforms, e-commerce organizations with multi-source data consolidation requirements, and financial services firms standardizing on the modern data stack for regulatory reporting and business intelligence. EMEA-based dbt engineers are well-positioned given the European data engineering community's strong dbt adoption — dbt has significant user communities in the UK, Germany, Netherlands, and France, and European data conferences regularly feature dbt-centered analytics engineering talks. dbt's continued development — MetricFlow for a standardized semantic layer, dbt Cloud Enterprise features, and the growing ecosystem of dbt-native BI tools like Lightdash — ensures sustained demand as analytics engineering becomes an established discipline alongside data engineering.
Frequently asked questions
How does dbt's incremental materialization work and what are the different incremental strategies for Snowflake and BigQuery? An incremental model materializes as a full table on its first run (dbt run) and on subsequent runs inserts or merges only new or updated rows, identified by the is_incremental() condition in the model's WHERE clause. The unique_key configuration determines how dbt handles rows that already exist: without unique_key, dbt only appends new rows; with unique_key='order_id', dbt performs an upsert (update existing, insert new). Incremental strategies by warehouse: Snowflake supports merge (upsert using unique_key), append (insert-only), and delete+insert (delete matching rows then insert); BigQuery supports merge, insert_overwrite (replaces affected partitions), and append. Late-arriving data: if records can arrive 3 days late, the is_incremental() filter should look back 3 days: WHERE created_at >= (SELECT DATEADD(day, -3, MAX(created_at)) FROM {{ this }}) — accepting some reprocessing overhead to capture late arrivals. Full refresh: dbt run --full-refresh --select my_model drops and recreates the table from scratch — needed when model logic changes affect historical records that incremental logic wouldn't reprocess. On-schema-change: on_schema_change='sync_all_columns' causes dbt to add new columns to the existing incremental table when the model's SELECT adds new columns, rather than failing on schema mismatch.
How should dbt models be organized into layers and what does each layer contain? The three-layer architecture is the industry standard: Staging (stg_), Intermediate (int_), and Marts (fct_/dim_). Staging layer (models/staging/): one model per source table with 1:1 column mapping — rename columns to consistent conventions (order_id → order_key), cast types appropriately, add simple derived columns (parse timestamps), but perform no business logic joins. Each staging model directly references one {{ source() }}. These materialize as views. Intermediate layer (models/intermediate/): business logic joins and complex transformations — join orders to customers, apply business rules, compute metrics that require multiple source tables. Referenced only by other dbt models, never by BI tools. Materialize as tables when queried by multiple downstream models. Marts layer (models/marts/): analytics-ready entities organized by business domain (finance/, product/, marketing/). Fact tables (fct_orders, fct_events) and dimension tables (dim_customers, dim_products) that BI tools, dashboards, and data scientists query directly. Always materialized as tables. The rule: BI tools only reference marts/ models; staging/ models are internal dbt plumbing. Why it matters: when a source schema changes, only the affected staging model needs updating — the intermediate and mart models receive clean, consistently-named inputs without being affected by upstream column renames.
What are dbt packages and which ones are most useful for a production dbt project? dbt packages are reusable collections of models, macros, tests, and seeds installed from packages.yml and fetched with dbt deps. They extend dbt's functionality without writing from scratch. dbt-utils (most widely used): provides generate_surrogate_key() for consistent hash-based surrogate keys across warehouses, star() macro to select all columns except a list, date_spine() for generating a complete date sequence, union_relations() for dynamically unioning many tables, and get_column_values() for dynamically generating SQL from metadata. dbt-expectations: port of Great Expectations assertions as dbt tests — expect_column_values_to_be_between, expect_table_row_count_to_equal_other_table, expect_column_proportion_of_unique_values_to_be_between — for data quality tests beyond dbt's built-in four. dbt-date: date manipulation macros for fiscal calendars, date truncation, and date formatting that produce warehouse-agnostic SQL. codegen: generates schema.yml stubs from existing database tables — paste the warehouse table name and get a starting schema.yml that you fill in with tests and descriptions. elementary: adds observability on top of dbt — anomaly detection, data freshness monitoring, and a dashboard showing test results and model run history. Install: packages.yml with - package: dbt-labs/dbt_utils and version: [">=1.0.0"]; run dbt deps to install.