The hidden cost of full Salesforce extracts

If you’ve ever tried to extract and keep Salesforce data fresh in an analytics platform, you already know the challenge. Salesforce data changes constantly, business teams expect frequently updated dashboards, and engineers are stuck paying the "full extract tax": querying and extracting large volumes of unchanged data.

Full data extracts are expensive, slow, and unnecessary, especially when only a small fraction of records has changed.

To address this, we usually turn to Change Data Capture (CDC). In practice, most analytics teams simply want to ingest what has changed since the last run using a cursor or watermark.
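The cursor/watermark pattern that Lakeflow automates can be sketched in a few lines of Python. This is an illustrative, hand-rolled version only (the function name and field list are hypothetical); it uses `SystemModstamp`, Salesforce's system-maintained last-modified timestamp, as the cursor column:

```python
from datetime import datetime, timezone

def build_incremental_soql(sobject: str, fields: list[str], last_watermark: datetime) -> str:
    """Build a SOQL query that fetches only rows changed since the last run.

    SystemModstamp is Salesforce's system-maintained last-modified field,
    which makes it a natural cursor column for incremental extraction.
    """
    # SOQL expects ISO-8601 UTC timestamps, e.g. 2024-01-01T00:00:00Z
    cursor = last_watermark.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f"SELECT {', '.join(fields)} FROM {sobject} "
        f"WHERE SystemModstamp > {cursor} ORDER BY SystemModstamp"
    )

soql = build_incremental_soql(
    "Account", ["Id", "Name", "SystemModstamp"],
    datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```

After each run, the pipeline must also persist the maximum `SystemModstamp` it saw as the next watermark. It is exactly this bookkeeping, plus retries and replay, that a managed connector takes off your plate.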

This is exactly where Databricks Lakeflow Connect stands out.

Lakeflow Connect provides "CDC out-of-the-box," purpose-built for the Lakehouse. It’s not just a basic copy-and-paste pipeline; incremental ingestion is a first-class feature. It automatically tracks data changes using Salesforce cursor/watermark columns, selecting the best available option and enabling scheduled, incremental ingestion optimized for analytics freshness.

Lakeflow Connect effectively acts as a CDC “cheat code” by automatically enabling Delta Change Data Feed (CDF) on all target tables. This is the difference between “we landed Salesforce data” and “we can now build incremental Silver/Gold tables without full rescans.”

With CDF, downstream teams can consume row-level inserts, updates, and deletes without rescanning entire tables. Soft deletes are captured in subsequent syncs, while hard deletes—and rare recycle bin purge timing edge cases—may require a full refresh. For notoriously difficult formula fields (which don't trigger cursor updates), Lakeflow defaults to snapshotting or offers a beta incremental option to prevent silent data drift.
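To make the CDF idea concrete, here is a minimal, framework-free sketch of applying a change feed to a keyed target table. The `_change_type` values mirror Delta CDF's (`insert`, `update_preimage`, `update_postimage`, `delete`); the row shape and function name are invented for illustration:

```python
def apply_change_feed(target: dict, changes: list[dict]) -> dict:
    """Apply row-level CDF events to a keyed target (dict of Id -> row).

    Only the changed rows are touched -- no full rescan of `target`.
    """
    for row in changes:
        change_type = row["_change_type"]
        if change_type == "update_preimage":
            continue  # pre-images carry the old values; we only need post-images
        key = row["Id"]
        if change_type == "delete":
            target.pop(key, None)
        else:  # "insert" or "update_postimage"
            target[key] = {k: v for k, v in row.items() if not k.startswith("_")}
    return target

silver = {"001": {"Id": "001", "Name": "Acme"}}
apply_change_feed(silver, [
    {"_change_type": "update_postimage", "Id": "001", "Name": "Acme Corp"},
    {"_change_type": "insert", "Id": "002", "Name": "Globex"},
    {"_change_type": "delete", "Id": "001"},
])
# silver now holds only the surviving "002" row
```

In a real pipeline, this logic would run as a Spark `MERGE` over the table's change feed rather than as a Python loop, but the per-row semantics are the same.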

This is where Lakeflow moves beyond “just another connector.” Its managed ingestion pipelines are powered by Lakeflow Declarative Pipelines (formerly Delta Live Tables), giving engineers real operational visibility and control, not just data movement.

  • A practical, visual pipeline UI that shows runs, DAGs, and dataset-level status directly in Databricks
  • Built-in data quality expectations with metrics surfaced in the UI, so quality isn’t an afterthought
  • A dedicated, integrated data quality experience that makes it easier to define, monitor, and enforce thresholds without external tooling
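The expectation idea behind the bullets above can be sketched without any framework: name a rule, split rows into pass/fail, and surface the counts as a metric. (Everything here is illustrative; in Lakeflow Declarative Pipelines, expectations are declared on the pipeline's datasets and the metrics appear in the UI.)

```python
def check_expectation(rows: list[dict], name: str, predicate) -> tuple[list, list, dict]:
    """Split rows into pass/fail for one named data-quality rule and report a metric."""
    passed = [r for r in rows if predicate(r)]
    failed = [r for r in rows if not predicate(r)]
    metric = {"expectation": name, "passed": len(passed), "failed": len(failed)}
    return passed, failed, metric

rows = [
    {"Id": "001", "AnnualRevenue": 500},
    {"Id": "002", "AnnualRevenue": None},
]
valid, invalid, metric = check_expectation(
    rows, "revenue_not_null", lambda r: r["AnnualRevenue"] is not None
)
```

The point of the managed experience is that you only write the rule; collecting, storing, and visualizing the pass/fail metrics is handled for you.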

Top 5 reasons stakeholders choose Lakeflow Connect

When evaluating Salesforce ingestion options, stakeholders consistently lean toward Lakeflow Connect for five key reasons:

  1. Zero-code implementation
    Ingest Salesforce data without writing code, bypassing the need to build and maintain complex API integrations.
  2. Out-of-the-box incremental updates
    Cursor-based incremental ingestion is the default, removing the need to design and manage incremental logic manually.
  3. Reduced operational blast radius
    Serverless execution eliminates the need to manage, provision, or troubleshoot ingestion clusters.
  4. End-to-end incremental architecture
    With Delta CDF enabled automatically, downstream Silver and Gold transformations become incremental by default.
  5. Native Unity Catalog governance
    Connections and pipelines are fully managed by Unity Catalog, enabling instant access control, auditing, and reuse across pipelines.

Why general ETL tools fall behind

Most traditional Extract, Transform, Load (ETL) tools are effective at extracting Salesforce data, but they typically stop at “copy.” The burden of implementing CDC still falls on engineering teams.

This includes:

  • Maintaining watermarks and state tracking
  • Building replay and recovery logic
  • Implementing MERGE/upsert/delete handling

Additionally, edge cases—such as formula fields and delete timing—can result in pipelines that appear successful while data quietly drifts out of sync.
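Even the "simple" third bullet above, MERGE/upsert/delete handling, is real code someone must own. A hedged sketch of the Delta-style `MERGE` statement a hand-rolled pipeline would generate and maintain (table names, the `_is_deleted` soft-delete flag, and the builder function are all hypothetical):

```python
def build_merge_sql(target: str, staging: str, key: str, cols: list[str]) -> str:
    """Build the Delta MERGE statement a hand-rolled CDC pipeline must maintain.

    Assumes `staging` holds the latest batch of changed rows with an
    `_is_deleted` soft-delete flag (column names are illustrative).
    """
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in cols)
    insert_cols = ", ".join([key] + cols)
    insert_vals = ", ".join(f"s.{c}" for c in [key] + cols)
    return (
        f"MERGE INTO {target} t USING {staging} s ON t.{key} = s.{key} "
        f"WHEN MATCHED AND s._is_deleted THEN DELETE "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED AND NOT s._is_deleted THEN INSERT ({insert_cols}) "
        f"VALUES ({insert_vals})"
    )

sql = build_merge_sql("silver.account", "staging.account_changes", "Id", ["Name"])
```

Multiply this by every object you ingest, add schema evolution and delete-timing edge cases, and the maintenance burden of "just a connector" becomes clear.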

The verdict

If your goal is to keep Salesforce data fresh in the Lakehouse without building CDC code and logic from scratch, Databricks Lakeflow Connect is the most efficient path.

It doesn’t just move data. It elevates “change” to a first-class output. Incremental loads land in Delta tables already optimized for downstream incremental processing via Delta Change Data Feed, with governance and access control handled natively through Unity Catalog.

Lakeflow Connect is the clear choice when Databricks is your destination and you want the shortest path from Salesforce changes to reliable, incremental Lakehouse tables.

Other ETL tools can move the data. Lakeflow Connect makes it scalable, fast, governable, and incremental by design.