Release v1.0.1 (2025-11-18)

Highlights

  • scd2_upsert now replays duplicate keys within a single batch in chronological order so every change is stored as its own historical row.
  • The SCD demo notebook includes a "multiple updates per batch" walk-through and schema-evolution examples for both SCD1 and SCD2 to visualize the resulting Delta output.
  • Optional schema evolution (allow_schema_evolution=True) is available on both SCD helpers and lets Delta automatically add columns when the upstream dataset introduces new attributes.

Upgrade Notes

  • No action is required for existing pipelines. scd2_upsert automatically sequences batches that have not been pre-deduplicated and behaves as before when a batch contains one row per key.

Detailed Changes

Added

  • Regression tests that feed multiple rows for a single key through scd2_upsert to validate that every version is created.
  • allow_schema_evolution flag on scd1_upsert and scd2_upsert to turn on Delta schema evolution during merges/appends.
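
For orientation, the sketch below lists the two standard Delta Lake switches that a flag like allow_schema_evolution could plausibly map to: the mergeSchema writer option for appends and the spark.databricks.delta.schema.autoMerge.enabled session setting for MERGE. How the helpers wire the flag internally is not stated in these notes, so the function name and the returned structure are hypothetical.

```python
def delta_schema_evolution_settings(allow_schema_evolution: bool) -> dict:
    """Return the Delta settings a schema-evolution flag might toggle.

    Hypothetical helper for illustration only; it is not part of the
    released package and does not touch Spark.
    """
    if not allow_schema_evolution:
        # Default Delta behavior: writes with unknown columns are rejected.
        return {"writer_options": {}, "spark_conf": {}}
    return {
        # DataFrameWriter option used for append/overwrite writes.
        "writer_options": {"mergeSchema": "true"},
        # Session-level setting that lets MERGE add new source columns.
        "spark_conf": {
            "spark.databricks.delta.schema.autoMerge.enabled": "true"
        },
    }


settings = delta_schema_evolution_settings(True)
for name, value in settings["spark_conf"].items():
    print(f"spark.conf.set({name!r}, {value!r})")
```

In a real pipeline the writer options would be passed through DataFrameWriter.option(...) and the session setting through spark.conf.set(...), both of which require a running Spark session with Delta Lake configured.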

Changed

  • scd2_upsert partitions each batch by business key and orders the rows within each key by load order, so earlier versions are closed before later ones are inserted and complete history is retained at write time.
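
The sequencing idea can be sketched in plain Python, without Spark or Delta. This is a minimal illustration, not the scd2_upsert implementation: the function name, the id, load_ts, valid_from and valid_to column names, and the list-of-dicts history are all assumptions, and the sketch skips change detection (every incoming row becomes a new version).

```python
from collections import defaultdict
from datetime import date


def scd2_apply_batch(history, batch, key="id", order_by="load_ts"):
    """Apply a batch to an SCD2 history, one version per incoming row."""
    # Group incoming rows by business key.
    by_key = defaultdict(list)
    for row in batch:
        by_key[row[key]].append(row)

    result = [dict(r) for r in history]  # copy so the input is not mutated
    for k, rows in by_key.items():
        # Replay this key's rows oldest first.
        for row in sorted(rows, key=lambda r: r[order_by]):
            # Close the currently open version of this key, if any.
            for existing in result:
                if existing[key] == k and existing["valid_to"] is None:
                    existing["valid_to"] = row[order_by]
            # Insert the incoming row as the new open version.
            version = {c: v for c, v in row.items() if c != order_by}
            version["valid_from"] = row[order_by]
            version["valid_to"] = None
            result.append(version)
    return result


# Two updates for key 1 arrive out of order in the same batch.
batch = [
    {"id": 1, "city": "Oslo", "load_ts": date(2025, 1, 3)},
    {"id": 1, "city": "Bergen", "load_ts": date(2025, 1, 1)},
    {"id": 2, "city": "Paris", "load_ts": date(2025, 1, 2)},
]
for version in scd2_apply_batch([], batch):
    print(version)
```

Because each key's rows are sorted before they are replayed, the earlier row is closed by the later one instead of being overwritten, which is what keeps both versions in the history.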

Documentation

  • Noted the new intra-batch sequencing behavior in the SCD guide and notebook, including a dedicated example that shows the resulting timeline of versions.