Release v1.0.0 (2025-11-16)

Highlights

  • PySpark 4 (and delta-spark 4) is now the baseline for spark-fuse, including automatic Delta Scala classifier detection with an override via SPARK_FUSE_DELTA_SCALA_SUFFIX.
  • split_by_date_formats adopts try_to_timestamp where available, eliminating ANSI errors on invalid rows while still surfacing unmatched records through the existing modes.
  • Legacy catalog helpers (Unity/Hive), their CLI commands, and the experimental Qdrant connector have been removed to keep the 1.0.0 surface area focused on the actively maintained toolchain.
  • Installation and demo docs now reflect the PySpark 4 requirements and updated Java guidance.

Upgrade Notes

  • Ensure your environments (local notebooks, CI, clusters) install PySpark 4.x and delta-spark 4.x. Earlier versions are no longer supported, and the package metadata enforces these minimums.
  • When customizing Delta Lake coordinates, set SPARK_FUSE_DELTA_SCALA_SUFFIX (for example, _2.12) to override the auto-detected Scala binary suffix.
  • If you depended on the removed catalog helpers or Qdrant stub, pin to spark-fuse==0.4.0 until you migrate those workflows to first-party alternatives.
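The suffix override amounts to an environment-variable lookup that takes precedence over auto-detection. A minimal sketch of the pattern (the helper name, the detected default, and the coordinate format are illustrative assumptions, not spark-fuse's actual internals):

```python
import os

def delta_scala_suffix(detected: str = "_2.13") -> str:
    # SPARK_FUSE_DELTA_SCALA_SUFFIX, when set, overrides the auto-detected
    # Scala binary suffix used in the Delta Lake Maven coordinate.
    return os.environ.get("SPARK_FUSE_DELTA_SCALA_SUFFIX", detected)

# Hypothetical coordinate assembly, e.g. io.delta:delta-spark_2.12:4.0.0
# when the override is set to _2.12.
coordinate = f"io.delta:delta-spark{delta_scala_suffix()}:4.0.0"
```

Set the variable in the same environment that launches Spark (shell profile, CI job, or cluster config) so it is visible when the jars are resolved.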

Detailed Changes

Changed

  • Require PySpark 4.x (and delta-spark 4.x) in the package metadata and auto-detect the Scala binary when configuring Delta Lake jars, with an escape hatch via SPARK_FUSE_DELTA_SCALA_SUFFIX.

Fixed

  • split_by_date_formats now relies on try_to_timestamp when available so PySpark 4 no longer raises ANSI parsing errors for invalid rows; unmatched rows are still surfaced per the chosen mode.

Removed

  • Deprecated catalog helpers (Unity/Hive) and their CLI commands, documentation, and tests.
  • Dropped the experimental Qdrant connector stub and the optional qdrant dependency extra.

Documentation

  • Updated install/prerequisite docs and demos to reference PySpark 4 and current Java requirements.