# Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

## Unreleased

## 0.2.1 - 2025-09-08

### Added
- REST API connector with pagination, retry handling, and distributed ingestion support (pagination/retry pattern sketched after this list).
- Example notebook showcasing the REST connector against the public PokeAPI service.
- Test coverage for REST API pagination paths using mocked HTTP responses.
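
As an orientation for the first bullet above, here is the offset-pagination-with-retries pattern the connector automates, written against the public PokeAPI in plain `requests`; this is a minimal sketch of the pattern, not the connector's own API.

```python
# Sketch of offset pagination with retries against PokeAPI. This illustrates
# the pattern only; the spark_fuse connector's real interface and option
# names are documented in its API reference.
import time

import requests


def fetch_all_pages(base_url: str, page_size: int = 100, max_retries: int = 3) -> list:
    """Collect every page from a PokeAPI-style paginated endpoint."""
    rows, offset = [], 0
    while True:
        for attempt in range(max_retries):
            resp = requests.get(base_url, params={"limit": page_size, "offset": offset})
            if resp.ok:
                break
            time.sleep(2 ** attempt)  # exponential backoff before retrying
        else:
            resp.raise_for_status()  # give up after max_retries failures
        payload = resp.json()
        rows.extend(payload["results"])
        if payload.get("next") is None:  # PokeAPI signals the last page with next=null
            return rows
        offset += page_size


# pokemon = fetch_all_pages("https://pokeapi.co/api/v2/pokemon")
```

Exponential backoff between retries keeps transient HTTP failures from aborting a long ingestion run.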

### Changed
- Databricks connector can now read from Unity Catalog and Hive Metastore tables in addition to DBFS paths (see the sketch after this list).
- Documentation highlights REST ingestion and updated Databricks capabilities.
- Bumped minimum recommended package version in install guides and README.
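
In plain PySpark terms, the expanded connector coverage corresponds to the three read paths below; table names and paths are placeholders, and the connector's own call shape may differ.

```python
# Plain-PySpark equivalents of the three sources the Databricks connector now
# reads; assumes an active SparkSession named `spark`, as in Databricks notebooks.
uc_df = spark.table("main.analytics.orders")    # Unity Catalog: catalog.schema.table
hive_df = spark.table("default.orders")         # Hive Metastore: database.table
dbfs_df = spark.read.format("delta").load("dbfs:/data/orders")  # DBFS path (prior behavior)
```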

## 0.2.0 - 2025-09-07

### Added
- `map_column_with_llm` runs as a scalar PySpark UDF with per-executor caching, retry-aware LLM calls, and support for leaving `temperature` unset when providers require their defaults (usage sketch after this list).
- Databricks notebook demo for the LLM mapping pipeline, including optional secret scope integration for credentials.
- `sitecustomize.py` to automatically expose the repository `src/` directory on `sys.path` during local development.
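
A hypothetical call shape for the new transformation, to make the bullets above concrete; the import path and every parameter name except `temperature` are assumptions here, so check the API reference for the real signature.

```python
# Hypothetical usage sketch for map_column_with_llm; the import path and all
# parameter names except `temperature` are assumptions, not the confirmed API.
# Assumes an active SparkSession named `spark`.
from spark_fuse.transforms import map_column_with_llm  # assumed import path

df = spark.createDataFrame([("fresh apples",), ("claw hammer",)], ["raw_category"])

mapped_df = map_column_with_llm(
    df,
    column="raw_category",                                     # assumed name
    prompt="Map this value to one of: food, tools, apparel.",  # assumed name
    model="o4-mini",   # matches the notebook default noted under Changed
    temperature=None,  # leave unset so the provider's default applies
)
```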

### Changed
- Documentation and examples now highlight the LLM mapper, including dry-run guidance and provider-specific temperature tips.
- Notebook samples default to `o4-mini` with `temperature=None` to align with current OpenAI requirements.

### Removed
- Unused pandas dependency now that the mapper no longer relies on pandas UDFs.

## 0.1.9 - 2025-09-06

### Added
- LLM-powered `map_column_with_llm` transformation for semantic column normalization with batch processing, caching, and retry-aware API integration.

### Documentation
- Highlighted the LLM mapping helper in the README, site landing page, and API reference.

## 0.1.8 - 2025-09-06

### Fixed
- Corrected `scd2_upsert` documentation parameters so MkDocs strict builds no longer fail.

### Documentation
- Ensured SCD API reference renders cleanly by aligning docstring argument annotations with the function signature.

## 0.1.7 - 2025-09-06

### Added
- Transformation utilities module offering column renaming, casting, literal injection, whitespace cleanup, and multi-format date parsing helpers.
- Test coverage for the transformation utilities, including `split_by_date_formats` error modes.

### Documentation
- Added API reference page for transformation utilities and highlighted the new helpers in the feature overview.

## 0.1.6 - 2025-09-05

### Changed
- Rename module `spark_fuse.utils.scd2` to `spark_fuse.utils.scd` (update imports accordingly; import sketch after this list).
- Databricks and Fabric connectors route Delta writes through the SCD helpers (`apply_scd`) when `fmt='delta'`.
- IO connectors: extract shared option parsers (`as_seq`, `as_bool`) to `spark_fuse.io.utils` to remove duplication.
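
For callers, the rename is a one-line import change; a minimal sketch, assuming both helpers named above are exposed by the renamed module:

```python
# Before 0.1.6 the SCD helpers lived under the old module name:
# from spark_fuse.utils.scd2 import scd2_upsert
# From 0.1.6 on, import from the renamed module instead:
from spark_fuse.utils.scd import apply_scd, scd2_upsert
```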

### Added
- Tests for SCD utilities: SCD1 de-dup/update, SCD2 versioning, and dispatcher.

### Fixed
- `scd2_upsert`: accept a SQL string for `load_ts_expr`; compute the next version using the historical max per key (call sketch after this list).
- Spark session helper: add the Delta Lake package via a version map to match the local PySpark, avoiding classpath mismatches.
- Test fixture: bind the Spark driver to localhost and disable the UI to avoid port issues in CI/sandboxes.
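
A hypothetical call shape for the fixed helper; only the SQL-string support for `load_ts_expr` is stated above, so every other argument name here is an illustrative assumption.

```python
# Hypothetical scd2_upsert call; argument names other than load_ts_expr are
# placeholders, not the confirmed signature.
from spark_fuse.utils.scd import scd2_upsert

scd2_upsert(
    spark,                                   # active SparkSession
    source_df,                               # incoming batch to merge
    target_path="dbfs:/delta/dim_customer",  # placeholder Delta location
    keys=["customer_id"],                    # placeholder business key
    load_ts_expr="current_timestamp()",      # SQL string now accepted
)
```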

## 0.1.5 - 2025-09-04

### Added
- Documentation site: MkDocs + Material theme under `docs/`, with a GitHub Pages workflow to build and deploy.

### Changed
- Packaging metadata: set `Homepage`/`Documentation` URLs to the GitHub Pages site.

## 0.1.4 - 2025-09-04

### Changed
- CI/Release: switch to PyPI Trusted Publishing (OIDC) with attestations enabled.
- CI/Release: publish workflow runs only on GitHub Release "published" events.

## 0.1.3 - 2025-09-04

### Changed
- CI/Release: publish via the `pypi` environment using `PYPI_API_TOKEN` (API token auth) and disable attestation warnings.

## 0.1.2 - 2025-09-04

### Changed
- CI/Release: trigger publish workflow on tag pushes matching `v*`.
- CI/Release: validate built artifacts with `twine check` before upload.

## 0.1.1 - 2025-09-04

### Fixed
- Packaging: fix Hatch src-layout config for editable installs (`pip install -e .[dev]`).

### Added
- SCD2 Delta writer helper `scd2_upsert` in `spark_fuse.utils.scd`.
- Qdrant connector stub (`qdrant://` URIs) and optional extra `qdrant` (`qdrant-client>=1.8`).
- Project docs: GOVERNANCE.md, CODE_OF_CONDUCT.md, CONTRIBUTING.md, SECURITY.md, INSTALL, MAINTAINERS, authors.md, roadmap.md.

### Changed
- CI/Release: publish workflow targets the protected `pypi` environment; switch to Trusted Publishing.

## 0.1.0 - 2024-09-01

### Added
- Connectors: ADLS Gen2 (`abfss://`), Microsoft Fabric OneLake (`onelake://` and `abfss://…onelake…`), and Databricks DBFS (`dbfs:/`).
- Catalog helpers: Unity Catalog (create catalog/schema, register external Delta) and Hive Metastore (create database, register external Delta).
- Spark helper: `create_session` with Delta configuration and environment detection (Databricks/Fabric/local); see the sketch after this list.
- CLI (`spark-fuse`):
  - `connectors`: list available connectors
  - `read`: load and preview a dataset
  - `uc-create`, `uc-register-table`: Unity Catalog ops
  - `hive-register-external`: Hive external table registration
  - `fabric-register`: register an external Delta table backed by OneLake
  - `databricks-submit`: submit a job via the REST API
- Tests: registry resolution, path validation (ADLS/Fabric), and SQL generation (UC/Hive).
- CI: Python 3.9–3.11 matrix, ruff + pytest. Pre-commit hooks and Makefile.
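
Finally, a minimal sketch of the session helper introduced in this release; `create_session` is named above, but the import path and the zero-argument call shown here are assumptions.

```python
# Sketch of the 0.1.0 session helper; the import path is assumed from the
# package layout, and the no-argument call is illustrative only.
from spark_fuse.spark import create_session

spark = create_session()  # configures Delta and detects Databricks/Fabric/local
spark.range(5).write.format("delta").mode("overwrite").save("/tmp/demo_delta")
```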