
spark-fuse — Install Guide

Prerequisites

- Python: 3.9+
- Java: JDK 17 (recommended; PySpark 4.x)
- OS packages: build tools to compile native deps if needed

Virtual Environment (recommended)

- macOS/Linux:
  - python3 -m venv .venv
  - source .venv/bin/activate
  - python -m pip install --upgrade pip
- Windows (PowerShell):
  - python -m venv .venv
  - .\.venv\Scripts\Activate.ps1
  - python -m pip install --upgrade pip
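To confirm that the activation step took effect, the interpreter itself can be asked. The following is a minimal sketch using only the standard library; the helper names `in_virtualenv` and `python_is_supported` are hypothetical and not part of spark-fuse, and the 3.9 floor is taken from the Prerequisites list.

```python
import sys

# Minimum interpreter version, taken from the Prerequisites section.
MIN_PYTHON = (3, 9)


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix points at the base installation; otherwise they match.
    return sys.prefix != sys.base_prefix


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True when the (major, minor) version meets MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


print("virtual environment active:", in_virtualenv())
print("python version supported:", python_is_supported())
```

Run it with the same `python` that will run `pip`; if the first line prints `False`, the environment was probably not activated in the current shell.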

Quick Install (PyPI)

- pip install "spark-fuse>=1.0.2"

Development Install (editable)

- python -m pip install --upgrade pip
- pip install -e ".[dev]"

Verify Installation

- python -c "import spark_fuse; print(spark_fuse.__version__)"
- spark-fuse --help
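The same check can be scripted without importing the package, which is handy in CI where an import failure would otherwise abort the run. This sketch reads the installed distribution metadata through the standard library; `installed_version` is a hypothetical helper, and the distribution name `spark-fuse` is assumed from the pip command in this guide.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "spark-fuse" is the distribution name used in the pip install command.
found = installed_version("spark-fuse")
print(found if found is not None else "spark-fuse is not installed in this environment")
```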

Java Setup Notes

- macOS (Homebrew): run brew install openjdk@17, then add to your shell profile:
  - export JAVA_HOME="$(/usr/libexec/java_home -v 17)"
- Linux: install OpenJDK 17 (or 11 at minimum) using your distro’s package manager and set JAVA_HOME accordingly.
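A quick way to see which Java version will be picked up is to parse the output of `java -version`. The sketch below is an illustration only: `parse_java_major` and `detect_java_major` are hypothetical helpers, and the detection looks at the `java` found on PATH, which may differ from the one JAVA_HOME points to.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and earlier use the legacy "1.x" scheme, e.g. "1.8.0_392".
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def detect_java_major() -> Optional[int]:
    """Return the major version of the `java` on PATH, or None if not found."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` writes its report to stderr rather than stdout.
    result = subprocess.run(
        [java, "-version"], capture_output=True, text=True, check=False
    )
    return parse_java_major(result.stderr or result.stdout)
```

Calling `detect_java_major()` should return `17` on a machine set up as described above, and `None` when no `java` executable is on PATH.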

Optional: Authentication and Environment

- REST APIs: set headers / request_kwargs in the REST data source options (see build_rest_api_config) for API keys, OAuth tokens, or proxies. Use environment variables to avoid committing secrets.
- SPARQL endpoints: many public services require a descriptive User-Agent header; pass it via the data source config.
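One way to keep secrets out of the repository is to assemble the headers from environment variables at runtime. The sketch below only builds a plain dictionary; the helper `headers_from_env`, the variable name `EXAMPLE_API_TOKEN`, and the Bearer scheme are all assumptions for illustration. How the dictionary is handed to the headers option is not shown in this guide, so check build_rest_api_config for the exact parameter.

```python
import os
from typing import Dict, Mapping, Optional


def headers_from_env(
    env: Mapping[str, str] = os.environ,
    token_var: str = "EXAMPLE_API_TOKEN",  # hypothetical variable name
    user_agent: Optional[str] = None,
) -> Dict[str, str]:
    """Build HTTP headers from the environment without hard-coding secrets."""
    headers: Dict[str, str] = {}
    token = env.get(token_var)
    if token:
        # Bearer is a common scheme but an assumption here; match your API.
        headers["Authorization"] = f"Bearer {token}"
    if user_agent:
        # Many public SPARQL services expect a descriptive User-Agent.
        headers["User-Agent"] = user_agent
    return headers


headers = headers_from_env(user_agent="spark-fuse-demo/0.1 (contact: you@example.org)")
print(sorted(headers))
```

Only header names are printed, so the token value never reaches the logs.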

Minimal Usage Example

import json

from spark_fuse.spark import create_session
from spark_fuse.io import (
    REST_API_CONFIG_OPTION,
    REST_API_FORMAT,
    build_rest_api_config,
    register_rest_data_source,
)

# Create a Spark session and register the REST data source with it.
spark = create_session(app_name="spark-fuse-demo")
register_rest_data_source(spark)

# Describe the endpoint: records are read from the "results" field, and
# pagination uses the "next" field of each response.
config = build_rest_api_config(
    spark,
    "https://pokeapi.co/api/v2/pokemon",
    source_config={"records_field": "results", "pagination": {"mode": "response", "field": "next"}},
)

# The config is passed to the reader as a JSON-encoded option.
df = (
    spark.read.format(REST_API_FORMAT)
    .option(REST_API_CONFIG_OPTION, json.dumps(config))
    .load()
)
df.select("name").show(5)
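
The source_config in the example names a records field ("results") and a pagination field ("next"), which matches the shape of PokeAPI list responses. To make that idea concrete, here is a plain-Python sketch of following a `next` link until it runs out. It is not spark-fuse's implementation; `iter_records`, the injected `fetch` callable, and the fake pages are hypothetical and exist only to show the pattern without network access.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_records(
    first_url: str,
    fetch: Callable[[str], Page],
    records_field: str = "results",
    next_field: str = "next",
) -> Iterator[Any]:
    """Yield records page by page, following the URL stored in next_field."""
    url: Optional[str] = first_url
    seen = set()
    # Stop when there is no next URL, or when a URL repeats (loop guard).
    while url and url not in seen:
        seen.add(url)
        page = fetch(url)
        yield from page.get(records_field, [])
        url = page.get(next_field)


# Fake two-page API standing in for real HTTP calls.
FAKE_PAGES: Dict[str, Page] = {
    "page-1": {"results": [{"name": "bulbasaur"}, {"name": "ivysaur"}], "next": "page-2"},
    "page-2": {"results": [{"name": "venusaur"}], "next": None},
}

names = [record["name"] for record in iter_records("page-1", FAKE_PAGES.__getitem__)]
print(names)  # ['bulbasaur', 'ivysaur', 'venusaur']
```

Passing `fetch` in as a parameter keeps the loop testable; a real client would supply a function that performs the HTTP request and returns the decoded JSON.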

Testing and Linting

- Run tests: pytest
- Lint: ruff check src tests
- Format: ruff format src tests

Publishing (Maintainers)

- Bump version in pyproject.toml
- Create GitHub Release (tag vX.Y.Z)
- Workflow .github/workflows/publish.yml builds and uploads to PyPI using the protected pypi environment (PYPI_API_TOKEN).