REST API Data Source Demo

The REST API data source ingests paginated JSON endpoints into Spark DataFrames with optional request throttling and retry support.

import json
from spark_fuse.spark import create_session
from spark_fuse.io import (
    REST_API_CONFIG_OPTION,
    REST_API_FORMAT,
    build_rest_api_config,
    register_rest_data_source,
)

spark = create_session(app_name="spark-fuse-rest-demo")
register_rest_data_source(spark)

payload = build_rest_api_config(
    spark,
    "https://pokeapi.co/api/v2/pokemon",
    source_config={
        "request_type": "GET",  # switch to "POST" when the API expects a payload
        "records_field": "results",
        "pagination": {"mode": "response", "field": "next", "max_pages": 2},
    },
)

pokemon = (
    spark.read.format(REST_API_FORMAT)
    .option(REST_API_CONFIG_OPTION, json.dumps(payload))
    .load()
)

pokemon.select("name").show(5)
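
To make the "response" pagination mode concrete, the loop below is a minimal sketch of the general technique: read the records from one page, then follow the URL stored in the configured field until it is empty or the page cap is reached. It is not spark-fuse's implementation. The helper name iter_response_pages, its parameters, and the in-memory FAKE_PAGES stand-in for an HTTP client are all hypothetical.

```python
from typing import Callable, Iterator, Optional


def iter_response_pages(
    fetch: Callable[[str], dict],
    start_url: str,
    records_field: str = "results",
    next_field: str = "next",
    max_pages: Optional[int] = None,
) -> Iterator[dict]:
    """Yield records page by page, following the URL found in `next_field`."""
    url: Optional[str] = start_url
    pages_read = 0
    while url and (max_pages is None or pages_read < max_pages):
        body = fetch(url)
        yield from body.get(records_field, [])
        url = body.get(next_field)  # a missing or null value ends the loop
        pages_read += 1


# Hypothetical in-memory pages standing in for real HTTP responses,
# so the sketch runs without network access.
FAKE_PAGES = {
    "page-1": {"results": [{"name": "bulbasaur"}, {"name": "ivysaur"}], "next": "page-2"},
    "page-2": {"results": [{"name": "venusaur"}], "next": "page-3"},
    "page-3": {"results": [{"name": "charmander"}], "next": None},
}

names = [
    record["name"]
    for record in iter_response_pages(FAKE_PAGES.__getitem__, "page-1", max_pages=2)
]
print(names)
```

With max_pages=2 the loop stops after the second page even though that page still advertises a next URL, which mirrors the intent of the max_pages setting in the configuration above.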

Token pagination example (e.g., APIs that return paging.next.after):

payload = build_rest_api_config(
    spark,
    "https://api.hubapi.com/marketing/v3/marketing-events/",
    source_config={
        "records_field": "results",
        "params": {"limit": 100},
        "pagination": {"mode": "token", "param": "after", "field": "paging.next.after"},
    },
)
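
The token mode can be pictured as follows: a dotted path such as paging.next.after is resolved against each response, and the value found there is sent back as a query parameter on the next request. This is a sketch of the general technique under stated assumptions, not the library's internals. The helpers get_path and iter_token_pages, and the fake_fetch stand-in, are hypothetical names introduced for illustration.

```python
from typing import Any, Callable, Iterator, Optional


def get_path(body: Any, dotted: str) -> Optional[Any]:
    """Resolve a dotted path like 'paging.next.after'; return None if any key is absent."""
    current = body
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def iter_token_pages(
    fetch: Callable[[dict], dict],
    base_params: dict,
    records_field: str = "results",
    token_param: str = "after",
    token_field: str = "paging.next.after",
    max_pages: Optional[int] = None,
) -> Iterator[dict]:
    """Yield records, passing each page's token back as `token_param`."""
    params = dict(base_params)
    pages_read = 0
    while max_pages is None or pages_read < max_pages:
        body = fetch(params)
        yield from body.get(records_field, [])
        pages_read += 1
        token = get_path(body, token_field)
        if token is None:  # no token means this was the last page
            break
        params = {**base_params, token_param: token}


def fake_fetch(params: dict) -> dict:
    """Hypothetical two-page API, used so the sketch runs offline."""
    if params.get("after") is None:
        return {"results": [{"id": 1}, {"id": 2}], "paging": {"next": {"after": "abc"}}}
    return {"results": [{"id": 3}]}


ids = [record["id"] for record in iter_token_pages(fake_fetch, {"limit": 100})]
print(ids)
```

Returning None for a missing path, rather than raising, lets the absence of a token double as the stop condition.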

Highlights:

Cursor-based (response) and token-based (token) pagination.
Optional request headers and query parameters.
GET or POST requests selected with request_type, with payloads attached through request_body.
Built-in retry/backoff controls.
Optional include_response_payload column to capture the full server JSON per row.
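
As an illustration of what retry/backoff controls typically do, the sketch below retries a failing request with exponentially growing delays. It shows the general pattern only. The function call_with_retries and the parameters max_retries and backoff_factor are hypothetical and are not taken from the spark-fuse API, so check the library's own option names before relying on them.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_retries(
    request: Callable[[], T],
    max_retries: int = 3,
    backoff_factor: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying on ConnectionError with exponential backoff."""
    attempt = 0
    while True:
        try:
            return request()
        except ConnectionError:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            sleep(backoff_factor * (2 ** attempt))  # 0.5s, 1s, 2s, ...
            attempt += 1


# Hypothetical flaky endpoint: fails twice, then succeeds.
calls = {"count": 0}


def flaky_request() -> dict:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return {"results": []}


delays: List[float] = []
# Record the delays instead of sleeping so the demo finishes instantly.
result = call_with_retries(flaky_request, sleep=delays.append)
print(result, delays)
```

Injecting the sleep function keeps the sketch testable without waiting; a real client would usually also cap the delay and retry only on status codes that are safe to repeat.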

Notebook walkthrough

See the data source in action with additional configuration examples:

REST API Data Source Demo