aistackregistry.com — notes/readme

README

Source: README.md.


Content (verbatim markdown)

sha256: c809355ab293dca3b0ee8938e1dbf9df82518c669cb1efbed5afbb0767ff3472

# AI Stack Registry (aistackregistry)

A public, multi-tenant registry for dependency compatibility and AI model defaults. It publishes dated snapshots with checksums and LLM-first artifacts.

## Status
- This repository stays private for the next few weeks while changes continue.
- The current public contract version is `0.2.0`.

## Rationale
Because of training cutoffs, an LLM's idea of the “latest version” or the “current model defaults” is stale. Frontier providers and package ecosystems change frequently (new model IDs, updated limits, renamed SDKs, shifting dependency constraints). This registry publishes dated, verifiable snapshots so builders and agents can fetch current data with checksums and embedded source citations.

## What this provides
- **Curated, multi-tenant stacks** with explicit priority tiers for compatibility resolution.
- **Latest compatible sets** under the configured Python baseline (`policy/registry.yaml` → `python_version`), with a transparent blocking report that explains why pins exist.
- **Model registry artifacts** driven by multi-provider policies (currently Gemini, Anthropic, OpenAI, and xAI), with defaults sourced from docs and token limits/modalities from provider APIs.
- **Canonical provider paths** for public model artifacts (for example, `google` policies publish under `/models/google/`).
- **LLM-first outputs**: JSON artifacts + `llms.txt` + a concise landing page.
- **Signed provenance** with checksums (and cosign signatures when available).

## Assumptions
- A Linux x86_64 marker environment is used when evaluating `requires_dist` markers for the configured Python baseline (`policy/registry.yaml` → `python_version`).
- `uv` is available in CI for dependency resolution (`uv pip compile`).
- `cosign` is available in CI for keyless (OIDC) signing; key-based signing happens only when `COSIGN_KEY` is explicitly set.
- Local runs skip cosign and emit checksums only, unless `COSIGN_KEY` is explicitly set.
- Provider API keys (for example, `GEMINI_API_KEY`, `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, and `XAI_API_KEY`) are available in CI for fetching model metadata.
- The default stack is `google-ai-agents`.
- Network access is required to fetch authoritative sources (PyPI, model APIs/docs, raw GitHub overlays).
- Repo overlays are available via raw GitHub URLs; CI can export `GITHUB_OVERLAY_TOKEN` from a configured GitHub App for raw GitHub overlay fetches, and local runs can set `GITHUB_OVERLAY_TOKEN` explicitly.
- Local secrets are stored in `data/secrets/.env` (gitignored) and exported before running scripts.
- Snapshot-level `as_of` timestamps are provided in ISO 8601 UTC (e.g., `2025-01-01T00:00:00+00:00`).
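
The `as_of` format above can be produced and checked from Python. This is a minimal sketch, not the registry's own code: the helper names `as_of_now` and `parse_as_of` are hypothetical, and rejecting non-UTC offsets is an assumption drawn from the "ISO 8601 UTC" wording.

```python
from datetime import datetime, timezone


def as_of_now() -> str:
    """Return the current time as an ISO 8601 UTC string with a +00:00 offset."""
    # Dropping microseconds keeps the shape YYYY-MM-DDTHH:MM:SS+00:00.
    return datetime.now(timezone.utc).replace(microsecond=0).isoformat()


def parse_as_of(value: str) -> datetime:
    """Parse an as_of string and reject values that are not explicitly UTC."""
    parsed = datetime.fromisoformat(value)
    offset = parsed.utcoffset()
    if offset is None or offset.total_seconds() != 0:
        # Assumption: naive or non-UTC timestamps are treated as invalid.
        raise ValueError(f"as_of must carry a +00:00 offset: {value!r}")
    return parsed
```

`as_of_now()` yields the same shape as the `date -u` invocation used for local debugging, so either can feed `--as-of`.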

## Authoritative sources
All outputs are derived from authoritative sources only, with citations embedded in artifacts and docs:
- PyPI JSON API (e.g., https://pypi.org/pypi/google-adk/json)
- pip constraints files semantics: https://pip.pypa.io/en/stable/user_guide/#constraints-files
- uv resolver (pip compile): https://docs.astral.sh/uv/pip/compile/
- Python release index (baseline drift source): https://www.python.org/downloads/
- Model API endpoints and docs listed in `policy/models/*.yaml` (for example, Gemini API models list/get: https://generativelanguage.googleapis.com/v1beta/models).
- Provider model docs (for example, Gemini model cards, thinking defaults, media resolution).
- SDK release sources listed in `policy/models/*.yaml` (for example, python-genai releases).
- llms.txt spec: https://llmstxt.org/
- Sigstore/cosign docs: https://docs.sigstore.dev/

## Repo structure
- `policy/`: curated stacks, repos, models, and registry configuration
- `scripts/`: fetch, resolve, build, sign, publish
- `schemas/`: JSON schemas for artifacts
- `site/`: landing page source (static HTML/CSS)
- `templates/`: source templates for rendered repo-facing docs
- `public/`: deterministic static output for the Cloudflare Pages direct-upload deploy
- `examples/`: example repo overlay file
- `tests/`: unit tests

Retained snapshots and retained public doc provenance are stored in Cloudflare R2 under the explicit `retained_state` contract in `policy/registry.yaml`, including manifest-backed bundle artifacts for deterministic restore.

Cloudflare Pages is the active production hosting contract. `main` uploads `public/` to the production Pages project `aistackregistry`, and GitHub Actions uses the dispatch-only `cloudflare-pages-production-cutover.yml` workflow for read-only verification of the production custom-domain contract. Any one-time bootstrap or repair of Pages domains or DNS remains a local manual step.

Non-main validation uploads `public/` to the staging Pages project `aistackregistry-staging` on the exact branch name, which keeps preview deploys out of the production Pages project and out of the production custom-domain path.

## How it works
### Dependency compatibility
- Stacks define **priority tiers** (highest to lowest).
- Curated package entries accept PEP 508 requirement strings (e.g., `google-adk[a2a]`); pins and reports are keyed by the project name.
- For each tier, `uv pip compile` resolves the latest compatible versions under the configured Python baseline (`policy/registry.yaml` → `python_version`), while **pinning higher tiers** to previously resolved versions.
- The final output includes `constraints.txt`, `constraints.json`, and `compat_report.json`.
- `compat_report.json` explains why a package is not latest (e.g., `google-adk` pinning `fastapi<0.124`) using pinned-version `requires_dist` from PyPI.
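
The tier-by-tier flow above can be sketched as follows. This is an illustration under stated assumptions, not the repository's resolver: `project_name` and `resolve_tiers` are hypothetical names, the `resolve` callable stands in for a `uv pip compile` invocation, and pinning every package a higher tier resolved (including transitive ones) is an assumption about how pins carry forward.

```python
import re
from typing import Callable, Dict, List

# Stand-in for the real resolver: takes requirements plus existing pins,
# returns a mapping of project name -> resolved version.
Resolver = Callable[[List[str], Dict[str, str]], Dict[str, str]]


def project_name(requirement: str) -> str:
    """Extract the normalized project name from a PEP 508 requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        raise ValueError(f"not a valid requirement: {requirement!r}")
    # PEP 503 normalization: runs of '-', '_' and '.' collapse to one hyphen.
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def resolve_tiers(tiers: List[List[str]], resolve: Resolver) -> Dict[str, str]:
    """Resolve tiers from highest to lowest priority, pinning earlier results."""
    pins: Dict[str, str] = {}
    for requirements in tiers:
        # Pass a copy so the resolver cannot mutate the accumulated pins.
        resolved = resolve(requirements, dict(pins))
        for name, version in resolved.items():
            # Higher tiers win: a version pinned earlier is never overwritten.
            pins.setdefault(name, version)
    return pins
```

Keying pins by normalized project name means `google-adk[a2a]` and `google-adk` land on the same entry, which matches the note that pins and reports are keyed by project name.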

### Python baseline drift guard
- `scripts/check_python_version_drift.py` is a fail-fast guard that enforces baseline consistency across `policy/registry.yaml`, workflow runtime pins, and stack policy pins.
- The guard fetches `https://www.python.org/downloads/` (configured as `source_urls.python_releases`) and compares the latest `3.14.x` patch against the pinned registry baseline.
- Workflow: `.github/workflows/python-version-drift.yml` runs on a daily schedule and `workflow_dispatch`.
- On drift or pin mismatch, the workflow fails with an explicit error; no fallback behavior is allowed.
- Python maintenance releases are a freshness input, not an optional cleanup task. Treat a new `3.14.x` patch exactly like other authoritative upstream updates.
- Python baseline cutovers must update the registry baseline, stack pins, every `actions/setup-python` pin enforced by the guard, and the published constraints/site surfaces that embed the baseline version.
- Because Python baseline changes alter published artifact paths, validation evidence must include SHA-matched `ci.yml` and `daily.yml` runs on the PR head commit and again on the merge commit.
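
The comparison at the heart of the guard can be sketched like this. It is a simplified illustration, not the contents of `scripts/check_python_version_drift.py`: the function names are hypothetical, and it assumes the downloads page text contains release labels of the form `Python 3.14.N`.

```python
import re
from typing import Optional, Tuple


def latest_patch(page_text: str, minor: str) -> Optional[Tuple[int, int, int]]:
    """Return the highest final release for a minor series found in the text."""
    # The trailing \b skips pre-release labels such as "3.14.0rc1".
    pattern = re.compile(rf"\bPython {re.escape(minor)}\.(\d+)\b")
    patches = [int(m.group(1)) for m in pattern.finditer(page_text)]
    if not patches:
        return None
    major_part, minor_part = minor.split(".")
    return (int(major_part), int(minor_part), max(patches))


def check_drift(page_text: str, pinned: str) -> None:
    """Raise if the pinned baseline is not the latest patch of its series."""
    pinned_version = tuple(int(part) for part in pinned.split("."))
    series = ".".join(pinned.split(".")[:2])
    latest = latest_patch(page_text, series)
    if latest is None:
        # Fail fast: an unparseable source is an error, not a silent pass.
        raise RuntimeError(f"no {series}.x release found in source text")
    if latest != pinned_version:
        found = ".".join(str(part) for part in latest)
        raise RuntimeError(f"baseline drift: pinned {pinned}, latest {found}")
```

Raising in both failure branches mirrors the "no fallback behavior" rule: a missing release list is reported as loudly as an outdated pin.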

### Ecosystem packages (metadata only)
- Stacks may also list `ecosystem_packages` (for example, `npm` packages or `go` modules) for discovery.
- These entries are surfaced in `index.json` under each stack and are **not** part of the Python constraints resolution pipeline.

### Repo overlays
- Repos can add dependencies or adjust tier placement with an overlay file (see `examples/ai-stack.yaml`).
- `policy/repos.yaml` lists repos and their raw overlay URLs.
- `daily.yml` exports `GITHUB_OVERLAY_TOKEN` from a repo-scoped, contents-read GitHub App token when `OVERLAY_APP_ID` and `OVERLAY_APP_PRIVATE_KEY` are configured.
- The current `policy/repos.yaml` entry points at a public raw GitHub overlay in this repository, so this repo's GitHub Actions evidence does not yet prove private external overlay retrieval end to end.

### Model registry
- `scripts/fetch_models.py` pulls provider model metadata for every policy in `policy/models/*.yaml` (uses provider-specific API keys).
- `scripts/snapshot_docs.py` captures docs from policy URLs, recording both raw snapshot hashes and normalized content hashes for auditability.
- `scripts/build_models.py` merges API data + policy-backed control metadata into `spec.json` and `recommended_defaults.json`, emitting `/models/<provider>/<model_id>/...`.
- Public model paths use canonical provider names (for example, `google` policies publish under `/models/google/`); the `provider` field in payloads remains canonical.
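
Recording two hashes per doc snapshot separates byte-level changes from content changes. The sketch below shows the idea only: the function names are hypothetical, and collapsing whitespace is an assumed normalization rule, since the actual rules in `scripts/snapshot_docs.py` are not described here.

```python
import hashlib
import re


def raw_hash(content: bytes) -> str:
    """SHA-256 of the bytes exactly as fetched."""
    return hashlib.sha256(content).hexdigest()


def normalized_hash(content: bytes) -> str:
    """SHA-256 after collapsing whitespace (assumed normalization)."""
    text = content.decode("utf-8", errors="replace")
    # Collapse every whitespace run to one space and trim the ends.
    text = re.sub(r"\s+", " ", text).strip()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

When the raw hash changes but the normalized hash does not, the upstream page was re-rendered without a change in content, which is useful context when auditing a diff.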

### LLM-first outputs
- Root `llms.txt` and per-snapshot `llms.txt` list stable artifact URLs.
- `index.json` enumerates stacks, repos, models, and latest snapshot metadata.
- Model lookup endpoints include explicit `lookup` URL maps and URI templates; agents should dereference emitted URLs instead of inferring sibling paths.
- Every published HTML page emits `index.md`, `index.txt`, and `index.json` variants (discoverable via `rel="alternate"`, `llms.txt`, and `sitemap.xml`).
- Model lookup indexes (`/models/index.json`, `/models/providers/index.json`, `/models/<provider>/index.json`) also emit `index.md` and `index.txt` for LLM crawlers.
- `/latest/` is an alias to the most recent snapshot for stable links.

## Local debugging (non-authoritative)
Official validation/publishing happens via GitHub Actions only; local runs are debug-only and not accepted for PR validation. See `docs/OPERATIONS.md`.
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

AS_OF=$(date -u +"%Y-%m-%dT%H:%M:%S+00:00")

# Fetch authoritative sources
python scripts/fetch_pypi.py --all-stacks
python scripts/fetch_models.py --allow-missing-key
python scripts/snapshot_docs.py --as-of "$AS_OF"
python scripts/diff_docs.py

# Build and publish a snapshot
python scripts/publish.py --as-of "$AS_OF"
```

## Verifying provenance
- Checksums: `public/provenance/checksums.json`
- Cosign signatures (if configured): `public/provenance/signatures/`
- Cosign bundle (keyless): `public/provenance/signatures/checksums.json.bundle`
- Bundle verification (keyless CI): `cosign verify-blob --bundle public/provenance/signatures/checksums.json.bundle --certificate-identity https://github.com/hey-jj/ai-stack-registry/.github/workflows/daily.yml@refs/heads/main --certificate-oidc-issuer https://token.actions.githubusercontent.com public/provenance/checksums.json`
- Published doc provenance manifests and public-safe copies: `public/provenance/docs/`
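
Once the signature on `checksums.json` has been verified, the listed files can be checked against their digests. This sketch assumes the file is a flat JSON object mapping relative paths to SHA-256 hex digests; that layout is an assumption, so adapt the loading step to the published schema. The function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def sha256_file(path: Path) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(root: Path, checksums: Dict[str, str]) -> List[str]:
    """Return the relative paths that are missing or whose digest differs."""
    failures: List[str] = []
    for relative, expected in sorted(checksums.items()):
        target = root / relative
        if not target.is_file() or sha256_file(target) != expected.lower():
            failures.append(relative)
    return failures


def load_checksums(path: Path) -> Dict[str, str]:
    """Load the assumed flat {path: digest} mapping from a JSON file."""
    return json.loads(path.read_text(encoding="utf-8"))
```

An empty list from `verify_checksums(Path("public"), load_checksums(Path("public/provenance/checksums.json")))` would indicate that every listed file matches, under the layout assumption above.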

## What you can verify
- Dated snapshots with checksums and embedded source citations.
- Compatibility constraints with clear `blocked_by` evidence.
- Model defaults/specs derived from provider APIs and docs (no inference from training data).

## License
Apache-2.0. See `LICENSE`.