- Peter Steinberger's Go CLI just split Discord archiving into two roles: publishers crawl with a bot and push compressed NDJSON snapshots to a private Git repo; subscribers clone and query locally with FTS5 — zero Discord credentials.
TL;DR
discrawl 0.3.0 (shipped 2026-04-21) turns a local Discord archive into a portable, Git-synced database. Teams with bot credentials publish compressed NDJSON snapshots to a private repo; everyone else runs discrawl subscribe <git-url> and queries FTS5-indexed SQLite locally — no Discord API access needed. Auto-refresh, activity reports, and OpenClaw field notes round out the release.
What's new
- Git-backed archive sync. Export/import compressed JSONL snapshots with manifests, subscribe to a Git repo as the data source, and run in git-only mode without Discord credentials.
- Auto-refresh. The messages, search, and report commands pull a fresh snapshot when local data crosses a freshness threshold (default 15 minutes), with graceful fallback to live Discord when a bot is configured.
- Activity reports. Inline stats, latest sync timestamps, AI field notes sourced from OpenClaw, constrained queries, and operation timeouts.
- Field notes. Lightweight AI summaries pinned to archive state — useful for retrospectives and async standups.
- Faster imports. Repo imports skip expensive rebuilds when the snapshot manifest is unchanged; GitHub Actions caches warmed SQLite across runs.
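The auto-refresh behavior described in the list above can be sketched as a small decision function. This is an illustrative Python sketch, not discrawl's Go implementation: the function name, its parameters, and the returned labels are all hypothetical, and the fallback order (Git snapshot first, live Discord only if the snapshot pull is unavailable and a bot is configured) is one plausible reading of the release notes.

```python
from datetime import datetime, timedelta, timezone

# Assumed default, taken from the 15-minute threshold quoted in the release notes.
FRESHNESS_THRESHOLD = timedelta(minutes=15)


def choose_refresh_action(last_sync, now, has_bot_token, git_ok=True,
                          threshold=FRESHNESS_THRESHOLD):
    """Decide where a read command should get its data from.

    Returns one of four hypothetical labels:
    "use-local", "pull-snapshot", "live-discord", "use-stale-local".
    """
    # Local data is fresh enough: no network access needed.
    if last_sync is not None and now - last_sync <= threshold:
        return "use-local"
    # Local data is stale or missing: prefer the Git snapshot.
    if git_ok:
        return "pull-snapshot"
    # Snapshot unavailable: fall back to live Discord if a bot is configured.
    if has_bot_token:
        return "live-discord"
    # Nothing else to try: serve what is already on disk.
    return "use-stale-local"


now = datetime(2026, 4, 21, 12, 0, tzinfo=timezone.utc)
print(choose_refresh_action(now - timedelta(minutes=20), now, has_bot_token=False))
```

A subscriber without a bot token only ever reaches the first two branches and the last one, which is what keeps the git-only mode free of Discord credentials.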
Why it matters
Until now, anyone who wanted to search a Discord server's history through discrawl needed a bot token with Server Members Intent and Message Content Intent, both privileged intents that Discord requires approval for once a bot reaches 100 servers. That's fine for the admin, but unreasonable for every contributor, researcher, or AI agent who just wants read access.
The 0.3.0 model is simple: only one person (or one CI job) holds the bot credentials and pushes snapshots upstream. Everyone else gets read access through Git. Permissions collapse to who can clone the repo. That's a credential-scope reduction most teams have been improvising around with ad-hoc exports, shared secrets, or manual copy-paste.
The design also maps cleanly onto how developer teams already work. Private repos, branch protection, signed commits, audit logs — all of it applies to the archive without new infrastructure. A Git SHA becomes a verifiable pointer to a point-in-time view of the server, which matters for anyone running compliance, retros, or research against a moving target.
Worth noting what didn't change: the publisher still needs the privileged bot. The trust surface moved; it didn't disappear. But trust surfaces are easier to manage when there's exactly one of them.
Technical facts
| Property | Detail |
|---|---|
| Language | Go 1.26.2+ (95.5% of repo) |
| Storage | Local SQLite with FTS5 full-text search |
| Transport | Compressed NDJSON snapshots + manifest, pushed to Git |
| Incremental | sync --latest-only skips full-history crawl |
| Multi-guild | sync --all ignores default_guild_id |
| Error tolerance | sync --full continues past Discord 403 on forum threads |
| Schema safety | PRAGMA user_version gate blocks incompatible DBs |
| Reports | Node 24 runtime, refreshed SQLite libs |
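Two rows in the table above, the PRAGMA user_version gate and FTS5 search, can be illustrated with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: the expected version number, the messages_fts table, and its columns are hypothetical and do not reflect discrawl's real schema, and the example needs an SQLite build with FTS5 enabled.

```python
import sqlite3

# Hypothetical value; discrawl's real schema version is internal to the tool.
EXPECTED_SCHEMA_VERSION = 3


def check_schema(conn, expected=EXPECTED_SCHEMA_VERSION):
    """Refuse to use a database whose user_version does not match."""
    (found,) = conn.execute("PRAGMA user_version").fetchone()
    if found != expected:
        raise RuntimeError(f"incompatible schema: found {found}, expected {expected}")


def build_demo_db():
    """Create an in-memory database with a hypothetical FTS5 table."""
    conn = sqlite3.connect(":memory:")
    # PRAGMA values cannot be bound as parameters; the value is a trusted int.
    conn.execute(f"PRAGMA user_version = {EXPECTED_SCHEMA_VERSION}")
    conn.execute("CREATE VIRTUAL TABLE messages_fts USING fts5(author, content)")
    conn.executemany(
        "INSERT INTO messages_fts (author, content) VALUES (?, ?)",
        [
            ("alice", "pushed a new snapshot to the archive repo"),
            ("bob", "forum thread returned an error during sync"),
            ("carol", "snapshot import skipped, manifest unchanged"),
        ],
    )
    return conn


def search(conn, query):
    """Run a full-text query and return (author, content) rows, best match first."""
    rows = conn.execute(
        "SELECT author, content FROM messages_fts "
        "WHERE messages_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return rows.fetchall()


conn = build_demo_db()
check_schema(conn)  # passes: the version matches
for author, content in search(conn, "snapshot"):
    print(author, "|", content)
```

Checking the version before any query means an archive written by an incompatible release fails fast with a clear error instead of producing confusing SQL failures later.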
Comparison
Versus 0.2.x, the split is between who crawls and who reads. Publishers still need the full Discord bot scope; subscribers drop those requirements entirely. Versus other Discord archivers like Sanqui/discard and discord-discard, discrawl is the only tool combining Git-as-transport, FTS5-indexed SQLite, live Gateway tailing with periodic repair syncs, and AI field notes in a single CLI.
Use cases
- Open-source communities publishing searchable history to contributors without handing out bot tokens. One maintainer pushes; the whole contributor base reads.
- Support teams syncing Discord conversations into a private repo for analysts and internal AI agents that retrieve context without holding live Discord credentials.
- Researchers pinning a point-in-time snapshot — the Git SHA becomes a reproducible archive state for citations or longitudinal studies.
- CI pipelines running scheduled reports on a cached SQLite DB — the GitHub Actions cache of the warmed DB cuts re-crawl cost to near zero on subsequent runs.
- Retros & field notes generated from a frozen snapshot so the summary matches the exact data it references — the note and the data travel together in the same commit.
- Multi-guild publishers running one pipeline across several servers via sync --all, without juggling per-guild defaults.
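The CI and research use cases above rest on two ideas: a snapshot is compressed NDJSON plus a manifest, and an import can be skipped when the manifest has not changed. The following Python sketch shows one way that could work. The manifest fields (sha256, count) and both function names are assumptions made for illustration; discrawl's actual manifest format is not described in the release notes.

```python
import gzip
import hashlib
import json


def export_snapshot(messages):
    """Serialize messages as gzip-compressed NDJSON and build a manifest."""
    # One JSON object per line; sorted keys keep the output stable.
    ndjson = "".join(json.dumps(m, sort_keys=True) + "\n" for m in messages)
    # mtime=0 keeps the gzip header identical across runs.
    payload = gzip.compress(ndjson.encode("utf-8"), mtime=0)
    manifest = {
        "sha256": hashlib.sha256(payload).hexdigest(),  # hypothetical field
        "count": len(messages),                         # hypothetical field
    }
    return payload, manifest


def import_snapshot(payload, manifest, last_manifest):
    """Return parsed messages, or None when the manifest is unchanged."""
    if manifest == last_manifest:
        return None  # nothing new: skip the expensive rebuild
    if hashlib.sha256(payload).hexdigest() != manifest["sha256"]:
        raise ValueError("snapshot does not match its manifest")
    lines = gzip.decompress(payload).decode("utf-8").splitlines()
    return [json.loads(line) for line in lines if line]


messages = [{"id": "1", "content": "hello"}, {"id": "2", "content": "world"}]
payload, manifest = export_snapshot(messages)
print(import_snapshot(payload, manifest, last_manifest=None))      # full import
print(import_snapshot(payload, manifest, last_manifest=manifest))  # skipped
```

Because the manifest is small, a CI job can compare it against the cached copy before deciding whether to touch the SQLite database at all.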
Limitations & pricing
Free and open source. The publisher still needs a bot with view-channel, read-message-history, and both privileged intents — the credential ask didn't disappear, it moved. Forum threads can return Discord 403; the new sync --full continue-on-error flag mitigates but doesn't eliminate gaps. Scope remains guild-level, not DMs. Snapshot history grows with the Git repo, so publishers own retention policy. Toolchain: Go 1.26+.
What's next
Release notes hint at tighter OpenClaw integration for richer field notes, more aggressive incremental-import performance, and polish on multi-guild publishing. No public dates are committed yet, but the cadence from 0.2.x to 0.3.0 suggests another meaningful drop before summer.
Install with brew install steipete/tap/discrawl, or use go install from the repo. If you already run 0.2.x, the upgrade is a straight re-install plus a one-time discrawl publish to bootstrap the remote.
Sources: steipete/discrawl, Release notes, announcement tweet.

