Storage Backends
SpyWeb is designed to be self-contained and portable, requiring no external database server like PostgreSQL or MySQL. It uses an embedded database stored in a single data file. Choose between two backends at download time.
At a Glance
Section titled “At a Glance”| Feature | KV (redb, default) | SQL (SQLite) |
|---|---|---|
| Binary Size | ~7MB | ~8MB |
| Philosophy | Minimalist & Zero-Configuration | Transparent & Queryable |
| Performance | High (Low overhead) | Moderate (WAL optimized) |
| External Access | None (Internal format) | Any SQLite tool |
| Lua SQL Support | No | Yes (db_query, db_exec) |
| Concurrency | Multi-reader, Single-writer | Serialized (Mutex) |
| Data Format | Key-Value (B-Tree) | Relational (SQL) |
redb (Default)
Section titled “redb (Default)”High-performance, type-safe embedded key-value store.
When to use it:
- Zero-maintenance, hands-off storage
- Only need data in the Web Dashboard or Webhooks
- Don’t need custom SQL queries
- Prefer an internal-only data format
File: data (single binary file in working directory)
Format: JSON strings keyed by (job_id, reversed_timestamp)
SQLite (Full Database Control)
Section titled “SQLite (Full Database Control)”Available in the -sql build variants.
When to use it:
- Total control over data structure with custom tables
- Use external tools (DBeaver, TablePlus, sqlite3)
- Complex Lua logic with raw SQL (JOINs, aggregations)
- Store and relate scraped data with custom state
Files: data, data-wal (WAL), data-shm (Shared Memory)
Performance: Configured with PRAGMA journal_mode=WAL and PRAGMA synchronous=NORMAL
Database Schema
Section titled “Database Schema”Three primary tables:
records-job_id(string),rev_ts(nanosecond timestamp, records ordered newest-first),json(full record as JSON)seen- Deduplication hashes for avoiding duplicate recordslua_user- Persistent key-value state backingstore_*/global_store_*
Lua SQL Bindings
Section titled “Lua SQL Bindings”db_query(sql, [params])- Executes a SELECT statement and returns an array of tablesdb_exec(sql, [params])- Executes any SQL statement (INSERT, UPDATE, DELETE, CREATE) and returns the number of rows affected
-- db_query: SELECT, returns array of tableslocal rows = db_query("SELECT json FROM records WHERE job_id = ? LIMIT 5", { "my-job" })for _, row in ipairs(rows) do local data = json_decode(row.json) print(data.title)end
-- db_exec: any SQL, returns rows affecteddb_exec([[ CREATE TABLE IF NOT EXISTS markers ( id TEXT PRIMARY KEY, val INTEGER )]])
function on_finished() db_exec("INSERT OR REPLACE INTO markers (id, val) VALUES (?, ?)", { "last_page", 10 })endShared Task Queue Example
Section titled “Shared Task Queue Example”Use a database table as a shared work queue across multiple workers. A dedicated “Discovery” job can populate the queue for other jobs to process.
1. Initialize the Schema - CREATE TABLE IF NOT EXISTS queue ensures the table exists on every load:
-- Top-level: runs once at load time (startup/reload)db_exec([[ CREATE TABLE IF NOT EXISTS queue ( id TEXT PRIMARY KEY, url TEXT NOT NULL, status TEXT DEFAULT 'pending' )]])
-- For fresh databases, seed with initial URLsdb_exec([[ INSERT OR IGNORE INTO queue (id, url) VALUES ('task-1', 'https://example.com/page/1'), ('task-2', 'https://example.com/page/2')]])2. Worker Pick & Lock - Each worker atomically claims the next pending task:
function before_fetch(request, ctx) local rows = db_query([[ UPDATE queue SET status = 'processing' WHERE id = (SELECT id FROM queue WHERE status = 'pending' LIMIT 1) RETURNING id, url ]]) if #rows == 0 then return nil end request.url = rows[1].url ctx.shared.current_task_id = rows[1].id return requestend3. Mark Completion - Track success or failure per task:
function on_finally(ctx) local task_id = ctx.shared.current_task_id if not task_id then return end local final_status = ctx.telemetry.map.fetch.status == "success" and "completed" or "failed" db_exec("UPDATE queue SET status = ? WHERE id = ?", { final_status, task_id })endLimitations
Section titled “Limitations”The two backends are not binary compatible. You cannot point the redb version at an SQLite data file or vice versa. Export/import manually if switching.