SSTable
Sorted String Tables — on-disk persistent storage.
SSTableWriter
SSTableWriter(directory, file_id, snapshot_id, level, block_size=BLOCK_SIZE_DEFAULT, block_entries=0, bloom_n=1000000, bloom_fpr=0.01)
Write-once SSTable builder.
Records must be added in ascending key order via `put`.
Call `finish` (async) or `finish_sync` when done.
Initialize an SSTable writer and open the data file for writing.
The output directory is created if it does not exist.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `directory` | `Path` | Target directory for the SSTable files. | required |
| `file_id` | `FileID` | Unique identifier for this SSTable. | required |
| `snapshot_id` | `SnapshotID` | ID of the memtable snapshot being flushed. | required |
| `level` | `Level` | Compaction level (0 for flush, 1+ for compaction output). | required |
| `block_size` | `int` | Target data block size in bytes. Used when | `BLOCK_SIZE_DEFAULT` |
| `block_entries` | `int` | If non-zero, flush a block after this many records instead of using the byte-size threshold. | `0` |
| `bloom_n` | `int` | Expected element count for the bloom filter. The flush path passes | `1000000` |
| `bloom_fpr` | `float` | Target false positive rate for the bloom filter. Read from | `0.01` |
Source code in app/sstable/writer.py
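To give a feel for what `bloom_n` and `bloom_fpr` imply in memory, here is a minimal sketch using the textbook bloom filter sizing formulas. The helper name `bloom_parameters` is hypothetical, and the project's own `BloomFilter` may size itself differently.

```python
import math


def bloom_parameters(n: int, fpr: float) -> tuple:
    """Return (bit_count, hash_count) for n expected keys at the target fpr.

    Textbook formulas: m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2.
    """
    if n <= 0 or not 0.0 < fpr < 1.0:
        raise ValueError("n must be positive and fpr must be in (0, 1)")
    bits = math.ceil(-n * math.log(fpr) / (math.log(2) ** 2))
    hashes = max(1, round(bits / n * math.log(2)))
    return bits, hashes


# Defaults from the signature above: bloom_n=1000000, bloom_fpr=0.01.
bits, hashes = bloom_parameters(1_000_000, 0.01)
print(bits, hashes)
```

With the default arguments this comes to about 9.59 million bits (roughly 1.2 MB) and 7 hash functions, so passing an accurate element count instead of the default can noticeably shrink the filter for small tables.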
put(key, seq, timestamp_ms, value)
Add a record. Keys must be in ascending order.
Source code in app/sstable/writer.py
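The ordering rule for `put` and the two block-cut thresholds (`block_size` and `block_entries`) can be illustrated with a toy writer. This is a sketch only: `ToyBlockWriter` is hypothetical, it keeps blocks in memory, omits `seq` and `timestamp_ms`, and assumes keys are strictly ascending, which the real writer may or may not require.

```python
from typing import Optional


class ToyBlockWriter:
    """Hypothetical stand-in for a writer: checks key order and cuts blocks."""

    def __init__(self, block_size: int = 4096, block_entries: int = 0) -> None:
        self.block_size = block_size        # byte threshold for cutting a block
        self.block_entries = block_entries  # record-count threshold; 0 disables it
        self.blocks: list = []              # finished blocks, each a list of (key, value)
        self._current: list = []
        self._current_bytes = 0
        self._last_key: Optional[bytes] = None

    def put(self, key: bytes, value: bytes) -> None:
        # Assumption: keys are strictly ascending, so duplicates are rejected too.
        if self._last_key is not None and key <= self._last_key:
            raise ValueError(f"key {key!r} is not greater than {self._last_key!r}")
        self._last_key = key
        self._current.append((key, value))
        self._current_bytes += len(key) + len(value)
        if self.block_entries:
            full = len(self._current) >= self.block_entries
        else:
            full = self._current_bytes >= self.block_size
        if full:
            self._cut()

    def finish(self) -> None:
        # Flush the partially filled trailing block, if any.
        if self._current:
            self._cut()

    def _cut(self) -> None:
        self.blocks.append(self._current)
        self._current = []
        self._current_bytes = 0


writer = ToyBlockWriter(block_entries=2)
for k in (b"a", b"b", b"c"):
    writer.put(k, b"v")
writer.finish()
```

Rejecting out-of-order keys at write time is what lets a reader later rely on binary search over block boundaries.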
finish()
async
Finalize the SSTable (async). The bloom filter and index are written concurrently.
Source code in app/sstable/writer.py
finish_sync()
Finalize the SSTable (sync). Intended for use in L1+ subprocesses.
Source code in app/sstable/writer.py
SSTableReader
SSTableReader(directory, file_id, meta, index, bloom, cache, mm, fd)
Read-only access to one SSTable.
Bloom filter and sparse index are loaded lazily on the first
call to `get`. Once loaded, they are cached in the shared
`BlockCache` so a future reader for the same file (e.g.
after engine restart) can skip the disk read.
Construct an SSTableReader (prefer the `open` factory).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `directory` | `Path` | Path to the SSTable directory on disk. | required |
| `file_id` | `FileID` | Unique identifier for this SSTable. | required |
| `meta` | `SSTableMeta` | Parsed metadata from | required |
| `index` | `SparseIndex \| None` | Pre-loaded sparse index, or | required |
| `bloom` | `BloomFilter \| None` | Pre-loaded bloom filter, or | required |
| `cache` | `BlockCache \| None` | Shared block cache for cross-reader reuse, or | required |
| `mm` | `mmap \| None` | Memory-mapped | required |
| `fd` | `int` | Raw file descriptor for the mmap (kept open until close). | required |
Source code in app/sstable/reader.py
meta
property
Return the SSTable metadata.
file_id
property
Return the file ID.
open(directory, file_id, cache=None, level=0)
async
classmethod
Open an SSTable for reading.
Bloom and index are NOT loaded here — they are deferred to the
first get() call (lazy loading). Only meta.json is
parsed and data.bin is memory-mapped.
Source code in app/sstable/reader.py
get(key)
Look up key. Returns (seq, timestamp_ms, value) or None.
Flow: bloom check → sparse index bisect → block scan. Bloom + index loaded lazily on first call.
Source code in app/sstable/reader.py
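The three-step lookup flow described for `get` (bloom check, sparse index bisect, block scan) can be sketched with in-memory stand-ins. Everything here is hypothetical: `ToyTable` is not the real reader, the "bloom filter" is a plain `set` (so it has no false positives), and the sparse index is assumed to hold the first key of each block.

```python
from bisect import bisect_right
from typing import Optional

Record = tuple  # (seq, timestamp_ms, value)


class ToyTable:
    """Hypothetical in-memory table with a membership filter and sparse index."""

    def __init__(self, blocks: list) -> None:
        # Each block is a non-empty list of (key, Record) sorted by key.
        self.blocks = blocks
        # Sparse index: first key of every block, in block order.
        self.first_keys = [block[0][0] for block in blocks]
        # Stand-in for the bloom filter; a real one may return false positives.
        self.filter = {key for block in blocks for key, _ in block}

    def get(self, key: bytes) -> Optional[Record]:
        # 1. Filter check: a negative answer avoids touching any block.
        if key not in self.filter:
            return None
        # 2. Sparse index bisect: last block whose first key is <= key.
        i = bisect_right(self.first_keys, key) - 1
        if i < 0:
            return None
        # 3. Block scan: linear search inside the single candidate block.
        for k, record in self.blocks[i]:
            if k == key:
                return record
            if k > key:
                break
        return None


table = ToyTable([
    [(b"apple", (1, 1000, b"red")), (b"banana", (2, 1001, b"yellow"))],
    [(b"cherry", (3, 1002, b"dark red")), (b"date", (4, 1003, b"brown"))],
])
print(table.get(b"cherry"))
```

Because the index is sparse, only one block is scanned per lookup, and the filter lets most misses return before any block is read.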
scan_all()
Return all records in this SSTable as a sorted list.
Used by the disk command to display SSTable contents.
Source code in app/sstable/reader.py
iter_sorted()
Yield all records in ascending key order without materialising.
Used as input to KWayMergeIterator during compaction. More memory-efficient than scan_all() for large SSTables.
Source code in app/sstable/reader.py
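To show why a lazy sorted iterator suits compaction, here is a sketch of a k-way merge over several sorted record streams. It uses the standard library's `heapq.merge` as a stand-in; the behavior of the project's `KWayMergeIterator` is not documented here, and the "newest sequence number wins" rule and the `Row` layout are assumptions.

```python
import heapq
from collections.abc import Iterable, Iterator

Row = tuple  # (key, seq, timestamp_ms, value)


def merge_newest(sources: Iterable) -> Iterator:
    """Merge key-sorted streams, keeping only the highest-seq row per key."""
    # Order by key, then by descending seq, so the newest row for a key
    # is always the first one seen.
    merged = heapq.merge(*sources, key=lambda row: (row[0], -row[1]))
    last_key = None
    for row in merged:
        if row[0] != last_key:
            last_key = row[0]
            yield row


older = [(b"a", 1, 0, b"old"), (b"c", 2, 0, b"c1")]
newer = [(b"a", 5, 0, b"new"), (b"b", 3, 0, b"b1")]
result = list(merge_newest([iter(older), iter(newer)]))
```

Only one pending row per input stream is held in memory at a time, which is the advantage of iterating over materialising each table as a list.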
close()
Release mmap and file descriptor. Never raises.
Source code in app/sstable/reader.py
SSTableMeta
SSTableMeta(file_id, snapshot_id, level, size_bytes, record_count, block_count, min_key, max_key, seq_min, seq_max, bloom_fpr, created_at, data_file, index_file, filter_file)
dataclass
Immutable metadata for one SSTable.
Serialized to meta.json inside the SSTable directory. Its
presence on disk is the completeness signal — if missing, the
SSTable is considered incomplete and ignored on recovery.
Attributes:
| Name | Type | Description |
|---|---|---|
| `file_id` | `FileID` | Unique identifier (UUIDv7 hex) for this SSTable. |
| `snapshot_id` | `SnapshotID` | ID of the memtable snapshot that produced this table. |
| `level` | `Level` | Compaction level (0 for flush output, 1+ for compaction output). |
| `size_bytes` | `int` | Total size of |
| `record_count` | `int` | Number of key-value records stored. |
| `block_count` | `int` | Number of data blocks in |
| `min_key` | `Key` | Lexicographically smallest key in this table. |
| `max_key` | `Key` | Lexicographically largest key in this table. |
| `seq_min` | `SeqNum` | Smallest sequence number across all records. |
| `seq_max` | `SeqNum` | Largest sequence number across all records. |
| `bloom_fpr` | `float` | Configured false positive rate of the bloom filter. |
| `created_at` | `str` | ISO-8601 timestamp of when this table was written. |
| `data_file` | `str` | Filename of the data file (always |
| `index_file` | `str` | Filename of the sparse index (always |
| `filter_file` | `str` | Filename of the bloom filter (always |
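The completeness rule stated above (an SSTable directory without `meta.json` is ignored on recovery) could be applied roughly as follows. This is a sketch under assumptions: the helper `complete_sstable_dirs` is hypothetical, and it assumes each SSTable lives in its own subdirectory of some parent directory.

```python
from pathlib import Path


def complete_sstable_dirs(parent: Path) -> list:
    """Return the subdirectories of parent that contain a meta.json file."""
    return sorted(
        child
        for child in parent.iterdir()
        if child.is_dir() and (child / "meta.json").is_file()
    )
```

A directory holding only a partially written data file is skipped by this check, so a crash in the middle of a flush leaves nothing that recovery would mistake for a valid table.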
to_json()
Serialize to JSON with base64-encoded keys.
Source code in app/sstable/meta.py
from_json(data)
classmethod
Deserialize from JSON.
Source code in app/sstable/meta.py
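Keys are base64-encoded in the JSON form because JSON strings cannot carry arbitrary bytes. The round trip can be sketched as below, assuming `Key` is a `bytes` value and the standard base64 alphabet is used; the helper names and the two-field document are illustrative, not the real `meta.json` layout.

```python
import base64
import json


def encode_key(key: bytes) -> str:
    """Encode a raw key as an ASCII-safe base64 string."""
    return base64.b64encode(key).decode("ascii")


def decode_key(text: str) -> bytes:
    """Invert encode_key."""
    return base64.b64decode(text)


# Keys may contain bytes that are not valid UTF-8, such as 0xff.
min_key, max_key = b"\x00user:1", b"user:\xff"
document = json.dumps({"min_key": encode_key(min_key), "max_key": encode_key(max_key)})
restored = json.loads(document)
```

Decoding the restored strings yields the original bytes unchanged, which would not be guaranteed if the keys were written as plain JSON text.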
SSTableRegistry
SSTableRegistry()
Thread-safe registry of open SSTable readers with ref counting.
Initialize an empty reader registry with no registered readers.
Source code in app/sstable/registry.py
register(file_id, reader)
Register an open reader.
open_reader(file_id)
Acquire a ref-counted handle to a reader.
Source code in app/sstable/registry.py
mark_for_deletion(file_id)
Mark a reader for deletion. Cleaned up when refcount hits 0.
close_all()
Close idle readers and mark in-use readers for deferred cleanup.
Source code in app/sstable/registry.py
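The interplay between ref-counted handles and deferred deletion can be sketched with a small thread-safe registry. `ToyRegistry` is hypothetical and only mirrors the method names listed above; in particular, exposing `open_reader` as a context manager is an assumption about how handles are released.

```python
import threading
from contextlib import contextmanager


class ToyRegistry:
    """Hypothetical ref-counted registry; not the project's SSTableRegistry."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._readers = {}    # file_id -> reader
        self._refs = {}       # file_id -> number of live handles
        self._doomed = set()  # file_ids waiting for their refcount to reach 0

    def register(self, file_id, reader) -> None:
        with self._lock:
            self._readers[file_id] = reader
            self._refs[file_id] = 0

    @contextmanager
    def open_reader(self, file_id):
        with self._lock:
            reader = self._readers[file_id]
            self._refs[file_id] += 1
        try:
            yield reader
        finally:
            self._release(file_id)

    def mark_for_deletion(self, file_id) -> None:
        with self._lock:
            if file_id not in self._readers:
                return
            self._doomed.add(file_id)
            reader = self._pop_if_idle(file_id)
        if reader is not None:
            reader.close()

    def _release(self, file_id) -> None:
        with self._lock:
            self._refs[file_id] -= 1
            reader = self._pop_if_idle(file_id)
        if reader is not None:
            reader.close()

    def _pop_if_idle(self, file_id):
        # Caller holds the lock. Returns the reader to close, or None.
        if self._refs[file_id] == 0 and file_id in self._doomed:
            self._doomed.discard(file_id)
            del self._refs[file_id]
            return self._readers.pop(file_id)
        return None
```

A reader marked for deletion while a lookup still holds a handle stays open until that handle is released, so compaction can retire old files without interrupting in-flight reads.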
SSTableManager
SSTableManager(data_root, cache, registry, l0_order, l0_dirs, manifest, config=None)
Manages all on-disk SSTable state (L0 + L1 + L2 + L3).
Construct the SSTable manager (prefer the `load` factory).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data_root` | `Path` | Root data directory for the engine. | required |
| `cache` | `BlockCache` | Shared block cache for data blocks, indexes, and blooms. | required |
| `registry` | `SSTableRegistry` | Ref-counted registry of open SSTable readers. | required |
| `l0_order` | `list[FileID]` | L0 file IDs in newest-first order. | required |
| `l0_dirs` | `dict[FileID, Path]` | Mapping from L0 file ID to its on-disk directory. | required |
| `manifest` | `Manifest` | Persistent manifest for SSTable ordering. | required |
| `config` | `LSMConfig \| None` | Live engine configuration, or | `None` |
Source code in app/engine/sstable_manager.py
max_level
property
Maximum level depth (from config or default).
cache
property
Return the block cache instance.
l0_count
property
Number of L0 SSTables.
load(data_root, cache=None, config=None)
async
classmethod
Load SSTables from disk using the manifest for ordering.
Source code in app/engine/sstable_manager.py
flush(snapshot, file_id)
async
Write snapshot to a new L0 SSTable, return (meta, reader).
The bloom filter is sized to len(snapshot) (exact entry count)
with a false positive rate from config.bloom_fpr.
Source code in app/engine/sstable_manager.py
commit(file_id, reader, sst_dir)
Register a flushed L0 SSTable and persist manifest.
Source code in app/engine/sstable_manager.py
commit_compaction_async(task, new_meta, new_reader)
async
Atomically commit a compaction result.
Acquires write locks on src and dst levels in ascending order.
Commit ordering (non-negotiable):
1. Register new reader (dst level becomes readable)
2. Write manifest (durable)
3. Mark old files for deletion (deferred by ref-count)
4. Update in-memory state
5. Evict stale cache blocks
Source code in app/engine/sstable_manager.py
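Taking the source and destination level locks in ascending order is a standard way to avoid deadlock between concurrent committers. The sketch below illustrates only that ordering rule. It swaps in plain `threading.Lock` objects for the write locks mentioned above, and `level_locks` and `lock_levels` are hypothetical names.

```python
import threading
from contextlib import ExitStack, contextmanager

# One lock per level, L0 through L3 (plain locks standing in for write locks).
level_locks = {level: threading.Lock() for level in range(4)}


@contextmanager
def lock_levels(*levels: int):
    """Hold the locks for the given levels, always acquired lowest level first."""
    with ExitStack() as stack:
        # sorted() fixes a global acquisition order; set() avoids locking the
        # same level twice when src and dst are equal.
        for level in sorted(set(levels)):
            stack.enter_context(level_locks[level])
        yield
    # ExitStack releases the locks in reverse order on exit.


with lock_levels(2, 1):
    held = [level for level, lock in level_locks.items() if lock.locked()]
```

Because every caller acquires locks in the same global order, two commits that touch overlapping levels cannot each hold one lock while waiting for the other.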
get(key)
async
Look up key across L0 then L1, L2, L3.
Read locks prevent compaction commits from swapping level contents mid-scan.
Source code in app/engine/sstable_manager.py
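The search order used by `get` (L0 tables newest first, then L1, L2, L3) can be sketched with dictionaries standing in for SSTable readers. The `lookup` helper is hypothetical, it assumes at most one table per level from L1 down, and it leaves out the read locks described above.

```python
from typing import Optional


def lookup(key: bytes, l0_newest_first: list, deeper_levels: list) -> Optional[tuple]:
    """Return the first hit, searching newer data before older data."""
    # L0 tables may overlap in key range, so each one is checked, newest first.
    for table in l0_newest_first:
        hit = table.get(key)
        if hit is not None:
            return hit
    # L1, L2, L3 in order; a level with no table yet is represented by None.
    for table in deeper_levels:
        if table is None:
            continue
        hit = table.get(key)
        if hit is not None:
            return hit
    return None


# Records are (seq, timestamp_ms, value) tuples.
l0 = [{b"k": (9, 0, b"newest")}, {b"k": (4, 0, b"older")}]
deeper = [{b"k": (1, 0, b"oldest")}, None, {b"z": (2, 0, b"deep")}]
```

Stopping at the first hit is what makes the newest-first ordering matter: an older value for the same key in a deeper level is never consulted.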
compaction_snapshot()
Return a snapshot of state needed by CompactionManager.
Source code in app/engine/sstable_manager.py
level_seq_min(level)
Return seq_min of the SSTable at level, or 0 if none.
Source code in app/engine/sstable_manager.py
level_size_bytes(level)
Return the size in bytes of the SSTable at level, or 0.
Source code in app/engine/sstable_manager.py
level_record_count(level)
Return the record count of the SSTable at level, or 0.
Source code in app/engine/sstable_manager.py
max_seq_seen()
sst_dir_for(file_id)
new_file_id()
show_disk(file_id=None)
Inspect SSTable contents.