# Syncer

## Overview
The syncer is the core engine of Notes Vault. It discovers files, applies consumer query matching, and exports matching files to consumer target directories. There is no persistent index or database; each sync run operates directly on the filesystem.
## Requirements

### File Discovery
- The system MUST discover files by expanding each file group's glob pattern.
- The system MUST support recursive glob patterns using `**`.
- The system MUST resolve `~` in path patterns.
- The system MUST silently skip glob patterns that match no files.
- The system MUST collect files from all configured file groups, scanning multiple groups in parallel.
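The discovery rules above could be sketched as follows. This is a minimal illustration, not the actual implementation; the `discover_files` helper name is hypothetical.

```python
import glob
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def discover_files(patterns: list[str]) -> list[Path]:
    """Expand each file group's glob pattern; scan groups in parallel."""

    def expand(pattern: str) -> list[Path]:
        # Resolve "~" before globbing -- glob.glob does not expand it.
        expanded = str(Path(pattern).expanduser())
        # recursive=True makes "**" match any number of directories;
        # a pattern with no matches yields an empty list (silently skipped).
        return [p for p in map(Path, glob.glob(expanded, recursive=True)) if p.is_file()]

    with ThreadPoolExecutor() as pool:
        groups = pool.map(expand, patterns)
    return [f for group in groups for f in group]
```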
### Query Matching
- For each file, the system MUST check the consumer's `exclude_queries` first.
- If any `exclude_queries` pattern matches the file content, the system MUST skip the file.
- If at least one `include_queries` pattern matches the file content, the system MUST export the file.
- If no `include_queries` pattern matches, the system MUST skip the file.
- A consumer with an empty `include_queries` list exports nothing.
- The system MUST cache compiled regex patterns to avoid recompilation across files.
- Pattern matching MUST be case-insensitive.
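The matching rules reduce to a small decision function. A possible sketch, assuming the queries are plain regular expressions (the `should_export` and `_compile` names are hypothetical):

```python
import re
from functools import lru_cache


@lru_cache(maxsize=None)
def _compile(pattern: str) -> re.Pattern:
    # Cached so each pattern is compiled once across all files.
    return re.compile(pattern, re.IGNORECASE)


def should_export(content: str, include_queries: list[str], exclude_queries: list[str]) -> bool:
    # Exclusions are checked first and win outright.
    if any(_compile(p).search(content) for p in exclude_queries):
        return False
    # At least one include pattern must match; an empty list exports nothing.
    return any(_compile(p).search(content) for p in include_queries)
```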
### Export
- The system MUST delete the consumer's target directory before exporting, then recreate it empty.
- The system MUST copy matching files into the target directory.
- When `rename: false`, the system MUST preserve the original filename.
- When `rename: false` and filenames collide, the system MUST append a numeric suffix (e.g., `note_1.md`).
- When `rename: true`, the system MUST rename files to `<uuid5><extension>`, where the UUID is derived from the absolute source file path using `uuid.uuid5(uuid.NAMESPACE_URL, str(file_path.resolve()))`.
- The system MUST use `shutil.copy2` to preserve file metadata.
- The system MUST process files in parallel using a thread pool.
- A lock MUST guard destination path computation and copy for `rename: false` to prevent TOCTOU races.
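The destination-path rules might look like this in outline. The `dest_path` helper and `_dest_lock` names are hypothetical; only the `uuid.uuid5` derivation is taken directly from the requirements above.

```python
import threading
import uuid
from pathlib import Path

# For rename: false, callers are expected to hold this lock around
# compute-destination-and-copy so two threads cannot race on the same name.
_dest_lock = threading.Lock()


def dest_path(src: Path, target_dir: Path, rename: bool) -> Path:
    if rename:
        # Deterministic name: UUID5 of the absolute source path, keeping the extension.
        name = uuid.uuid5(uuid.NAMESPACE_URL, str(src.resolve()))
        return target_dir / f"{name}{src.suffix}"
    # rename: false -- keep the original name, appending _1, _2, ... on collision.
    candidate = target_dir / src.name
    counter = 1
    while candidate.exists():
        candidate = target_dir / f"{src.stem}_{counter}{src.suffix}"
        counter += 1
    return candidate
```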
### Error Handling
- The system MUST catch read errors on a per-file basis and continue processing remaining files.
- The system MUST count errored files separately from skipped and exported files.
- The system SHOULD log warnings for files that fail to read.
- The system MUST NOT abort a sync run due to a single file error.
### Progress Reporting
- The system MUST accept optional callbacks for file-found events and export progress.
- The CLI MUST display a progress bar during sync using Rich.
### Statistics

- `sync_consumer` MUST return a dict with keys `exported`, `skipped`, and `errors`.
- The CLI MUST print these statistics after each consumer sync.
## Data Model

```
sync_consumer(consumer_name, consumer, config, on_file_found?, progress_callback?, workers?) -> dict[str, int]
sync_all(consumer_name?) -> dict[str, dict[str, int]]
```
## Behavior

### sync_consumer
- Resolve the target directory path (expand `~`).
- Delete the target directory if it exists; create it fresh.
- Collect all files from all file groups in parallel.
- For each file (in parallel):
  a. Read file content (UTF-8). On error, increment `errors` and skip.
  b. If any `exclude_queries` pattern matches, increment `skipped` and skip.
  c. If no `include_queries` pattern matches, increment `skipped` and skip.
  d. Compute destination path (UUID-renamed or original filename, with collision avoidance).
  e. Copy file to destination. Increment `exported`.
- Return `{exported, skipped, errors}`.
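The per-consumer flow above can be condensed into a sketch. This simplified version (hypothetical `sync_consumer_sketch` name) keeps original filenames without collision handling or renaming, and treats queries as plain case-insensitive regexes:

```python
import re
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def sync_consumer_sketch(files, target_dir, include, exclude, workers=8):
    """Simplified per-consumer sync: wipe target, filter by queries, copy in parallel."""
    target = Path(target_dir).expanduser()
    if target.exists():
        shutil.rmtree(target)       # delete the target directory...
    target.mkdir(parents=True)      # ...and recreate it empty
    stats = {"exported": 0, "skipped": 0, "errors": 0}

    def process(src: Path) -> str:
        try:
            content = src.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError):
            return "errors"
        if any(re.search(p, content, re.IGNORECASE) for p in exclude):
            return "skipped"
        if not any(re.search(p, content, re.IGNORECASE) for p in include):
            return "skipped"
        shutil.copy2(src, target / src.name)  # copy2 preserves metadata
        return "exported"

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for outcome in pool.map(process, files):
            stats[outcome] += 1
    return stats
```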
### sync_all
- Load config.
- If `consumer_name` is given, validate it exists and restrict to that consumer only.
- For each consumer, call `sync_consumer` and collect results.
- Return a dict keyed by consumer name.
### Statistics
| Field | Description |
|---|---|
| `exported` | Files copied to the target directory |
| `skipped` | Files excluded by query rules |
| `errors` | Files that could not be read |