Storage System
Overview
The storage system provides file-based persistence for GitHub issues and implementation plans using a hierarchical directory structure. It manages issue metadata, plan versions, and timestamps for incremental synchronization, all organized by repository and issue number.
Architecture
Directory Structure
{issues_path}/
  {owner}/
    {repo_name}/
      .updated-at                   # Repository last sync timestamp
      {issue_number}/
        description.md              # Issue content in markdown
        .updated-at                 # Issue last update timestamp
        plan-20240115-143022.md     # Implementation plan (versioned)
        plan-20240115-143022.yaml   # Plan metadata
        plan-20240116-091544.md     # Newer plan version
        plan-20240116-091544.yaml   # Newer plan metadata
Store Components
IssueStore
Manages issue storage and retrieval, located in src/gh_worker/storage/issue_store.py.
Methods:
- save_issue() - Persist issue to filesystem
- load_issue() - Retrieve issue from filesystem (placeholder)
- get_updated_at() - Get issue last update timestamp
- set_repo_updated_at() - Set repository last sync timestamp
- get_repo_updated_at() - Get repository last sync timestamp
- list_issues() - List all issue numbers for repository
- list_repositories() - List all repositories in store
- get_issue_dir() - Resolve issue directory path
- get_repo_dir() - Resolve repository directory path
PlanStore
Manages plan storage and versioning, located in src/gh_worker/storage/plan_store.py.
Methods:
- create_plan() - Create new plan with metadata
- start_plan_generation() - Create metadata at start of generation (no .md yet)
- complete_plan() - Write plan content and update metadata after generation
- get_latest_plan() - Retrieve most recent plan for issue (returns metadata even if .md doesn't exist yet)
- list_plans() - List all plans for issue (sorted newest first)
- update_metadata() - Update plan metadata
- has_plan() - Check if issue has plan matching its .updated-at timestamp
- get_issue_dir() - Resolve issue directory path
Data Storage
Issue Storage
Files:
- description.md - Issue content formatted as markdown
- .updated-at - ISO 8601 timestamp of last issue update
Format:
Issues are saved using the Issue.to_markdown() method, which preserves:
- Title
- Number
- State
- Labels
- Author
- Assignees
- Milestone
- Creation and update timestamps
- Full body content
Timestamps:
- Stored in ISO 8601 format (e.g., "2024-01-15T14:30:22.123456+00:00")
- Parsed using datetime.fromisoformat()
- Used for incremental sync filtering
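The following is a minimal sketch of reading and writing a hidden .updated-at file, assuming the file contains a single ISO 8601 string and nothing else. The helper names write_timestamp and read_timestamp are hypothetical, not IssueStore methods. Because datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 onward, the reader normalizes it to "+00:00" first.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def write_timestamp(path: Path, ts: datetime) -> None:
    """Write a timestamp to a hidden .updated-at file (hypothetical helper)."""
    if ts.tzinfo is None:
        # Assumption of this sketch: naive datetimes are treated as UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(ts.isoformat(), encoding="utf-8")


def read_timestamp(path: Path) -> Optional[datetime]:
    """Return the stored timestamp, or None if the file does not exist."""
    if not path.exists():
        return None
    raw = path.read_text(encoding="utf-8").strip()
    # fromisoformat() rejects a trailing "Z" before Python 3.11,
    # so rewrite it as an explicit UTC offset.
    if raw.endswith("Z"):
        raw = raw[:-1] + "+00:00"
    return datetime.fromisoformat(raw)
```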
Plan Storage
Files:
- plan-YYYYMMDD-HHMMSS.md - Plan content with timestamp in filename
- plan-YYYYMMDD-HHMMSS.yaml - Associated metadata
Versioning:
- Multiple plans per issue supported
- Filename includes creation timestamp
- Sorted reverse chronologically (newest first)
- Latest plan retrieved by filename sort
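As a sketch of the versioning rules, assuming the caller supplies the creation time (UTC in this example), the hypothetical helpers below build a filename and order a set of plan filenames newest first:

```python
from datetime import datetime, timezone
from typing import Iterable, List


def plan_filename(created_at: datetime) -> str:
    """Build a versioned plan filename such as plan-20240115-143022.md."""
    return f"plan-{created_at.strftime('%Y%m%d-%H%M%S')}.md"


def newest_first(filenames: Iterable[str]) -> List[str]:
    """Sort plan filenames reverse chronologically.

    The fields are zero-padded and ordered from most to least significant
    (year, month, day, hour, minute, second), so a reverse lexicographic
    sort is also a reverse chronological sort.
    """
    return sorted(filenames, reverse=True)
```

One consequence of the format: it has one-second resolution, so two plans created within the same second would map to the same filename, and some collision handling would be needed to guarantee unique names.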
Metadata: Stored as YAML with the following fields:
- issue_number - Associated issue number
- repository - Full repository name (owner/name)
- created_at - Plan creation timestamp
- status - Plan status (PENDING, APPROVED, IN_PROGRESS, COMPLETED, FAILED)
- session_id - Agent session ID (optional)
- branch_name - Implementation branch (optional)
- pr_url - Pull request URL (optional)
- error_message - Error message if failed (optional)
- completed_at - Implementation completion timestamp (optional)
- merged_at - PR merge timestamp (optional)
- agent - Agent name used (optional)
- model - Model used by agent (optional)
- commit_hash - Repository commit when plan was generated (optional)
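To make the field list concrete, here is a hypothetical dataclass mirroring it. It is not the real PlanMetadata model, and the string values given to the PlanStatus members are an assumption; only the member names come from the list above.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class PlanStatus(str, Enum):
    """Stand-in for gh_worker.models.plan.PlanStatus (values assumed)."""
    PENDING = "PENDING"
    APPROVED = "APPROVED"
    IN_PROGRESS = "IN_PROGRESS"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


@dataclass
class PlanMetadataSketch:
    """Hypothetical mirror of the YAML fields; not the real PlanMetadata."""
    issue_number: int
    repository: str  # full name, "owner/name"
    created_at: datetime
    status: PlanStatus = PlanStatus.PENDING  # new plans start as PENDING
    # Optional fields default to None until a later stage fills them in.
    session_id: Optional[str] = None
    branch_name: Optional[str] = None
    pr_url: Optional[str] = None
    error_message: Optional[str] = None
    completed_at: Optional[datetime] = None
    merged_at: Optional[datetime] = None
    agent: Optional[str] = None
    model: Optional[str] = None
    commit_hash: Optional[str] = None
```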
Repository Metadata
Files:
- .updated-at - Last sync timestamp for repository
Purpose:
- Enables incremental sync (only fetch issues updated since last sync)
- Updated after successful sync operation
- Stored at repository level, not per-issue
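A sketch of how the repository timestamp might drive an incremental sync. The fetch_issues, save_issue, and set_repo_updated_at callables are hypothetical stand-ins (for a GitHub API call filtered by a "since" time, IssueStore.save_issue(), and IssueStore.set_repo_updated_at()), and recording the start time rather than the end time is a design assumption; the sketch only illustrates ordering.

```python
from datetime import datetime, timezone
from typing import Callable, Iterable, Optional


def incremental_sync(
    last_sync: Optional[datetime],
    fetch_issues: Callable[[Optional[datetime]], Iterable[dict]],
    save_issue: Callable[[dict], None],
    set_repo_updated_at: Callable[[datetime], None],
    now: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
) -> int:
    """Fetch and save issues updated since last_sync; return how many were saved."""
    # Capture the start time before fetching so that issues updated while the
    # sync is running are picked up again on the next run.
    started = now()
    saved = 0
    # last_sync is None on the first run, meaning "fetch everything".
    for issue in fetch_issues(last_sync):
        save_issue(issue)
        saved += 1
    # Advance the repository timestamp only after every issue was saved; if an
    # exception escapes the loop, the old timestamp is kept for a retry.
    set_repo_updated_at(started)
    return saved
```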
Path Resolution
Issue Directory Path
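Pattern: {issues_path}/{owner}/{repo_name}/{issue_number}/
Example: /var/gh-worker/issues/octocat/hello-world/42/
Repository Directory Path
Pattern: {issues_path}/{owner}/{repo_name}/
Example: /var/gh-worker/issues/octocat/hello-world/
Plan File Path
Pattern: {issues_path}/{owner}/{repo_name}/{issue_number}/plan-YYYYMMDD-HHMMSS.md
Example: /var/gh-worker/issues/octocat/hello-world/42/plan-20240115-143022.md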
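The three paths can be expressed as plain path joins. This is a sketch with hypothetical module-level functions; the real get_repo_dir() and get_issue_dir() methods take a Repository object rather than separate owner and name strings.

```python
from pathlib import Path


def repo_dir(issues_path: Path, owner: str, repo_name: str) -> Path:
    """{issues_path}/{owner}/{repo_name}/"""
    return issues_path / owner / repo_name


def issue_dir(issues_path: Path, owner: str, repo_name: str, issue_number: int) -> Path:
    """{issues_path}/{owner}/{repo_name}/{issue_number}/"""
    return repo_dir(issues_path, owner, repo_name) / str(issue_number)


def plan_file(issues_path: Path, owner: str, repo_name: str,
              issue_number: int, stamp: str) -> Path:
    """Issue directory plus plan-{stamp}.md, where stamp is YYYYMMDD-HHMMSS."""
    return issue_dir(issues_path, owner, repo_name, issue_number) / f"plan-{stamp}.md"
```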
Discovery Operations
List Issues
Scans repository directory for subdirectories with numeric names:
- Check if repository directory exists
- Iterate over directory contents
- Filter for directories with isdigit() names
- Convert to integers and sort
- Return sorted list
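A minimal sketch of these steps, written as a hypothetical standalone function over a repository directory path. It adds an isascii() check because str.isdigit() also accepts characters such as "²", which int() cannot parse.

```python
from pathlib import Path
from typing import List


def list_issue_numbers(repo_dir: Path) -> List[int]:
    """Return sorted issue numbers found under a repository directory."""
    # A missing repository directory is not an error: there are no issues yet.
    if not repo_dir.is_dir():
        return []
    numbers = [
        int(entry.name)
        for entry in repo_dir.iterdir()
        # Files such as .updated-at and non-numeric directories are skipped.
        if entry.is_dir() and entry.name.isascii() and entry.name.isdigit()
    ]
    # Sort numerically (so 9 < 10), not lexicographically.
    return sorted(numbers)
```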
List Repositories
Scans issues path for owner/repo structure:
- Iterate over top-level directories (owners)
- Skip hidden directories (starting with ".")
- For each owner, iterate over subdirectories (repos)
- Skip hidden directories
- Create Repository objects with owner/name
- Return list of repositories
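The same traversal as a sketch. Repo is a hypothetical stand-in for the Repository model, and the sorted() calls are an addition reflecting the Discovery Operations requirement to sort results consistently, not a documented behavior of list_repositories().

```python
from pathlib import Path
from typing import List, NamedTuple


class Repo(NamedTuple):
    """Hypothetical stand-in for gh_worker.models.repository.Repository."""
    owner: str
    name: str

    @property
    def full_name(self) -> str:
        return f"{self.owner}/{self.name}"


def list_repositories(issues_path: Path) -> List[Repo]:
    """Discover {owner}/{repo_name} directories under issues_path."""
    if not issues_path.is_dir():
        return []
    repos: List[Repo] = []
    # sorted() keeps the order deterministic regardless of filesystem order.
    for owner_dir in sorted(issues_path.iterdir()):
        # Skip plain files and hidden directories at the owner level.
        if not owner_dir.is_dir() or owner_dir.name.startswith("."):
            continue
        for repo_dir in sorted(owner_dir.iterdir()):
            if repo_dir.is_dir() and not repo_dir.name.startswith("."):
                repos.append(Repo(owner=owner_dir.name, name=repo_dir.name))
    return repos
```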
List Plans
Finds all plan files for an issue:
- Glob for plan-*.md files in issue directory
- Sort reverse chronologically
- Load metadata from corresponding .yaml files
- Create default metadata if .yaml missing (using file mtime)
- Return list of (plan_file, metadata) tuples
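A sketch of the listing steps, with metadata loading and the missing-file fallback passed in as callables (load_metadata and fallback_metadata are hypothetical parameters standing in for PlanMetadata.load() and the default-metadata logic):

```python
from pathlib import Path
from typing import Any, Callable, List, Tuple

Metadata = Any  # stand-in for PlanMetadata in this sketch


def list_plans(
    issue_dir: Path,
    load_metadata: Callable[[Path], Metadata],
    fallback_metadata: Callable[[Path], Metadata],
) -> List[Tuple[Path, Metadata]]:
    """Return (plan_file, metadata) pairs, newest plan first."""
    if not issue_dir.is_dir():
        return []
    # Zero-padded timestamps make a reverse name sort reverse chronological.
    plan_files = sorted(issue_dir.glob("plan-*.md"), key=lambda p: p.name, reverse=True)
    results: List[Tuple[Path, Metadata]] = []
    for plan_file in plan_files:
        # Metadata sits next to the plan: same name, .yaml extension.
        meta_file = plan_file.with_suffix(".yaml")
        if meta_file.exists():
            metadata = load_metadata(meta_file)
        else:
            metadata = fallback_metadata(plan_file)
        results.append((plan_file, metadata))
    return results
```

Because it follows the steps above literally (globbing for .md files), this sketch does not see a plan whose generation has started but whose .md file has not been written yet, the case get_latest_plan() is described as handling.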
Metadata Management
Plan Metadata Creation
- Generate timestamp-based filename
- Write plan content to markdown file
- Create PlanMetadata object with default status (PENDING)
- Save metadata to YAML file
- Return metadata object
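The creation steps map onto a short sketch. It is hypothetical in several ways: metadata is a plain dict instead of a PlanMetadata object, only the four required fields are written, and the YAML is emitted by encoding each value with json.dumps (JSON strings, numbers, and null are also valid YAML scalars), which keeps the example free of third-party dependencies. The real PlanMetadata.save() output may look different.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, Optional


def _to_yaml(fields: Dict[str, Any]) -> str:
    """Emit a flat mapping as YAML, one "key: value" line per field."""
    return "".join(f"{key}: {json.dumps(value)}\n" for key, value in fields.items())


def create_plan(issue_dir: Path, repository: str, issue_number: int,
                content: str, now: Optional[datetime] = None) -> Dict[str, Any]:
    created_at = now if now is not None else datetime.now(timezone.utc)
    # 1. Timestamp-based filename, e.g. plan-20240115-143022.md
    stamp = created_at.strftime("%Y%m%d-%H%M%S")
    issue_dir.mkdir(parents=True, exist_ok=True)
    plan_file = issue_dir / f"plan-{stamp}.md"
    # 2. Plan content goes into the markdown file.
    plan_file.write_text(content, encoding="utf-8")
    # 3. Metadata starts in the PENDING state.
    metadata = {
        "issue_number": issue_number,
        "repository": repository,
        "created_at": created_at.isoformat(),
        "status": "PENDING",
    }
    # 4. Metadata is saved next to the plan with a .yaml extension.
    plan_file.with_suffix(".yaml").write_text(_to_yaml(metadata), encoding="utf-8")
    # 5. Return the metadata together with the plan's location.
    return {**metadata, "plan_file": plan_file}
```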
Plan Metadata Updates
- Validate metadata has plan_file set
- Resolve metadata file path (same name, .yaml extension)
- Save metadata to YAML using PlanMetadata.save()
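A sketch of the update path, assuming metadata is a plain dict and the YAML is one "key: value" line per field (values encoded with json.dumps, which yields valid YAML scalars). update_metadata here is a hypothetical function, and keeping plan_file out of the written file is an assumption of this sketch; the document only states that metadata carries the plan_file path.

```python
import json
from pathlib import Path
from typing import Any, Dict


def update_metadata(metadata: Dict[str, Any]) -> Path:
    """Persist updated plan metadata next to its plan file; return the .yaml path."""
    plan_file = metadata.get("plan_file")
    # Without plan_file there is no way to know which .yaml file to overwrite.
    if plan_file is None:
        raise ValueError("metadata has no plan_file; cannot resolve metadata path")
    # Same name as the plan, .yaml extension: plan-X.md -> plan-X.yaml
    meta_file = Path(plan_file).with_suffix(".yaml")
    fields = {k: v for k, v in metadata.items() if k != "plan_file"}
    # Stand-in for PlanMetadata.save(): flat "key: value" YAML.
    meta_file.write_text(
        "".join(f"{k}: {json.dumps(v)}\n" for k, v in fields.items()),
        encoding="utf-8",
    )
    return meta_file
```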
Fallback Metadata
If metadata file missing:
- Use file modification time as created_at
- Set status to PENDING
- Other fields remain None
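A sketch of the fallback, assuming the caller already knows the issue number and repository (for example from the directory the plan was found in). fallback_metadata and OPTIONAL_FIELDS are hypothetical names; the field names follow the metadata list above, and the status is shown as a plain string rather than the PlanStatus enum.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict

OPTIONAL_FIELDS = (
    "session_id", "branch_name", "pr_url", "error_message", "completed_at",
    "merged_at", "agent", "model", "commit_hash",
)


def fallback_metadata(plan_file: Path, issue_number: int, repository: str) -> Dict[str, Any]:
    """Build default metadata for a plan whose .yaml file is missing."""
    # The file's modification time is the best available creation time.
    created_at = datetime.fromtimestamp(plan_file.stat().st_mtime, tz=timezone.utc)
    metadata: Dict[str, Any] = {
        "issue_number": issue_number,
        "repository": repository,
        "created_at": created_at,
        "status": "PENDING",
        "plan_file": plan_file,
    }
    # Every optional field stays unset.
    metadata.update({name: None for name in OPTIONAL_FIELDS})
    return metadata
```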
Requirements
Directory Structure
MUST:
- Create directories with parents=True, exist_ok=True
- Use owner/repo/issue hierarchy
- Store timestamps in hidden .updated-at files
- Use numeric directory names for issues
- Support multiple plans per issue with unique filenames
SHOULD:
- Skip hidden directories (starting with ".") in discovery
- Use consistent path separators (Path objects)
- Create parent directories automatically
- Handle missing directories gracefully (return empty lists)
MAY:
- Implement directory locking for concurrent access
- Compress old plans to save space
- Provide directory cleanup utilities
- Support symbolic links or alternative layouts
Issue Storage
MUST:
- Save issues as markdown using Issue.to_markdown()
- Store update timestamps in ISO 8601 format
- Update repository timestamp after sync
- Create issue directories automatically
- Support timestamp retrieval for individual issues
SHOULD:
- Preserve all issue fields in markdown
- Use UTF-8 encoding for all files
- Handle filesystem errors gracefully
- Log storage operations
MAY:
- Implement issue loading from markdown (currently placeholder)
- Cache issue data in memory
- Support issue deletion
- Provide issue update detection
Plan Storage
MUST:
- Version plans with timestamp in filename (YYYYMMDD-HHMMSS)
- Store plan content and metadata separately
- Use markdown for content, YAML for metadata
- Return latest plan by filename sort
- Create default metadata if missing
- Support metadata updates
SHOULD:
- Sort plans reverse chronologically
- Validate metadata on save
- Log plan creation and updates
- Handle missing metadata gracefully
MAY:
- Implement plan archiving or deletion
- Support plan comparison or diffing
- Provide plan rollback functionality
- Track plan revisions or lineage
Timestamp Management
MUST:
- Use ISO 8601 format for all timestamps
- Store in UTC or with timezone information
- Parse using datetime.fromisoformat()
- Handle "Z" suffix for UTC timestamps
SHOULD:
- Use timezone-aware datetime objects
- Validate timestamp format on read
- Preserve microsecond precision
MAY:
- Support alternative timestamp formats
- Provide timestamp conversion utilities
- Track multiple timestamp types (created, updated, accessed)
Discovery Operations
MUST:
- Return empty lists for missing directories (not errors)
- Filter for directories only (not files)
- Skip hidden directories (starting with ".")
- Validate numeric directory names for issues
- Sort results consistently
SHOULD:
- Handle permission errors gracefully
- Log discovery operations for debugging
- Return stable, deterministic results
MAY:
- Cache discovery results with TTL
- Support filtering or search criteria
- Provide pagination for large result sets
- Implement watch/notification for changes
Metadata Management
MUST:
- Save metadata as YAML
- Load metadata using PlanMetadata.load()
- Create default metadata if file missing
- Validate metadata has plan_file before update
- Store plan_file path in metadata
SHOULD:
- Use human-readable YAML formatting
- Preserve metadata field order
- Handle malformed metadata files gracefully
- Log metadata operations
MAY:
- Validate metadata schema on load
- Support metadata migration between versions
- Provide metadata search or query
- Track metadata change history
Usage Examples
Save Issue
from gh_worker.storage.issue_store import IssueStore
from gh_worker.models.issue import Issue
from pathlib import Path
store = IssueStore(Path("/var/gh-worker/issues"))
issue = Issue(...) # Issue object from GitHub API
store.save_issue(issue)
Get Repository Last Sync
from gh_worker.models.repository import Repository
repo = Repository(owner="octocat", name="hello-world")
last_sync = store.get_repo_updated_at(repo)
if last_sync:
    print(f"Last synced: {last_sync.isoformat()}")
Create Plan
from gh_worker.storage.plan_store import PlanStore
plan_store = PlanStore(Path("/var/gh-worker/issues"))
plan_content = "## Implementation Plan\n\n1. Step one\n2. Step two"
metadata = plan_store.create_plan(repo, issue_number=42, content=plan_content)
print(f"Plan created: {metadata.plan_file}")
Get Latest Plan
result = plan_store.get_latest_plan(repo, issue_number=42)
if result:
    plan_file, metadata = result
    plan_content = plan_file.read_text()
    print(f"Status: {metadata.status}")
    print(f"Content:\n{plan_content}")
List All Plans
plans = plan_store.list_plans(repo, issue_number=42)
for plan_file, metadata in plans:
    print(f"{plan_file.name}: {metadata.status} ({metadata.created_at})")
Update Plan Metadata
from gh_worker.models.plan import PlanStatus
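# Get latest plan (None if the issue has no plans yet)
result = plan_store.get_latest_plan(repo, issue_number=42)
if result:
    _, metadata = result
    # Update status and record the agent session
    metadata.status = PlanStatus.IN_PROGRESS
    metadata.session_id = "abc123"
    # Save changes to the plan's .yaml file
    plan_store.update_metadata(metadata)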
List Repositories
repositories = store.list_repositories()
for repo in repositories:
    issues = store.list_issues(repo)
    print(f"{repo.full_name}: {len(issues)} issues")
Extension Points
The storage system can be extended to support:
- Alternative backends (database, cloud storage)
- Compression for old issues/plans
- Full-text search across issues and plans
- Atomic operations with transactions
- Concurrent access with locking
- Issue and plan archiving/deletion
- Storage quotas and cleanup policies
- Backup and restore utilities