Development Documentation
Overview
This directory contains comprehensive technical documentation for Capcat developers, including architecture details, design patterns, implementation logic, and team onboarding materials.
Document Index
1. Architecture & Logic
File
01-architecture-logic.md
Purpose
Complete technical architecture explanation for junior developers
Contents
- System architecture layers (UI, orchestration, source, processing, output)
- Hybrid source system (config-driven vs custom)
- Core design patterns (Factory, Registry, Strategy, Session Pooling, Observer)
- Complete data flow diagrams
- Error handling strategy
- Performance optimizations
- Security considerations
- Configuration management
Target Audience
Junior developers, new team members, code reviewers
Key Concepts
- Hybrid architecture supporting 2 source types
- Registry pattern for source discovery
- Factory pattern for source instantiation
- Graceful degradation on errors
- Connection pooling for performance
Code Examples
30+ real code snippets with explanations
2. Team Onboarding
File
02-team-onboarding.md
Purpose
Complete onboarding guide for new developers and designers
Contents
- Day 1: Environment setup and verification
- Day 1-2: Codebase exploration exercises
- Day 3-4: First contribution tasks
- Week 1: Deep dive sessions (architecture, user interface, testing)
- Week 2: Collaboration practices (code review, git workflow)
- Development workflows (adding sources, fixing bugs)
- Common tasks and debugging tips
- Success checklist and next steps
Target Audience
New team members (developers)
Time Estimate
- Setup: 30 minutes
- Codebase exploration: 2 days
- First contribution: 3-4 days
- Full onboarding: 2 weeks
Learning Path
- Environment setup → Working Capcat installation
- Code exploration → Understanding architecture
- Small contributions → Confidence building
- Independent work → Full productivity
How to Use This Documentation
For New Developers
Week 1 Roadmap
Day 1: Setup
(2-4 hours)
# Follow setup in 02-team-onboarding.md
git clone <repo>
./scripts/fix_dependencies.sh
./capcat list sources # Verify installation
Day 2: Architecture
(4-6 hours)
- Read
01-architecture-logic.md sections 1-3
- Draw architecture diagram from memory
- Complete Exercise 2: Trace code flow
- Review design patterns section
Day 3: First Contribution
(4-6 hours)
- Pick Task 1 or 2 from onboarding guide
- Add config-driven source or fix documentation
- Create pull request
- Iterate based on feedback
Day 4-5: Deep Understanding
(8-10 hours)
- Complete all hands-on exercises
- Add custom source (if comfortable with Python)
- Write tests for your contribution
- Attend deep dive sessions
For Experienced Developers
Quick Reference
Adding New Source
- Config-driven: See "Adding a New Config-Driven Source" in onboarding doc
- Custom: See "Adding a New Custom Source" + Architecture patterns in logic doc
Understanding Data Flow
- Section "Complete Article Fetching Flow" in
01-architecture-logic.md
- Follow execution: CLI → Registry → Factory → Source → Processing → Output
Debugging Issues
- "Debugging Tips" section in onboarding doc
- "Error Handling Strategy" in architecture doc
- Use
--debug flag for verbose logging
Code Review Guidelines
- "Code Review Guidelines" in onboarding doc
- Check against design patterns in architecture doc
For Product Designers
Understanding Technical Constraints
Why Hybrid Architecture?
- Read:
01-architecture-logic.md "Hybrid Source Layer" section
- Understand: Config-driven (simple, 30min) vs Custom (complex, 4hr)
- Implication: Feature complexity affects development time
Why CLI First?
- Read: Architecture decisions
- Understand: Automation, scriptability, power users
- Implication: GUI is secondary, not primary interface
Performance Constraints
- Read: "Performance Optimizations" in architecture doc
- Understand: Network I/O is bottleneck
- Implication: Bulk operations faster than individual
Privacy Architecture
- Read: "Security Considerations" in architecture doc
- Understand: Local-first, no telemetry by design
- Implication: Analytics limited, user research critical
Collaboration Points
- Feature feasibility discussion
- Technical constraints awareness
- Implementation time estimates
- Interface vs technical tradeoffs
For Code Reviewers
Review Checklist
(from onboarding doc):
Architecture Compliance
- [ ] Follows established patterns (Factory, Registry, Strategy)
- [ ] Fits into layer architecture (doesn't violate boundaries)
- [ ] Reuses existing components (no reinventing wheel)
- [ ] Maintains separation of concerns
Code Quality
- [ ] PEP 8 compliant
- [ ] Docstrings present and clear
- [ ] No code duplication
- [ ] Single responsibility principle
Testing
- [ ] Unit tests for new functions
- [ ] Integration tests for workflows
- [ ] Edge cases covered
- [ ] Tests actually test what they claim
Documentation
- [ ] README updated if user-facing change
- [ ] Architecture doc updated if structural change
- [ ] API reference updated if interface change
- [ ] Comments explain "why" not "what"
Architecture Overview
System Layers
┌─────────────────────────────────────┐
│ USER INTERFACES │ cli.py, interactive.py
├─────────────────────────────────────┤
│ CORE ORCHESTRATION │ capcat.py, config.py
├─────────────────────────────────────┤
│ SOURCE SYSTEM │ Registry, Factory, Sources
├─────────────────────────────────────┤
│ CONTENT PROCESSING │ Fetcher, MediaProcessor
├─────────────────────────────────────┤
│ OUTPUT GENERATION │ FileWriter, HTMLGenerator
└─────────────────────────────────────┘
Key Components
Source System
SourceRegistry: Auto-discovers sources from filesystem
SourceFactory: Creates appropriate source instance
BaseSource: Abstract base for all sources
ConfigDrivenSource: YAML-based simple sources
- Custom sources: Python implementations
Processing Pipeline
- Article discovery (Source.get_articles)
- Content fetching (ArticleFetcher)
- Media processing (UnifiedMediaProcessor)
- Format conversion (HTMLConverter)
- File generation (FileWriter)
Design Patterns
- Factory: Source creation
- Registry: Source discovery
- Strategy: Content extraction
- Observer: Progress tracking
- Singleton: Session pooling
Development Workflows
Quick Reference
Add Config Source
(30 min):
# Create YAML config
cat > sources/active/config_driven/configs/source.yaml <<EOF
display_name: "Source Name"
base_url: "https://example.com/"
category: tech
article_selectors: [".headline a"]
content_selectors: [".article-body"]
EOF
# Test
./capcat fetch source --count 5
Add Custom Source
(4 hrs):
# Create structure
mkdir -p sources/active/custom/source
cd sources/active/custom/source
# Create source.py (implement BaseSource)
# Create config.yaml
# Create __init__.py
# Test and commit
Fix Bug
- Write failing test
- Fix code
- Verify test passes
- Run full test suite
- Commit with "fix:" prefix
Add Feature
- Discuss with team
- Update architecture doc if needed
- Implement with tests
- Update user docs
- Create PR with detailed description
Code Organization
Directory Structure
Application/
├── capcat.py # Main application
├── cli.py # CLI interface
├── core/ # Core modules
│ ├── source_system/ # Source management
│ │ ├── source_registry.py
│ │ ├── source_factory.py
│ │ ├── base_source.py
│ │ └── config_driven_source.py
│ ├── article_fetcher.py
│ ├── unified_media_processor.py
│ └── config.py
├── sources/
│ └── active/
│ ├── config_driven/
│ │ └── configs/*.yaml
│ └── custom/
│ └── */source.py
├── tests/ # Test suite
├── docs/ # Documentation
│ ├── tutorials/ # User and exhaustive tutorials
│ └── development/ # This directory
└── scripts/ # Utility scripts
Import Conventions
# Standard library
import os
import sys
from typing import List, Dict
# Third-party
import requests
import yaml
from bs4 import BeautifulSoup
# Local modules
from core.config import get_config
from core.source_system.base_source import BaseSource
from core.exceptions import SourceError
Naming Conventions
Files
lowercase_with_underscores.py
Classes
PascalCase
Functions
lowercase_with_underscores
Constants
UPPERCASE_WITH_UNDERSCORES
Private
_leading_underscore
Testing Strategy
Test Types
Unit Tests
(Fast, isolated):
# Test individual functions
def test_slugify():
assert slugify("Hello World") == "hello_world"
assert slugify("Test@#$123") == "test_123"
Integration Tests
(Medium speed, component interaction):
# Test component interactions
def test_source_registry_and_factory():
registry = SourceRegistry()
source = registry.get_source('hn')
articles = source.get_articles(count=5)
assert len(articles) == 5
End-to-End Tests
(Slow, full workflow):
# Test complete user workflows
def test_fetch_command_workflow():
result = run_capcat("fetch hn --count 5")
assert result.success
assert output_exists()
Coverage Goals
- Core modules: 90%+
- Sources: 80%+
- CLI: 70%+
- Overall: 85%+
Running Tests
# All tests
pytest
# With coverage
pytest --cov=core --cov=sources
# Specific module
pytest tests/test_source_registry.py
# Watch mode (re-run on change)
ptw
Common Patterns
Error Handling
try:
result = operation()
except SpecificError as e:
logger.error(f"Operation failed: {e}")
return default_value
except Exception as e:
logger.exception("Unexpected error")
raise
Logging
logger = get_logger(__name__)
logger.debug("Detailed debug info")
logger.info("User-relevant information")
logger.warning("Something unexpected")
logger.error("Operation failed")
logger.exception("Error with traceback")
Configuration
# Get configuration
config = get_config()
count = config.get('default_count', 30)
# Respect CLI override
if args.count:
count = args.count
Performance Guidelines
Do's
- Use session pooling for HTTP requests
- Parallelize independent operations
- Cache expensive computations
- Use generators for large datasets
- Lazy load when possible
Don'ts
- Don't create new sessions per request
- Don't process serially when can parallelize
- Don't load entire files into memory
- Don't make unnecessary network calls
- Don't ignore timeouts
Example Optimization
# Bad: Sequential processing
for article in articles:
content = fetch_content(article.url) # Slow
process(content)
# Good: Parallel processing
from concurrent.futures import ThreadPoolExecutor
with ThreadPoolExecutor(max_workers=10) as executor:
contents = executor.map(fetch_content, article_urls)
for content in contents:
process(content)
Security Guidelines
Input Validation
def validate_count(count):
if not 1 <= count <= 1000:
raise ValidationError("Count must be 1-1000")
return count
Path Sanitization
def sanitize_path(path):
# Remove path traversal
path = path.replace('..', '')
# Remove absolute paths
path = path.lstrip('/')
return path
No Secrets in Code
# Bad
API_KEY = "secret_key_12345"
# Good
API_KEY = os.getenv('CAPCAT_API_KEY')
if not API_KEY:
raise ConfigError("API_KEY not set")
Related Documentation
User Documentation
Quick Start
../quick-start.md
Getting Started Tutorial
../tutorials/user/01-getting-started.md
Architecture
../architecture.md
Source Development
../source-development.md
Testing Guide
../testing.md
API Documentation
API Reference
../api-reference.md
Module Docs
../api/
Contributing
Before Starting Work
- Check existing issues and PRs
- Discuss significant changes with team
- Read relevant documentation
- Understand user impact
Development Process
- Create feature branch
- Write failing tests
- Implement feature
- Make tests pass
- Update documentation
- Create pull request
- Address review feedback
- Merge when approved
Code Review Process
- Self-review checklist
- Automated checks (CI)
- Peer review (2 approvals)
- Architecture review (for significant changes)
- UX review (for user-facing changes)
Resources
Internal
Codebase
GitHub repository
Documentation
docs/ directory
Examples
examples/ directory
Tests
tests/ directory
External
Python
https://docs.python.org/3/
BeautifulSoup
https://www.crummy.com/software/BeautifulSoup/
Requests
https://requests.readthedocs.io/
Pytest
https://docs.pytest.org/
Contact
Tech Lead
[Name]
Senior Developer
[Name]
DevOps
[Name]
Channels
- Slack: #capcat-dev
- Email: dev-team@example.com
- Office hours: Tue/Thu 2-3pm
Quick Command Reference
# Setup
./scripts/fix_dependencies.sh
# Run Capcat
./capcat fetch hn --count 10
# Tests
pytest
pytest --cov
# Code quality
black .
flake8 core/ sources/
# Documentation
python3 scripts/run_docs.py
# Debug mode
./capcat --debug fetch hn --count 1
Last Updated
2025-01-06
Next Review
Quarterly
Owner
Tech Lead
Status
Active, living documentation