System Architecture Diagrams — v1.9
Complete System Architecture
graph TB
subgraph User Interface Layer
CLI[CLI Interface<br/>cli.py]
TUI[Interactive TUI<br/>capcat catch]
end
subgraph Vault Layer — user-owned
CY[capcat.yml<br/>source list + counts]
GS[Global-settings.yaml<br/>network · logging · PDF]
SC[sources/active/<br/>per-source configs]
end
subgraph Source System Layer
Registry[Source Registry<br/>Discovery & Management]
Factory[Source Factory<br/>Instantiation]
Monitor[Performance Monitor<br/>Metrics & Health]
end
subgraph Hybrid Source Implementation
ConfigDriven[Config-Driven Sources<br/>YAML-based]
Custom[Custom Sources<br/>Python classes]
BaseSource[Base Source<br/>Abstract interface]
end
subgraph Processing Pipeline
SessionPool[Session Pool<br/>Connection reuse]
ArticleFetcher[Article Fetcher<br/>Content processing]
MediaProcessor[Media Processor<br/>Image/video handling]
HTMLConverter[HTML Converter<br/>Markdown generation]
PDFManager[AsyncPDFManager<br/>Background thread]
end
subgraph Output Layer
FileWriter[File Writer<br/>Markdown + frontmatter]
HTMLGen[HTML Generator<br/>Web view]
MediaDownload[Media + PDFs]
end
CLI --> CY
TUI --> CY
CY --> Registry
GS --> Registry
SC --> Registry
Registry --> Factory
Registry --> Monitor
Factory --> ConfigDriven
Factory --> Custom
ConfigDriven --> BaseSource
Custom --> BaseSource
BaseSource --> SessionPool
BaseSource --> ArticleFetcher
ArticleFetcher --> MediaProcessor
ArticleFetcher --> HTMLConverter
ArticleFetcher --> PDFManager
HTMLConverter --> FileWriter
FileWriter --> HTMLGen
MediaProcessor --> MediaDownload
PDFManager --> MediaDownload
style CLI fill:#d75f00,stroke:#333,stroke-width:4px,color:#fff
style TUI fill:#d75f00,stroke:#333,stroke-width:4px,color:#fff
style CY fill:#ffa500,stroke:#333,stroke-width:2px
style GS fill:#ffa500,stroke:#333,stroke-width:2px
style SC fill:#ffa500,stroke:#333,stroke-width:2px
style Registry fill:#4ecdc4,stroke:#333,stroke-width:2px
style Factory fill:#4ecdc4,stroke:#333,stroke-width:2px
style PDFManager fill:#4ecdc4,stroke:#333,stroke-width:2px
Use the free software Draw.io to visualize the Mermaid diagrams: copy the Mermaid code, then select Arrange → Insert → Advanced → Mermaid from the drop-down menus.
Data Flow Architecture
flowchart TD
User[User Command] --> Parse[CLI Parser]
Parse --> Vault[Resolve Vault Config<br/>4-level hierarchy]
Vault --> VL1[CLI flags — highest priority]
Vault --> VL2[TUI prompt — session override]
Vault --> VL3[Per-source config.yaml]
Vault --> VL4[Global-settings.yaml — defaults]
VL4 --> Registry[Source Registry]
VL3 --> Registry
VL2 --> Registry
VL1 --> Registry
Registry --> Factory[Source Factory]
Factory --> Instantiate[Create Source Instances]
Instantiate --> Pool[Shared Session Pool]
Pool --> Parallel[Parallel Execution]
Parallel --> S1[Source 1<br/>get_articles]
Parallel --> S2[Source 2<br/>get_articles]
Parallel --> S3[Source N<br/>get_articles]
S1 --> Process[Processing Pipeline]
S2 --> Process
S3 --> Process
Process --> Fetch[Fetch Content]
Fetch --> Extract[Extract Text]
Extract --> Media[Process Media]
Media --> Convert[Convert to Markdown + frontmatter]
Convert --> PDFCheck{PDF URLs detected?}
PDFCheck -->|Yes + download_pdfs| AsyncPDF[AsyncPDFManager<br/>Background thread]
PDFCheck -->|No| Output[Output Generation]
AsyncPDF --> Output
Output --> Structure[Create Folder Structure]
Structure --> Write[Write Files]
Write --> Complete[Operation Complete]
style Parse fill:#ffa500,stroke:#333,stroke-width:2px
style Vault fill:#ffa500,stroke:#333,stroke-width:2px
style Factory fill:#4ecdc4,stroke:#333,stroke-width:2px
style Parallel fill:#d75f00,stroke:#333,stroke-width:2px,color:#fff
style Process fill:#4ecdc4,stroke:#333,stroke-width:2px
style AsyncPDF fill:#4ecdc4,stroke:#333,stroke-width:2px
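The PDF branch of the flow above hands downloads to a background thread so that the main pipeline can move on to output generation. The following is a minimal sketch of that hand-off, assuming a queue-fed worker thread. `BackgroundDownloader`, `find_pdf_urls`, and the `download` callback are hypothetical names used for illustration; they are not the actual `AsyncPDFManager` API.

```python
import queue
import threading
from typing import Callable, List, Optional
from urllib.parse import urlparse


def find_pdf_urls(urls: List[str]) -> List[str]:
    """Return URLs whose path ends in .pdf (a simple heuristic)."""
    return [u for u in urls if urlparse(u).path.lower().endswith(".pdf")]


class BackgroundDownloader:
    """Runs a download callback on one worker thread (hypothetical class)."""

    def __init__(self, download: Callable[[str], None]) -> None:
        self._download = download
        self._queue: "queue.Queue[Optional[str]]" = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, url: str) -> None:
        """Queue a URL and return immediately."""
        self._queue.put(url)

    def close(self) -> None:
        """Send the stop marker and wait for queued work to finish."""
        self._queue.put(None)
        self._thread.join()

    def _run(self) -> None:
        while True:
            url = self._queue.get()
            if url is None:  # stop marker
                return
            try:
                self._download(url)
            except Exception:
                # Keep the worker alive; a real implementation would log this.
                continue


downloaded: List[str] = []
worker = BackgroundDownloader(download=downloaded.append)
links = ["https://example.com/paper.pdf", "https://example.com/post.html"]
for pdf_url in find_pdf_urls(links):
    worker.submit(pdf_url)
worker.close()
```

Here the callback only records the URL; a real callback would fetch and store the file.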
Hybrid Source System
graph LR
subgraph Source Discovery
A["Config/sources/active/"] --> B["config_driven/"]
A --> C["custom/"]
end
B --> B1[configs/*.yaml]
C --> C1[*/source.py]
B1 --> D[Config-Driven Source]
C1 --> E[Custom Source]
D --> F[Base Source Interface]
E --> F
F --> G{get_articles method}
G --> H[Returns Article objects]
D --> D1[Simple Setup<br/>15-30 min]
D --> D2[YAML configuration]
D --> D3[BeautifulSoup]
D --> D4[No Python coding]
E --> E1[Complex Setup<br/>2-4 hours]
E --> E2[Full Python control]
E --> E3[API integration]
E --> E4[Comment systems]
style D fill:#4ecdc4,stroke:#333,stroke-width:2px
style E fill:#d75f00,stroke:#333,stroke-width:2px,color:#fff
style F fill:#ffa500,stroke:#333,stroke-width:2px
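Both source types in the diagram converge on one interface with a `get_articles` method. A minimal sketch of that arrangement follows, assuming an abstract base class. The `Article` fields, the constructor signature, and the config keys `items` and `base_url` are assumptions made for illustration and are not taken from the project.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Article:
    """Hypothetical article record; the real fields are not shown above."""
    title: str
    url: str


class BaseSource(ABC):
    """Shared interface: every source exposes get_articles()."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config

    @abstractmethod
    def get_articles(self, count: int) -> List[Article]:
        """Return at most `count` articles."""


class ConfigDrivenSource(BaseSource):
    """Behaviour comes entirely from a (YAML-derived) config mapping."""

    def get_articles(self, count: int) -> List[Article]:
        # Stand-in for selector-based scraping: read items listed in config.
        items = self.config.get("items", [])
        return [Article(i["title"], i["url"]) for i in items[:count]]


class CustomSource(BaseSource):
    """Behaviour is written in Python, for example to call an API."""

    def get_articles(self, count: int) -> List[Article]:
        base = self.config["base_url"]
        return [Article(f"Item {n}", f"{base}/item/{n}") for n in range(1, count + 1)]


sources: List[BaseSource] = [
    ConfigDrivenSource({"items": [{"title": "A", "url": "https://example.com/a"}]}),
    CustomSource({"base_url": "https://example.com"}),
]
# Callers treat both kinds identically through the base interface.
articles = [a for s in sources for a in s.get_articles(count=2)]
```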
Design Patterns Applied
mindmap
  root((Design<br/>Patterns))
    Factory Pattern
      Source creation
      Unified interface
      Type abstraction
      Instance management
    Registry Pattern
      Auto-discovery
      Source catalog
      Validation
      Lookup service
    Strategy Pattern
      Content extraction
      Multiple algorithms
      Pluggable extractors
      Fallback options
    Observer Pattern
      Progress tracking
      Event notification
      UI updates
      Logging hooks
    Session Pooling
      Connection reuse
      Performance boost
      Resource sharing
      HTTP optimization
    Singleton
      Registry instance
      Config manager
      Session pool
      Logger
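The Strategy branch above lists pluggable extractors with fallback options. One way to sketch this is a list of extractor functions tried in order until one yields content. The function names are hypothetical, and plain string search stands in for real HTML parsing (the project lists BeautifulSoup for that job).

```python
from typing import Callable, List, Optional

# An extractor takes HTML and returns text, or None if it cannot handle it.
Extractor = Callable[[str], Optional[str]]


def _between(html: str, open_tag: str, close_tag: str) -> Optional[str]:
    """Return the stripped text between the first matching tag pair."""
    start = html.find(open_tag)
    if start == -1:
        return None
    start += len(open_tag)
    end = html.find(close_tag, start)
    if end == -1:
        return None
    text = html[start:end].strip()
    return text or None


def extract_article(html: str) -> Optional[str]:
    """Preferred strategy: content of the <article> element."""
    return _between(html, "<article>", "</article>")


def extract_body(html: str) -> Optional[str]:
    """Fallback strategy: content of the whole <body> element."""
    return _between(html, "<body>", "</body>")


def extract_with_fallback(html: str, extractors: List[Extractor]) -> str:
    """Try each strategy in order and return the first result."""
    for extract in extractors:
        text = extract(html)
        if text is not None:
            return text
    raise ValueError("no extractor produced content")


chain: List[Extractor] = [extract_article, extract_body]
```

Adding a new algorithm then means appending one function to `chain`, without touching the calling code.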
Processing Pipeline Detailed
sequenceDiagram
participant U as User
participant M as Main App
participant V as Vault Config
participant R as Registry
participant F as Factory
participant S as Source
participant A as Article Fetcher
participant MP as Media Processor
participant FW as File Writer
U->>M: capcat fetch hn --count 10
M->>V: Resolve vault config
V->>V: CLI flags → TUI → per-source → Global-settings.yaml
V-->>M: Resolved config (count=10, download_pdfs=false)
M->>R: get_source('hn')
R->>R: Lookup source config
R->>F: create_source(config)
F->>S: new HackerNewsSource(config, session)
F-->>M: Source instance
M->>S: get_articles(count=10)
activate S
S->>S: Fetch article URLs
S->>S: For each article:
S->>A: fetch_content(url)
activate A
A->>A: HTTP request
A->>A: Parse HTML
A->>A: Extract content + comments
A-->>S: Article with YAML frontmatter
deactivate A
S-->>M: List of 10 Articles
deactivate S
M->>M: For each article:
M->>MP: process_media(article)
activate MP
MP->>MP: Find image URLs
MP->>MP: Download images
MP->>MP: Convert to local paths
MP-->>M: Updated article
deactivate MP
M->>FW: write_article(article, vault_path)
activate FW
FW->>FW: Create News/news_DD-MM-YYYY/ folder
FW->>FW: Write article.md with frontmatter
FW->>FW: Write hn-comments-*.md with backlink
FW-->>M: Success
deactivate FW
M-->>U: [OK] 10 articles saved to vault
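The File Writer steps in the sequence above (create a `News/news_DD-MM-YYYY/` folder, then write `article.md` with frontmatter) could look roughly like the sketch below. The `write_article` signature and the frontmatter keys `title`, `source_url`, and `date` are assumptions; the diagram does not specify them.

```python
import json
from datetime import date
from pathlib import Path


def write_article(vault_path: Path, title: str, url: str, body: str, day: date) -> Path:
    """Write one article as Markdown with YAML frontmatter and return its path."""
    # Folder name follows the News/news_DD-MM-YYYY/ pattern from the diagram.
    folder = vault_path / "News" / f"news_{day.strftime('%d-%m-%Y')}"
    folder.mkdir(parents=True, exist_ok=True)

    frontmatter = "\n".join(
        [
            "---",
            # json.dumps yields a double-quoted string, which YAML also accepts.
            f"title: {json.dumps(title)}",
            f"source_url: {url}",
            f"date: {day.isoformat()}",
            "---",
        ]
    )
    target = folder / "article.md"
    target.write_text(f"{frontmatter}\n\n{body}\n", encoding="utf-8")
    return target
```

A call such as `write_article(Path("vault"), "Hello", "https://example.com", "Body text", date.today())` would create the dated folder on first use and reuse it afterwards.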
Error Handling Hierarchy
graph TD
A[CapcatError<br/>Base Exception] --> B[ConfigurationError]
A --> C[SourceError]
A --> D[NetworkError]
A --> E[ProcessingError]
A --> F[FileSystemError]
B --> B1[InvalidConfigError]
B --> B2[MissingConfigError]
C --> C1[SourceNotFoundError]
C --> C2[SourceUnavailableError]
C --> C3[ArticleFetchError]
D --> D1[ConnectionError]
D --> D2[TimeoutError]
D --> D3[DNSError]
E --> E1[ParseError]
E --> E2[ConversionError]
E --> E3[MediaDownloadError]
F --> F1[PermissionError]
F --> F2[DiskFullError]
F --> F3[PathError]
style A fill:#d75f00,stroke:#333,stroke-width:3px,color:#fff
style B fill:#ffa500,stroke:#333,stroke-width:2px
style C fill:#ffa500,stroke:#333,stroke-width:2px
style D fill:#ffa500,stroke:#333,stroke-width:2px
style E fill:#ffa500,stroke:#333,stroke-width:2px
style F fill:#ffa500,stroke:#333,stroke-width:2px
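In Python, a hierarchy like the one above maps onto exception subclasses, so callers can catch at whichever level suits them. The sketch below covers only part of the tree and is not the project's actual module. Three leaf names in the diagram (`ConnectionError`, `TimeoutError`, `PermissionError`) match Python built-ins, so a module that defines them shadows the built-ins within that module; this sketch leaves them out for that reason. The `describe` helper is hypothetical.

```python
class CapcatError(Exception):
    """Base class for all application errors."""


class ConfigurationError(CapcatError):
    """Problems with configuration files or values."""


class SourceError(CapcatError):
    """Problems tied to a specific source."""


class NetworkError(CapcatError):
    """Problems reaching a remote host."""


class ProcessingError(CapcatError):
    """Problems while parsing or converting content."""


class SourceNotFoundError(SourceError):
    """The requested source id is not registered."""


class DNSError(NetworkError):
    """Host name resolution failed."""


class ParseError(ProcessingError):
    """Fetched content could not be parsed."""


def describe(exc: CapcatError) -> str:
    """Show that handlers can target a branch or the whole tree."""
    try:
        raise exc
    except NetworkError:
        return "network"
    except SourceError:
        return "source"
    except CapcatError:
        return "other"
```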
Performance Optimization Strategy
graph TB
subgraph Optimization Techniques
A1[Parallel Processing]
A2[Connection Pooling]
A3[Lazy Loading]
A4[Caching]
end
A1 --> B1[ThreadPoolExecutor]
A1 --> B2[Concurrent article fetching]
A1 --> B3[5x faster than sequential]
A2 --> C1[requests.Session reuse]
A2 --> C2[HTTP keep-alive]
A2 --> C3[70% time reduction]
A3 --> D1[Content on demand]
A3 --> D2[Comments when needed]
A3 --> D3[Memory efficient]
A4 --> E1[Config file caching]
A4 --> E2[@lru_cache decorator]
A4 --> E3[Avoid redundant I/O]
B1 --> F[Performance Gains]
C1 --> F
D1 --> F
E1 --> F
F --> G[50-70% Faster<br/>Overall]
style F fill:#4ecdc4,stroke:#333,stroke-width:2px
style G fill:#d75f00,stroke:#333,stroke-width:4px,color:#fff
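Two of the techniques above, `ThreadPoolExecutor` for parallel fetching and `@lru_cache` for config caching, can be combined as in the sketch below. `load_config` and `fetch_one` are hypothetical stand-ins that do no real I/O, so the example illustrates the structure only and says nothing about the speed-ups quoted in the diagram.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import List, Tuple


@lru_cache(maxsize=None)
def load_config(name: str) -> Tuple[Tuple[str, object], ...]:
    """Pretend to read a config file; repeat calls are served from the cache."""
    return (("name", name), ("count", 30))


def fetch_one(url: str) -> str:
    """Stand-in for fetching one article; a real version would do HTTP."""
    cfg = dict(load_config("global"))
    return f"{url} (count={cfg['count']})"


def fetch_all(urls: List[str], max_workers: int = 4) -> List[str]:
    """Fetch concurrently; pool.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, urls))


results = fetch_all(["https://example.com/1", "https://example.com/2"])
```

Threads suit this workload because fetching is I/O-bound; the cached function returns an immutable tuple so that sharing it between threads is safe.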
Module Dependencies
graph TD
subgraph External Dependencies
E1[requests<br/>HTTP client]
E2[BeautifulSoup<br/>HTML parsing]
E3[markdownify<br/>HTML→Markdown]
E4[questionary<br/>Interactive UI]
E5[PyYAML<br/>Config parsing]
end
subgraph Core Modules
C1[capcat.py<br/>Main app]
C2[cli.py<br/>CLI interface]
C3[interactive.py<br/>Interactive mode]
end
subgraph Source System
S1[source_registry.py]
S2[source_factory.py]
S3[base_source.py]
S4[config_driven_source.py]
end
subgraph Processing
P1[article_fetcher.py]
P2[unified_media_processor.py]
P3[html_converter.py]
P4[async_pdf_manager.py]
end
E1 --> P1
E2 --> P1
E2 --> S4
E3 --> P3
E4 --> C3
E5 --> S1
C1 --> C2
C1 --> C3
C1 --> S1
S1 --> S2
S2 --> S3
S2 --> S4
S3 --> P1
P1 --> P2
P1 --> P3
P1 --> P4
style C1 fill:#d75f00,stroke:#333,stroke-width:2px,color:#fff
style P4 fill:#4ecdc4,stroke:#333,stroke-width:2px
Configuration System
graph LR
subgraph Priority Levels — highest to lowest
A["CLI Flag<br/>--pdfs --count 20 --media<br/>Overrides everything. Ephemeral."]
B["TUI Prompt<br/>Interactive select menu<br/>Session-level override."]
C["Per-Source Config<br/>Config/sources/active/hn/config.yaml<br/>Permanent per-source settings."]
D["Global Settings<br/>Config/Global-settings.yaml<br/>Vault-wide defaults."]
end
A -->|overrides| B
B -->|overrides| C
C -->|overrides| D
D --> F[Resolved Config]
F --> G[Used by Pipeline]
G --> G1[count: 30]
G --> G2[download_pdfs: true/false]
G --> G3[max_pdf_size_mb: 50]
G --> G4[html: false]
style A fill:#d75f00,stroke:#333,stroke-width:2px,color:#fff
style B fill:#ffa500,stroke:#333,stroke-width:2px
style F fill:#4ecdc4,stroke:#333,stroke-width:2px
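The four priority levels above amount to a layered lookup in which the first layer that defines a key wins. A minimal sketch using `collections.ChainMap` from the standard library follows. The `resolve_config` function and the convention that `None` means "not set" are assumptions; the key names are taken from the diagram.

```python
from collections import ChainMap
from typing import Any, Dict


def resolve_config(
    cli_flags: Dict[str, Any],
    tui_choices: Dict[str, Any],
    source_config: Dict[str, Any],
    global_settings: Dict[str, Any],
) -> Dict[str, Any]:
    """Merge the four layers; earlier arguments take priority."""

    def present(layer: Dict[str, Any]) -> Dict[str, Any]:
        # Treat None as "not set" so that it falls through to lower layers.
        return {k: v for k, v in layer.items() if v is not None}

    layers = ChainMap(
        present(cli_flags),        # highest priority
        present(tui_choices),
        present(source_config),
        present(global_settings),  # vault-wide defaults
    )
    return dict(layers)


resolved = resolve_config(
    cli_flags={"count": 20, "download_pdfs": None},
    tui_choices={},
    source_config={"download_pdfs": True},
    global_settings={"count": 30, "download_pdfs": False, "max_pdf_size_mb": 50, "html": False},
)
```

In this run `count` comes from the CLI layer, `download_pdfs` from the per-source layer, and the remaining keys from the global defaults.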
Security Architecture
graph TD
subgraph Input Layer
A1[User Input]
A2[CLI Arguments]
A3[Config Files]
end
A1 --> B[Validation Layer]
A2 --> B
A3 --> B
B --> B1[Type Validation]
B --> B2[Range Checking]
B --> B3[Path Sanitization]
B --> B4[URL Validation]
B1 --> C{Valid?}
B2 --> C
B3 --> C
B4 --> C
C -->|No| D[Reject with Error]
C -->|Yes| E[Processing Layer]
E --> E1[Content Fetching]
E --> E2[Media Download]
E --> E3[File Writing]
E1 --> F[Privacy Layer]
E2 --> F
E3 --> F
F --> F1[Username Anonymization]
F --> F2[No Telemetry]
F --> F3[Local-only Storage]
F1 --> G[Safe Output]
F2 --> G
F3 --> G
style B fill:#ffa500,stroke:#333,stroke-width:2px
style F fill:#d75f00,stroke:#333,stroke-width:2px,color:#fff
style G fill:#4ecdc4,stroke:#333,stroke-width:2px
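The validation layer above names four checks. Three of them can be sketched as below using only the standard library: type validation and range checking together in `check_count`, URL validation in `is_valid_url`, and path sanitization in `safe_join`. All three function names are hypothetical, and the limits in `check_count` are placeholder values rather than the project's real bounds.

```python
from pathlib import Path
from urllib.parse import urlparse


def check_count(value: object, low: int = 1, high: int = 1000) -> int:
    """Type validation plus range checking (bounds are placeholders)."""
    count = int(value)  # raises ValueError or TypeError on bad input
    if not low <= count <= high:
        raise ValueError(f"count must be between {low} and {high}")
    return count


def is_valid_url(url: str) -> bool:
    """Accept only http(s) URLs that name a host."""
    try:
        parsed = urlparse(url)
    except ValueError:
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def safe_join(root: Path, untrusted: str) -> Path:
    """Resolve `untrusted` under `root` and refuse paths that escape it."""
    root = root.resolve()
    candidate = (root / untrusted).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"path escapes the vault: {untrusted}")
    return candidate
```

`safe_join` resolves both paths before comparing them, so `..` segments and absolute paths in the untrusted part are caught.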
Component Communication
graph TB
subgraph Component A - CLI
A1[Parse Arguments]
A2[Validate Input]
A3[Route Command]
end
subgraph Component B - Registry
B1[Discover Sources]
B2[Validate Configs]
B3[Provide Instances]
end
subgraph Component C - Factory
C1[Create Source]
C2[Inject Dependencies]
C3[Return Instance]
end
subgraph Component D - Source
D1[Fetch Articles]
D2[Extract Content]
D3[Return Data]
end
subgraph Component E - Output
E1[Generate Structure]
E2[Write Files]
E3[Report Success]
end
A3 -->|source_id| B3
B3 -->|config| C1
C1 -->|session| C2
C2 -->|source| D1
D1 -->|articles| E1
E1 -->|paths| E2
E2 -->|summary| E3
E3 -->|result| A1
style A1 fill:#ffa500,stroke:#333,stroke-width:2px
style B3 fill:#4ecdc4,stroke:#333,stroke-width:2px
style C2 fill:#4ecdc4,stroke:#333,stroke-width:2px
style D1 fill:#d75f00,stroke:#333,stroke-width:2px,color:#fff
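The hand-offs labelled `source_id`, `config`, and `session` above describe a registry lookup followed by a factory that injects a shared session. A minimal sketch of that chain is below. The class and method names are hypothetical, and `FakeSession` stands in for a real HTTP session object such as `requests.Session`.

```python
from typing import Any, Dict


class FakeSession:
    """Stand-in for a shared HTTP session."""


class Source:
    """A source that receives its dependencies instead of creating them."""

    def __init__(self, config: Dict[str, Any], session: FakeSession) -> None:
        self.config = config
        self.session = session


class SourceRegistry:
    """Maps source ids to their configs."""

    def __init__(self) -> None:
        self._configs: Dict[str, Dict[str, Any]] = {}

    def register(self, source_id: str, config: Dict[str, Any]) -> None:
        self._configs[source_id] = config

    def get_config(self, source_id: str) -> Dict[str, Any]:
        try:
            return self._configs[source_id]
        except KeyError:
            raise LookupError(f"unknown source: {source_id}") from None


class SourceFactory:
    """Creates sources and injects the one shared session into each."""

    def __init__(self, session: FakeSession) -> None:
        self._session = session

    def create(self, config: Dict[str, Any]) -> Source:
        return Source(config, self._session)


registry = SourceRegistry()
registry.register("hn", {"name": "Hacker News"})
factory = SourceFactory(FakeSession())

hn = factory.create(registry.get_config("hn"))
other = factory.create({"name": "Other"})
```

Because the factory owns the session, every source it creates reuses the same connection pool, and tests can pass in a fake.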
Scalability Architecture
graph LR
subgraph Current Scale
A1[17 Sources]
A2[100s Articles/batch]
A3[Single Machine]
end
subgraph Optimization Points
B1[Parallel Processing<br/>ThreadPool]
B2[Connection Reuse<br/>Session Pool]
B3[Efficient I/O<br/>Bulk writes]
end
subgraph Future Scale Options
C1[50+ Sources]
C2[1000s Articles/batch]
C3[Distributed Processing]
end
A1 --> B1
A2 --> B2
A3 --> B3
B1 --> C1
B2 --> C2
B3 --> C3
C1 --> D[Horizontal Scaling]
C2 --> D
C3 --> D
D --> E1[Multi-machine]
D --> E2[Queue-based]
D --> E3[Cloud deployment]
style B1 fill:#4ecdc4,stroke:#333,stroke-width:2px
style B2 fill:#4ecdc4,stroke:#333,stroke-width:2px
style D fill:#ffa500,stroke:#333,stroke-width:2px
Architecture Evolution
timeline
    title Architecture Evolution
    v1.0 : Initial Architecture
         : Single source support
         : Synchronous processing
         : Clone-and-run install
         : Config mixed into app directory
    v1.3 : Multi-source Architecture
         : Multiple sources added
         : Parallel processing
         : Basic source interface
         : Registry pattern
    v1.6 : Hybrid Sources
         : Config-driven sources
         : Custom sources
         : Factory pattern
         : pipx install support
    v1.9 : Vault Architecture
         : Tool separated from user data
         : First run scaffolds vault
         : 4-level config hierarchy
         : AsyncPDFManager
         : Source update ownership model
         : YAML frontmatter + wikilinks
Technology Stack
graph TB
subgraph Core Technologies
A1[Python 3.8+]
A2[Standard Library]
end
subgraph HTTP & Networking
B1[requests]
B2[urllib3]
end
subgraph HTML & Content
C1[BeautifulSoup4]
C2[lxml]
C3[markdownify]
end
subgraph CLI & UI
D1[argparse]
D2[questionary]
D3[prompt_toolkit]
end
subgraph Configuration
E1[PyYAML]
E2[python-dotenv]
end
subgraph Testing
F1[pytest]
F2[pytest-cov]
F3[pytest-mock]
end
A1 --> A2
A2 --> B1
B1 --> C1
C1 --> D1
D1 --> E1
style A1 fill:#d75f00,stroke:#333,stroke-width:3px,color:#fff