System Architecture Diagrams — v1.9

Complete System Architecture

graph TB
    subgraph User Interface Layer
        CLI[CLI Interface<br/>cli.py]
        TUI[Interactive TUI<br/>capcat catch]
    end

    subgraph Vault Layer — user-owned
        CY[capcat.yml<br/>source list + counts]
        GS[Global-settings.yaml<br/>network · logging · PDF]
        SC[sources/active/<br/>per-source configs]
    end

    subgraph Source System Layer
        Registry[Source Registry<br/>Discovery & Management]
        Factory[Source Factory<br/>Instantiation]
        Monitor[Performance Monitor<br/>Metrics & Health]
    end

    subgraph Hybrid Source Implementation
        ConfigDriven[Config-Driven Sources<br/>YAML-based]
        Custom[Custom Sources<br/>Python classes]
        BaseSource[Base Source<br/>Abstract interface]
    end

    subgraph Processing Pipeline
        SessionPool[Session Pool<br/>Connection reuse]
        ArticleFetcher[Article Fetcher<br/>Content processing]
        MediaProcessor[Media Processor<br/>Image/video handling]
        HTMLConverter[HTML Converter<br/>Markdown generation]
        PDFManager[AsyncPDFManager<br/>Background thread]
    end

    subgraph Output Layer
        FileWriter[File Writer<br/>Markdown + frontmatter]
        HTMLGen[HTML Generator<br/>Web view]
        MediaDownload[Media + PDFs]
    end

    CLI --> CY
    TUI --> CY
    CY --> Registry
    GS --> Registry
    SC --> Registry
    Registry --> Factory
    Registry --> Monitor
    Factory --> ConfigDriven
    Factory --> Custom
    ConfigDriven --> BaseSource
    Custom --> BaseSource
    BaseSource --> SessionPool
    BaseSource --> ArticleFetcher
    ArticleFetcher --> MediaProcessor
    ArticleFetcher --> HTMLConverter
    ArticleFetcher --> PDFManager
    HTMLConverter --> FileWriter
    FileWriter --> HTMLGen
    MediaProcessor --> MediaDownload
    PDFManager --> MediaDownload

    style CLI fill:#d75f00,stroke:#333,stroke-width:4px,color:#fff
    style TUI fill:#d75f00,stroke:#333,stroke-width:4px,color:#fff
    style CY fill:#ffa500,stroke:#333,stroke-width:2px
    style GS fill:#ffa500,stroke:#333,stroke-width:2px
    style SC fill:#ffa500,stroke:#333,stroke-width:2px
    style Registry fill:#4ecdc4,stroke:#333,stroke-width:2px
    style Factory fill:#4ecdc4,stroke:#333,stroke-width:2px
    style PDFManager fill:#4ecdc4,stroke:#333,stroke-width:2px

The Mermaid diagrams in this document can be visualized with the free tool Draw.io. Copy the Mermaid code of a diagram and, from the drop-down menus, select Arrange → Insert → Advanced → Mermaid.
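
The Registry → Factory path in the diagram above can be sketched in a few lines. This is a minimal illustration, not the project's code: `SourceConfig`, the `kind` field, and the method names `register`, `get`, `add_builder`, and `create` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SourceConfig:
    """Hypothetical stand-in for one per-source config entry."""
    source_id: str
    kind: str  # assumed values: "config_driven" or "custom"


class SourceRegistry:
    """Keeps the known source configs and answers lookups by id."""

    def __init__(self) -> None:
        self._configs: Dict[str, SourceConfig] = {}

    def register(self, config: SourceConfig) -> None:
        self._configs[config.source_id] = config

    def get(self, source_id: str) -> SourceConfig:
        try:
            return self._configs[source_id]
        except KeyError:
            raise LookupError(f"unknown source: {source_id}") from None

    def ids(self) -> List[str]:
        return sorted(self._configs)


class SourceFactory:
    """Maps a config's kind to a callable that builds the source."""

    def __init__(self) -> None:
        self._builders: Dict[str, Callable[[SourceConfig], object]] = {}

    def add_builder(self, kind: str, builder: Callable[[SourceConfig], object]) -> None:
        self._builders[kind] = builder

    def create(self, config: SourceConfig) -> object:
        if config.kind not in self._builders:
            raise ValueError(f"no builder for kind: {config.kind}")
        return self._builders[config.kind](config)
```

The caller asks the registry for a config by id and hands it to the factory, so neither the CLI nor the TUI needs to know which concrete source class is behind an id.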

Data Flow Architecture

flowchart TD
    User[User Command] --> Parse[CLI Parser]
    Parse --> Vault[Resolve Vault Config<br/>4-level hierarchy]
    Vault --> VL1[CLI flags — highest priority]
    Vault --> VL2[TUI prompt — session override]
    Vault --> VL3[Per-source config.yaml]
    Vault --> VL4[Global-settings.yaml — defaults]
    VL4 --> Registry[Source Registry]
    VL3 --> Registry
    VL2 --> Registry
    VL1 --> Registry
    Registry --> Factory[Source Factory]
    Factory --> Instantiate[Create Source Instances]
    Instantiate --> Pool[Shared Session Pool]
    Pool --> Parallel[Parallel Execution]
    Parallel --> S1[Source 1<br/>get_articles]
    Parallel --> S2[Source 2<br/>get_articles]
    Parallel --> S3[Source N<br/>get_articles]
    S1 --> Process[Processing Pipeline]
    S2 --> Process
    S3 --> Process
    Process --> Fetch[Fetch Content]
    Fetch --> Extract[Extract Text]
    Extract --> Media[Process Media]
    Media --> Convert[Convert to Markdown + frontmatter]
    Convert --> PDFCheck{PDF URLs detected?}
    PDFCheck -->|Yes + download_pdfs| AsyncPDF[AsyncPDFManager<br/>Background thread]
    PDFCheck -->|No| Output[Output Generation]
    AsyncPDF --> Output
    Output --> Structure[Create Folder Structure]
    Structure --> Write[Write Files]
    Write --> Complete[Operation Complete]

    style Parse fill:#ffa500,stroke:#333,stroke-width:2px
    style Vault fill:#ffa500,stroke:#333,stroke-width:2px
    style Factory fill:#4ecdc4,stroke:#333,stroke-width:2px
    style Parallel fill:#d75f00,stroke:#333,stroke-width:2px,color:#fff
    style Process fill:#4ecdc4,stroke:#333,stroke-width:2px
    style AsyncPDF fill:#4ecdc4,stroke:#333,stroke-width:2px
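
The "Parallel Execution" step, where each source's `get_articles` runs concurrently and the results feed one pipeline, might look roughly like the sketch below. It uses the standard library's `ThreadPoolExecutor`; the function name `fetch_all`, its parameters, and the choice to collect per-source errors instead of aborting are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple


def fetch_all(
    sources: Dict[str, Callable[[int], List[object]]],
    count: int,
    max_workers: int = 4,
) -> Tuple[Dict[str, List[object]], Dict[str, Exception]]:
    """Call every source's fetch function in parallel.

    `sources` maps a source name to a callable that takes the article
    count and returns a list of articles (hypothetical shape).
    Returns (results, errors), both keyed by source name.
    """
    results: Dict[str, List[object]] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn, count): name for name, fn in sources.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # one failing source must not stop the rest
                errors[name] = exc
    return results, errors
```

Keeping failures in a separate mapping lets the caller report which sources failed while still writing the articles that did arrive.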

Hybrid Source System

graph LR
    subgraph Source Discovery
        A[Config/sources/active/] --> B[config_driven/]
        A --> C[custom/]
    end

    B --> B1[configs/*.yaml]
    C --> C1[*/source.py]
    B1 --> D[Config-Driven Source]
    C1 --> E[Custom Source]
    D --> F[Base Source Interface]
    E --> F
    F --> G{get_articles method}
    G --> H[Returns Article objects]

    D --> D1[Simple Setup<br/>15-30 min]
    D --> D2[YAML configuration]
    D --> D3[BeautifulSoup]
    D --> D4[No Python coding]

    E --> E1[Complex Setup<br/>2-4 hours]
    E --> E2[Full Python control]
    E --> E3[API integration]
    E --> E4[Comment systems]

    style D fill:#4ecdc4,stroke:#333,stroke-width:2px
    style E fill:#d75f00,stroke:#333,stroke-width:2px,color:#fff
    style F fill:#ffa500,stroke:#333,stroke-width:2px
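
Both source types converge on one interface whose `get_articles` method returns Article objects. A minimal sketch of that shape follows; the `Article` fields, the constructor signatures, and the `sample_entries` config key are hypothetical, and the config-driven class reads canned entries where a real one would fetch a page and apply selectors with BeautifulSoup.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Article:
    """Assumed minimal article shape for this sketch."""
    title: str
    url: str


class BaseSource(ABC):
    """Common interface implemented by both source types."""

    @abstractmethod
    def get_articles(self, count: int) -> List[Article]:
        """Return at most `count` articles."""


class ConfigDrivenSource(BaseSource):
    """Behaviour comes entirely from a config mapping (YAML in the diagram)."""

    def __init__(self, config: Dict) -> None:
        self.config = config

    def get_articles(self, count: int) -> List[Article]:
        # Stand-in for "fetch the listing page and apply configured selectors".
        entries = self.config.get("sample_entries", [])
        return [Article(**entry) for entry in entries[:count]]


class ExampleCustomSource(BaseSource):
    """A custom source has full Python control, e.g. to call an API."""

    def get_articles(self, count: int) -> List[Article]:
        return [
            Article(title=f"Item {i}", url=f"https://example.com/{i}")
            for i in range(1, count + 1)
        ]
```

Because callers depend only on `BaseSource`, the rest of the pipeline does not need to know which kind of source produced an article.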

Design Patterns Applied

mindmap
    root((Design<br/>Patterns))
        Factory Pattern
            Source creation
            Unified interface
            Type abstraction
            Instance management
        Registry Pattern
            Auto-discovery
            Source catalog
            Validation
            Lookup service
        Strategy Pattern
            Content extraction
            Multiple algorithms
            Pluggable extractors
            Fallback options
        Observer Pattern
            Progress tracking
            Event notification
            UI updates
            Logging hooks
        Session Pooling
            Connection reuse
            Performance boost
            Resource sharing
            HTTP optimization
        Singleton
            Registry instance
            Config manager
            Session pool
            Logger
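
As one illustration of the Strategy branch (pluggable extractors with fallback options), the sketch below tries a list of extraction strategies in order and returns the first non-empty result. The helper names are hypothetical, and the regex-based extractors are a deliberately simple stand-in; they are not a robust way to parse HTML.

```python
import re
from typing import Callable, Iterable, Optional

# An extractor takes raw HTML and returns content, or None if it found nothing.
Extractor = Callable[[str], Optional[str]]


def make_tag_extractor(tag: str) -> Extractor:
    """Build an extractor that returns the inner text of the first <tag>."""
    pattern = re.compile(
        rf"<{re.escape(tag)}\b[^>]*>(.*?)</{re.escape(tag)}>",
        re.IGNORECASE | re.DOTALL,
    )

    def extract(html: str) -> Optional[str]:
        match = pattern.search(html)
        if match is None:
            return None
        return match.group(1).strip() or None

    return extract


def extract_with_fallback(html: str, extractors: Iterable[Extractor]) -> Optional[str]:
    """Try each strategy in order; the first non-empty result wins."""
    for extractor in extractors:
        content = extractor(html)
        if content:
            return content
    return None
```

Adding a new extraction algorithm then means appending one more callable to the list, without touching the code that runs the strategies.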

Processing Pipeline Detailed

sequenceDiagram
    participant U as User
    participant M as Main App
    participant V as Vault Config
    participant R as Registry
    participant F as Factory
    participant S as Source
    participant A as Article Fetcher
    participant MP as Media Processor
    participant FW as File Writer

    U->>M: capcat fetch hn --count 10
    M->>V: Resolve vault config
    V->>V: CLI flags → TUI → per-source → Global-settings.yaml
    V-->>M: Resolved config (count=10, download_pdfs=false)
    M->>R: get_source('hn')
    R->>R: Lookup source config
    R->>F: create_source(config)
    F->>S: new HackerNewsSource(config, session)
    F-->>M: Source instance

    M->>S: get_articles(count=10)
    activate S
    S->>S: Fetch article URLs
    S->>S: For each article:
    S->>A: fetch_content(url)
    activate A
    A->>A: HTTP request
    A->>A: Parse HTML
    A->>A: Extract content + comments
    A-->>S: Article with YAML frontmatter
    deactivate A
    S-->>M: List of 10 Articles
    deactivate S

    M->>M: For each article:
    M->>MP: process_media(article)
    activate MP
    MP->>MP: Find image URLs
    MP->>MP: Download images
    MP->>MP: Convert to local paths
    MP-->>M: Updated article
    deactivate MP

    M->>FW: write_article(article, vault_path)
    activate FW
    FW->>FW: Create News/news_DD-MM-YYYY/ folder
    FW->>FW: Write article.md with frontmatter
    FW->>FW: Write hn-comments-*.md with backlink
    FW-->>M: Success
    deactivate FW

    M-->>U: [OK] 10 articles saved to vault
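
The File Writer steps (create the `News/news_DD-MM-YYYY/` folder, then write `article.md` with frontmatter) can be sketched as two small helpers. The function names and the frontmatter keys `title` and `source_url` are assumptions; the diagram specifies only the folder pattern and that frontmatter is written.

```python
import json
from datetime import date
from pathlib import Path


def batch_folder(vault_path: Path, day: date) -> Path:
    """Return the dated output folder, News/news_DD-MM-YYYY, under the vault."""
    return vault_path / "News" / f"news_{day.strftime('%d-%m-%Y')}"


def render_article(title: str, url: str, body: str) -> str:
    """Render Markdown with a YAML frontmatter block (field names assumed)."""
    frontmatter = "\n".join(
        [
            "---",
            # json.dumps yields a double-quoted string, which is also a
            # valid YAML scalar, so quotes in titles are escaped safely.
            f"title: {json.dumps(title)}",
            f"source_url: {json.dumps(url)}",
            "---",
        ]
    )
    return f"{frontmatter}\n\n{body}\n"


def write_article(vault_path: Path, day: date, title: str, url: str, body: str) -> Path:
    """Create the dated folder if needed and write article.md into it."""
    folder = batch_folder(vault_path, day)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / "article.md"
    target.write_text(render_article(title, url, body), encoding="utf-8")
    return target
```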

Error Handling Hierarchy

graph TD
    A[CapcatError<br/>Base Exception] --> B[ConfigurationError]
    A --> C[SourceError]
    A --> D[NetworkError]
    A --> E[ProcessingError]
    A --> F[FileSystemError]

    B --> B1[InvalidConfigError]
    B --> B2[MissingConfigError]

    C --> C1[SourceNotFoundError]
    C --> C2[SourceUnavailableError]
    C --> C3[ArticleFetchError]

    D --> D1[ConnectionError]
    D --> D2[TimeoutError]
    D --> D3[DNSError]

    E --> E1[ParseError]
    E --> E2[ConversionError]
    E --> E3[MediaDownloadError]

    F --> F1[PermissionError]
    F --> F2[DiskFullError]
    F --> F3[PathError]

    style A fill:#d75f00,stroke:#333,stroke-width:3px,color:#fff
    style B fill:#ffa500,stroke:#333,stroke-width:2px
    style C fill:#ffa500,stroke:#333,stroke-width:2px
    style D fill:#ffa500,stroke:#333,stroke-width:2px
    style E fill:#ffa500,stroke:#333,stroke-width:2px
    style F fill:#ffa500,stroke:#333,stroke-width:2px
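
Part of this hierarchy, written as Python classes, shows how callers can catch at whichever level they care about. The class names come from the diagram; the helper `category_of` is hypothetical. Three leaf names in the diagram (ConnectionError, TimeoutError, PermissionError) coincide with Python built-ins, so the sketch omits them instead of guessing how the project keeps them apart.

```python
class CapcatError(Exception):
    """Root of the hierarchy; catching it catches every error below."""


class ConfigurationError(CapcatError):
    pass


class SourceError(CapcatError):
    pass


class NetworkError(CapcatError):
    pass


class ProcessingError(CapcatError):
    pass


class FileSystemError(CapcatError):
    pass


class SourceNotFoundError(SourceError):
    pass


class DNSError(NetworkError):
    pass


class ParseError(ProcessingError):
    pass


def category_of(exc: CapcatError) -> str:
    """Return the name of the second-level class an error belongs to."""
    for cls in type(exc).__mro__:
        if CapcatError in cls.__bases__:
            return cls.__name__
    return CapcatError.__name__
```

A handler that writes `except NetworkError:` will then also see `DNSError`, while `except CapcatError:` serves as the catch-all for anything the application raises itself.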

Performance Optimization Strategy

graph TB
    subgraph Optimization Techniques
        A1[Parallel Processing]
        A2[Connection Pooling]
        A3[Lazy Loading]
        A4[Caching]
    end

    A1 --> B1[ThreadPoolExecutor]
    A1 --> B2[Concurrent article fetching]
    A1 --> B3[5x faster than sequential]

    A2 --> C1[requests.Session reuse]
    A2 --> C2[HTTP keep-alive]
    A2 --> C3[70% time reduction]

    A3 --> D1[Content on demand]
    A3 --> D2[Comments when needed]
    A3 --> D3[Memory efficient]

    A4 --> E1[Config file caching]
    A4 --> E2[@lru_cache decorator]
    A4 --> E3[Avoid redundant I/O]

    B1 --> F[Performance Gains]
    C1 --> F
    D1 --> F
    E1 --> F
    F --> G[50-70% Faster<br/>Overall]

    style F fill:#4ecdc4,stroke:#333,stroke-width:2px
    style G fill:#d75f00,stroke:#333,stroke-width:4px,color:#fff
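
The caching branch (config file caching with `@lru_cache` to avoid redundant I/O) could be approached as below. This is a sketch under assumptions: `make_cached_loader` and `read_text` are hypothetical names, and the loader returns the raw file text so that the cached value is immutable.

```python
from functools import lru_cache
from pathlib import Path
from typing import Callable


def read_text(path: str) -> str:
    """Default reader: load a config file from disk."""
    return Path(path).read_text(encoding="utf-8")


def make_cached_loader(reader: Callable[[str], str] = read_text, maxsize: int = 32):
    """Wrap a reader so each path is read at most once until the cache is cleared."""

    @lru_cache(maxsize=maxsize)
    def load(path: str) -> str:
        return reader(path)

    return load
```

The returned function exposes the usual `cache_info()` and `cache_clear()` methods. A cached entry goes stale if the file changes on disk, so a long-running process would call `cache_clear()` after the user edits a config.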

Module Dependencies

graph TD
    subgraph External Dependencies
        E1[requests<br/>HTTP client]
        E2[BeautifulSoup<br/>HTML parsing]
        E3[markdownify<br/>HTML→Markdown]
        E4[questionary<br/>Interactive UI]
        E5[PyYAML<br/>Config parsing]
    end

    subgraph Core Modules
        C1[capcat.py<br/>Main app]
        C2[cli.py<br/>CLI interface]
        C3[interactive.py<br/>Interactive mode]
    end

    subgraph Source System
        S1[source_registry.py]
        S2[source_factory.py]
        S3[base_source.py]
        S4[config_driven_source.py]
    end

    subgraph Processing
        P1[article_fetcher.py]
        P2[unified_media_processor.py]
        P3[html_converter.py]
        P4[async_pdf_manager.py]
    end

    E1 --> P1
    E2 --> P1
    E2 --> S4
    E3 --> P3
    E4 --> C3
    E5 --> S1
    C1 --> C2
    C1 --> C3
    C1 --> S1
    S1 --> S2
    S2 --> S3
    S2 --> S4
    S3 --> P1
    P1 --> P2
    P1 --> P3
    P1 --> P4

    style C1 fill:#d75f00,stroke:#333,stroke-width:2px,color:#fff
    style P4 fill:#4ecdc4,stroke:#333,stroke-width:2px

Configuration System

graph LR
    subgraph Priority Levels — highest to lowest
        A[CLI Flag<br/>--pdfs --count 20 --media<br/>Overrides everything. Ephemeral.]
        B[TUI Prompt<br/>Interactive select menu<br/>Session-level override.]
        C[Per-Source Config<br/>Config/sources/active/hn/config.yaml<br/>Permanent per-source settings.]
        D[Global Settings<br/>Config/Global-settings.yaml<br/>Vault-wide defaults.]
    end

    A -->|overrides| B
    B -->|overrides| C
    C -->|overrides| D
    D --> F[Resolved Config]
    F --> G[Used by Pipeline]
    G --> G1[count: 30]
    G --> G2[download_pdfs: true/false]
    G --> G3[max_pdf_size_mb: 50]
    G --> G4[html: false]

    style A fill:#d75f00,stroke:#333,stroke-width:2px,color:#fff
    style B fill:#ffa500,stroke:#333,stroke-width:2px
    style F fill:#4ecdc4,stroke:#333,stroke-width:2px
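
The four-level override chain maps naturally onto `collections.ChainMap`, which looks a key up in each mapping in order and returns the first hit. The sketch below assumes each level has already been loaded into a plain dict and that an unset option is represented as `None`; the function name `resolve_config` is hypothetical.

```python
from collections import ChainMap
from typing import Any, Dict, Mapping


def resolve_config(
    cli: Mapping[str, Any],
    tui: Mapping[str, Any],
    per_source: Mapping[str, Any],
    global_settings: Mapping[str, Any],
) -> Dict[str, Any]:
    """Merge the four levels, highest priority first.

    None values are dropped so that an unset CLI flag does not mask
    a value defined at a lower level.
    """
    layers = [
        {key: value for key, value in layer.items() if value is not None}
        for layer in (cli, tui, per_source, global_settings)
    ]
    return dict(ChainMap(*layers))
```

For example, with `count: 30` in the global settings, `count: 15` in a per-source config, and `--count 10` on the command line, the resolved value is 10; drop the CLI flag and it falls back to 15.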

Security Architecture

graph TD
    subgraph Input Layer
        A1[User Input]
        A2[CLI Arguments]
        A3[Config Files]
    end

    A1 --> B[Validation Layer]
    A2 --> B
    A3 --> B

    B --> B1[Type Validation]
    B --> B2[Range Checking]
    B --> B3[Path Sanitization]
    B --> B4[URL Validation]

    B1 --> C{Valid?}
    B2 --> C
    B3 --> C
    B4 --> C

    C -->|No| D[Reject with Error]
    C -->|Yes| E[Processing Layer]

    E --> E1[Content Fetching]
    E --> E2[Media Download]
    E --> E3[File Writing]

    E1 --> F[Privacy Layer]
    E2 --> F
    E3 --> F

    F --> F1[Username Anonymization]
    F --> F2[No Telemetry]
    F --> F3[Local-only Storage]

    F1 --> G[Safe Output]
    F2 --> G
    F3 --> G

    style B fill:#ffa500,stroke:#333,stroke-width:2px
    style F fill:#d75f00,stroke:#333,stroke-width:2px,color:#fff
    style G fill:#4ecdc4,stroke:#333,stroke-width:2px
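
Three of the validation-layer checks (range checking, path sanitization, URL validation) are sketched below with the standard library only. The function names, the allowed character set, and the count limits are illustrative assumptions, not the project's actual rules.

```python
import re
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Accept only absolute http(s) URLs that name a host."""
    parsed = urlparse(url)
    return parsed.scheme in {"http", "https"} and bool(parsed.netloc)


def sanitize_filename(name: str, max_length: int = 100) -> str:
    """Reduce arbitrary text to a safe single path component.

    Runs of characters outside [A-Za-z0-9._-] (including path separators)
    become one underscore; leading and trailing dots and underscores are
    stripped so the result cannot be '.', '..', or a hidden file.
    """
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", name).strip("._")
    return cleaned[:max_length] or "untitled"


def check_count(value: int, minimum: int = 1, maximum: int = 1000) -> int:
    """Range check for an article count (limits are assumed)."""
    if not minimum <= value <= maximum:
        raise ValueError(f"count must be between {minimum} and {maximum}")
    return value
```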

Component Communication

graph TB
    subgraph Component A - CLI
        A1[Parse Arguments]
        A2[Validate Input]
        A3[Route Command]
    end

    subgraph Component B - Registry
        B1[Discover Sources]
        B2[Validate Configs]
        B3[Provide Instances]
    end

    subgraph Component C - Factory
        C1[Create Source]
        C2[Inject Dependencies]
        C3[Return Instance]
    end

    subgraph Component D - Source
        D1[Fetch Articles]
        D2[Extract Content]
        D3[Return Data]
    end

    subgraph Component E - Output
        E1[Generate Structure]
        E2[Write Files]
        E3[Report Success]
    end

    A3 -->|source_id| B3
    B3 -->|config| C1
    C1 -->|session| C2
    C2 -->|source| D1
    D1 -->|articles| E1
    E1 -->|paths| E2
    E2 -->|summary| E3
    E3 -->|result| A1

    style A1 fill:#ffa500,stroke:#333,stroke-width:2px
    style B3 fill:#4ecdc4,stroke:#333,stroke-width:2px
    style C2 fill:#4ecdc4,stroke:#333,stroke-width:2px
    style D1 fill:#d75f00,stroke:#333,stroke-width:2px,color:#fff

Scalability Architecture

graph LR
    subgraph Current Scale
        A1[17 Sources]
        A2[100s Articles/batch]
        A3[Single Machine]
    end

    subgraph Optimization Points
        B1[Parallel Processing<br/>ThreadPool]
        B2[Connection Reuse<br/>Session Pool]
        B3[Efficient I/O<br/>Bulk writes]
    end

    subgraph Future Scale Options
        C1[50+ Sources]
        C2[1000s Articles/batch]
        C3[Distributed Processing]
    end

    A1 --> B1
    A2 --> B2
    A3 --> B3
    B1 --> C1
    B2 --> C2
    B3 --> C3
    C1 --> D[Horizontal Scaling]
    C2 --> D
    C3 --> D
    D --> E1[Multi-machine]
    D --> E2[Queue-based]
    D --> E3[Cloud deployment]

    style B1 fill:#4ecdc4,stroke:#333,stroke-width:2px
    style B2 fill:#4ecdc4,stroke:#333,stroke-width:2px
    style D fill:#ffa500,stroke:#333,stroke-width:2px

Architecture Evolution

timeline
    title Architecture Evolution
    v1.0 : Initial Architecture
         : Single source support
         : Synchronous processing
         : Clone-and-run install
         : Config mixed into app directory
    v1.3 : Multi-source Architecture
         : Multiple sources added
         : Parallel processing
         : Basic source interface
         : Registry pattern
    v1.6 : Hybrid Sources
         : Config-driven sources
         : Custom sources
         : Factory pattern
         : pipx install support
    v1.9 : Vault Architecture
         : Tool separated from user data
         : First run scaffolds vault
         : 4-level config hierarchy
         : AsyncPDFManager
         : Source update ownership model
         : YAML frontmatter + wikilinks

Technology Stack

graph TB
    subgraph Core Technologies
        A1[Python 3.8+]
        A2[Standard Library]
    end

    subgraph HTTP & Networking
        B1[requests]
        B2[urllib3]
    end

    subgraph HTML & Content
        C1[BeautifulSoup4]
        C2[lxml]
        C3[markdownify]
    end

    subgraph CLI & UI
        D1[argparse]
        D2[questionary]
        D3[prompt_toolkit]
    end

    subgraph Configuration
        E1[PyYAML]
        E2[python-dotenv]
    end

    subgraph Testing
        F1[pytest]
        F2[pytest-cov]
        F3[pytest-mock]
    end

    A1 --> A2
    A2 --> B1
    B1 --> C1
    C1 --> D1
    D1 --> E1

    style A1 fill:#d75f00,stroke:#333,stroke-width:3px,color:#fff
