System Architecture Diagrams

Complete System Architecture

graph TB
    subgraph User Interface Layer
        CLI[CLI Interface<br/>cli.py]
        Interactive[Interactive Mode<br/>interactive.py]
        Wrapper[Bash Wrapper<br/>capcat]
    end
    subgraph Core Orchestration Layer
        Main[Main Application<br/>capcat.py]
        Config[Configuration<br/>config.py]
        Logging[Logging System<br/>logging_config.py]
        Shutdown[Graceful Shutdown<br/>shutdown.py]
    end
    subgraph Source System Layer
        Registry[Source Registry<br/>Discovery & Management]
        Factory[Source Factory<br/>Instantiation]
        Monitor[Performance Monitor<br/>Metrics & Health]
    end
    subgraph Hybrid Source Implementation
        ConfigDriven[Config-Driven Sources<br/>YAML-based]
        Custom[Custom Sources<br/>Python classes]
        BaseSource[Base Source<br/>Abstract interface]
    end
    subgraph Shared Infrastructure
        SessionPool[Session Pool<br/>Connection reuse]
        ArticleFetcher[Article Fetcher<br/>Content processing]
        MediaProcessor[Media Processor<br/>Image/video handling]
        HTMLConverter[HTML Converter<br/>Markdown generation]
    end
    subgraph Output Layer
        FileWriter[File Writer<br/>Markdown output]
        HTMLGen[HTML Generator<br/>Web view]
        MediaDownload[Media Downloader<br/>File management]
    end
    CLI --> Main
    Interactive --> Main
    Wrapper --> CLI
    Main --> Config
    Main --> Logging
    Main --> Shutdown
    Main --> Registry
    Registry --> Factory
    Registry --> Monitor
    Factory --> ConfigDriven
    Factory --> Custom
    ConfigDriven --> BaseSource
    Custom --> BaseSource
    BaseSource --> SessionPool
    BaseSource --> ArticleFetcher
    ArticleFetcher --> MediaProcessor
    ArticleFetcher --> HTMLConverter
    HTMLConverter --> FileWriter
    FileWriter --> HTMLGen
    MediaProcessor --> MediaDownload
    style Main fill:#d75f00,stroke:#333,stroke-width:4px,color:#fff
    style Registry fill:#4ecdc4,stroke:#333,stroke-width:2px
    style Factory fill:#4ecdc4,stroke:#333,stroke-width:2px

To view the Mermaid diagrams on this page, use the free software Draw.io: copy the Mermaid code and, from the drop-down menus, select Arrange → Insert → Advanced → Mermaid.

Data Flow Architecture

flowchart TD
    User[User Command] --> Parse[CLI Parser]
    Parse --> Validate[Input Validation]
    Validate --> Registry[Source Registry]
    Registry --> Discover[Discover Sources]
    Discover --> ConfigSources[Config YAML Files]
    Discover --> CustomSources[Custom Python Classes]
    ConfigSources --> Factory
    CustomSources --> Factory
    Factory[Source Factory] --> Instantiate[Create Source Instances]
    Instantiate --> Pool[Shared Session Pool]
    Pool --> Parallel[Parallel Execution]
    Parallel --> S1[Source 1<br/>get_articles]
    Parallel --> S2[Source 2<br/>get_articles]
    Parallel --> S3[Source N<br/>get_articles]
    S1 --> Articles1[Articles List]
    S2 --> Articles2[Articles List]
    S3 --> Articles3[Articles List]
    Articles1 --> Process[Processing Pipeline]
    Articles2 --> Process
    Articles3 --> Process
    Process --> Fetch[Fetch Content]
    Fetch --> Extract[Extract Text]
    Extract --> Media[Process Media]
    Media --> Convert[Convert to Markdown]
    Convert --> Output[Output Generation]
    Output --> Structure[Create Folder Structure]
    Structure --> Write[Write Files]
    Write --> Generate[Generate HTML Optional]
    Generate --> Complete[Operation Complete]
    Complete --> Report[Display Summary]
    style Parse fill:#ffa500,stroke:#333,stroke-width:2px
    style Factory fill:#4ecdc4,stroke:#333,stroke-width:2px
    style Parallel fill:#d75f00,stroke:#333,stroke-width:2px,color:#fff
    style Process fill:#4ecdc4,stroke:#333,stroke-width:2px

Hybrid Source System

graph LR
    subgraph Source Discovery
        A[sources/active/] --> B[config_driven/]
        A --> C[custom/]
    end
    B --> B1[configs/*.yaml]
    C --> C1[*/source.py]
    B1 --> D[Config-Driven Source]
    C1 --> E[Custom Source]
    D --> F[Base Source Interface]
    E --> F
    F --> G{get_articles method}
    G --> H[Returns Article objects]
    D --> D1[Simple Setup<br/>15-30 min]
    D --> D2[YAML configuration]
    D --> D3[BeautifulSoup]
    D --> D4[No Python coding]
    E --> E1[Complex Setup<br/>2-4 hours]
    E --> E2[Full Python control]
    E --> E3[API integration]
    E --> E4[Comment systems]
    style D fill:#4ecdc4,stroke:#333,stroke-width:2px
    style E fill:#d75f00,stroke:#333,stroke-width:2px,color:#fff
    style F fill:#ffa500,stroke:#333,stroke-width:2px
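
The diagram shows both source types converging on one base interface with a get_articles method. A minimal Python sketch of that idea follows. It is illustrative only: the Article fields, the constructor signatures, and the sample_entries config key are assumptions made for this example, not the project's actual code, and no network access is performed.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Article:
    """Hypothetical article record; the real class may carry more fields."""
    title: str
    url: str


class BaseSource(ABC):
    """Shared interface that both source types implement."""

    @abstractmethod
    def get_articles(self, count: int) -> List[Article]:
        """Return at most `count` articles."""


class ConfigDrivenSource(BaseSource):
    """Driven entirely by a config mapping, e.g. one parsed from a YAML file."""

    def __init__(self, config: Dict[str, object]) -> None:
        self.config = config

    def get_articles(self, count: int) -> List[Article]:
        # A real source would fetch a page and apply selectors from the config.
        # Here the "page" is faked by entries stored under an assumed key.
        entries = self.config.get("sample_entries", [])
        return [Article(title=t, url=u) for t, u in entries][:count]


class CustomSource(BaseSource):
    """Hand-written source with full control over how articles are produced."""

    def get_articles(self, count: int) -> List[Article]:
        # Stand-in for API calls or comment handling in a real custom source.
        return [
            Article(title=f"Story {i}", url=f"https://example.com/story/{i}")
            for i in range(1, count + 1)
        ]
```

Because callers depend only on BaseSource, the rest of the pipeline does not need to know which kind of source produced an article.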

Design Patterns Applied

mindmap
    root((Design<br/>Patterns))
        Factory Pattern
            Source creation
            Unified interface
            Type abstraction
            Instance management
        Registry Pattern
            Auto-discovery
            Source catalog
            Validation
            Lookup service
        Strategy Pattern
            Content extraction
            Multiple algorithms
            Pluggable extractors
            Fallback options
        Observer Pattern
            Progress tracking
            Event notification
            UI updates
            Logging hooks
        Session Pooling
            Connection reuse
            Performance boost
            Resource sharing
            HTTP optimization
        Singleton
            Registry instance
            Config manager
            Session pool
            Logger
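
To make the Registry and Factory branches of the mind map concrete, here is a small sketch of how the two patterns might cooperate. The class and method names (SourceRegistry, SourceFactory, register, create) are hypothetical and chosen for this example; the session is an opaque object standing in for a shared HTTP session.

```python
from typing import Callable, Dict, List


class SourceRegistry:
    """Catalog that maps a source id to a callable that builds the source."""

    def __init__(self) -> None:
        self._builders: Dict[str, Callable[[object], object]] = {}

    def register(self, source_id: str, builder: Callable[[object], object]) -> None:
        # Validation step: refuse duplicate ids instead of silently replacing.
        if source_id in self._builders:
            raise ValueError(f"source already registered: {source_id!r}")
        self._builders[source_id] = builder

    def get_builder(self, source_id: str) -> Callable[[object], object]:
        # Lookup service: raises KeyError for unknown ids.
        try:
            return self._builders[source_id]
        except KeyError:
            raise KeyError(f"unknown source: {source_id!r}") from None

    def available(self) -> List[str]:
        return sorted(self._builders)


class SourceFactory:
    """Creates source instances and injects the shared session into each one."""

    def __init__(self, registry: SourceRegistry, session: object) -> None:
        self._registry = registry
        self._session = session

    def create(self, source_id: str) -> object:
        builder = self._registry.get_builder(source_id)
        return builder(self._session)
```

Keeping lookup in the registry and construction in the factory means a new source only has to be registered once; callers never instantiate source classes directly.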

Processing Pipeline Detailed

sequenceDiagram
    participant U as User
    participant M as Main App
    participant R as Registry
    participant F as Factory
    participant S as Source
    participant A as Article Fetcher
    participant MP as Media Processor
    participant FW as File Writer
    U->>M: ./capcat fetch hn --count 10
    M->>R: get_source('hn')
    R->>R: Lookup source config
    R->>F: create_source(config)
    F->>S: new HackerNewsSource(config, session)
    F-->>M: Source instance
    M->>S: get_articles(count=10)
    activate S
    S->>S: Fetch article URLs
    S->>S: For each article:
    S->>A: fetch_content(url)
    activate A
    A->>A: HTTP request
    A->>A: Parse HTML
    A->>A: Extract content
    A-->>S: Article content
    deactivate A
    S->>S: Fetch comments (if applicable)
    S-->>M: List of 10 Articles
    deactivate S
    M->>M: For each article:
    M->>MP: process_media(article)
    activate MP
    MP->>MP: Find image URLs
    MP->>MP: Download images
    MP->>MP: Convert to local paths
    MP-->>M: Updated article
    deactivate MP
    M->>FW: write_article(article, path)
    activate FW
    FW->>FW: Create folder structure
    FW->>FW: Write article.md
    FW->>FW: Copy media files
    FW-->>M: Success
    deactivate FW
    M-->>U: [OK] 10 articles saved

Error Handling Hierarchy

graph TD
    A[CapcatError<br/>Base Exception] --> B[ConfigurationError]
    A --> C[SourceError]
    A --> D[NetworkError]
    A --> E[ProcessingError]
    A --> F[FileSystemError]
    B --> B1[InvalidConfigError]
    B --> B2[MissingConfigError]
    C --> C1[SourceNotFoundError]
    C --> C2[SourceUnavailableError]
    C --> C3[ArticleFetchError]
    D --> D1[ConnectionError]
    D --> D2[TimeoutError]
    D --> D3[DNSError]
    E --> E1[ParseError]
    E --> E2[ConversionError]
    E --> E3[MediaDownloadError]
    F --> F1[PermissionError]
    F --> F2[DiskFullError]
    F --> F3[PathError]
    style A fill:#d75f00,stroke:#333,stroke-width:3px,color:#fff
    style B fill:#ffa500,stroke:#333,stroke-width:2px
    style C fill:#ffa500,stroke:#333,stroke-width:2px
    style D fill:#ffa500,stroke:#333,stroke-width:2px
    style E fill:#ffa500,stroke:#333,stroke-width:2px
    style F fill:#ffa500,stroke:#333,stroke-width:2px
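
A hierarchy like this maps naturally onto Python exception subclasses. The sketch below covers only part of the tree, and the category_of helper is a hypothetical addition for illustration. It deliberately omits ConnectionError, TimeoutError, and PermissionError: those names are also Python built-ins, so defining them again in a module would shadow the built-in versions there.

```python
class CapcatError(Exception):
    """Base class, so callers can catch every application error at once."""


class ConfigurationError(CapcatError):
    pass


class SourceError(CapcatError):
    pass


class NetworkError(CapcatError):
    pass


class MissingConfigError(ConfigurationError):
    pass


class SourceNotFoundError(SourceError):
    pass


class ArticleFetchError(SourceError):
    pass


class DNSError(NetworkError):
    pass


def category_of(exc: BaseException) -> str:
    """Return the name of the second-level category an exception belongs to."""
    for category in (ConfigurationError, SourceError, NetworkError):
        if isinstance(exc, category):
            return category.__name__
    return "Unknown"
```

With this layout, a handler can catch SourceError to deal with any source problem, or CapcatError to report every application failure in one place while letting unrelated exceptions propagate.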

Performance Optimization Strategy

graph TB
    subgraph Optimization Techniques
        A1[Parallel Processing]
        A2[Connection Pooling]
        A3[Lazy Loading]
        A4[Caching]
    end
    A1 --> B1[ThreadPoolExecutor]
    A1 --> B2[Concurrent article fetching]
    A1 --> B3[5x faster than sequential]
    A2 --> C1[requests.Session reuse]
    A2 --> C2[HTTP keep-alive]
    A2 --> C3[70% time reduction]
    A3 --> D1[Content on demand]
    A3 --> D2[Comments when needed]
    A3 --> D3[Memory efficient]
    A4 --> E1[Config file caching]
    A4 --> E2[@lru_cache decorator]
    A4 --> E3[Avoid redundant I/O]
    B1 --> F[Performance Gains]
    C1 --> F
    D1 --> F
    E1 --> F
    F --> G[50-70% Faster<br/>Overall]
    style F fill:#4ecdc4,stroke:#333,stroke-width:2px
    style G fill:#d75f00,stroke:#333,stroke-width:4px,color:#fff
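
The parallel-processing branch can be sketched with ThreadPoolExecutor from the standard library. In this example fetch_all and fake_fetch are hypothetical names, and fake_fetch simulates network latency with a short sleep instead of issuing a real HTTP request, so the snippet runs offline. It makes no claim about the speedups quoted in the diagram.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def fetch_all(
    urls: List[str],
    fetch: Callable[[str], str],
    max_workers: int = 5,
) -> List[str]:
    """Run `fetch` over all URLs on a thread pool.

    Executor.map yields results in the same order as the input URLs,
    regardless of which worker finishes first.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


def fake_fetch(url: str) -> str:
    """Stand-in for an HTTP request: waits briefly, then returns dummy text."""
    time.sleep(0.05)
    return f"content of {url}"
```

Threads suit this workload because article fetching is I/O-bound: while one worker waits on the network, others can proceed. In a real fetcher the fetch callable would be where a shared session is used.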

Module Dependencies

graph TD
    subgraph External Dependencies
        E1[requests<br/>HTTP client]
        E2[BeautifulSoup<br/>HTML parsing]
        E3[markdownify<br/>HTML→Markdown]
        E4[questionary<br/>Interactive UI]
        E5[PyYAML<br/>Config parsing]
    end
    subgraph Core Modules
        C1[capcat.py<br/>Main app]
        C2[cli.py<br/>CLI interface]
        C3[interactive.py<br/>Interactive mode]
    end
    subgraph Source System
        S1[source_registry.py]
        S2[source_factory.py]
        S3[base_source.py]
        S4[config_driven_source.py]
    end
    subgraph Processing
        P1[article_fetcher.py]
        P2[unified_media_processor.py]
        P3[html_converter.py]
    end
    E1 --> P1
    E2 --> P1
    E2 --> S4
    E3 --> P3
    E4 --> C3
    E5 --> S1
    C1 --> C2
    C1 --> C3
    C1 --> S1
    S1 --> S2
    S2 --> S3
    S2 --> S4
    S3 --> P1
    P1 --> P2
    P1 --> P3
    style C1 fill:#d75f00,stroke:#333,stroke-width:2px,color:#fff

Configuration System

graph LR
    subgraph Priority Levels
        A[Command Line Args<br/>Highest Priority]
        B[Environment Variables]
        C[Config File<br/>capcat.yml]
        D[Default Values<br/>Lowest Priority]
    end
    A --> E{Merge Configuration}
    B --> E
    C --> E
    D --> E
    E --> F[Final Config Object]
    F --> G[Used by Application]
    G --> G1[count: 30]
    G --> G2[output_dir: ../News/]
    G --> G3[media: false]
    G --> G4[html: false]
    style A fill:#d75f00,stroke:#333,stroke-width:2px,color:#fff
    style F fill:#4ecdc4,stroke:#333,stroke-width:2px
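
One way to express this priority order in Python is collections.ChainMap, which searches its mappings from first to last. The sketch below is an assumption-laden illustration: merge_config is a hypothetical function, and the four values shown at the end of the diagram are treated here as the defaults, which the diagram does not state explicitly.

```python
from collections import ChainMap
from typing import Any, Dict, Mapping

# Assumed defaults, taken from the example values shown in the diagram.
DEFAULTS: Dict[str, Any] = {
    "count": 30,
    "output_dir": "../News/",
    "media": False,
    "html": False,
}


def _present(layer: Mapping[str, Any]) -> Dict[str, Any]:
    """Drop unset (None) entries so they do not mask lower-priority layers."""
    return {key: value for key, value in layer.items() if value is not None}


def merge_config(
    cli_args: Mapping[str, Any],
    env_vars: Mapping[str, Any],
    file_config: Mapping[str, Any],
) -> Dict[str, Any]:
    """Merge the layers; earlier arguments win over later ones."""
    layers = ChainMap(
        _present(cli_args),      # highest priority
        _present(env_vars),
        _present(file_config),
        DEFAULTS,                # lowest priority
    )
    return dict(layers)
```

For example, merge_config({"count": 10}, {}, {"count": 5}) yields a count of 10, because the command-line layer is consulted before the config file.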

Security Architecture

graph TD
    subgraph Input Layer
        A1[User Input]
        A2[CLI Arguments]
        A3[Config Files]
    end
    A1 --> B[Validation Layer]
    A2 --> B
    A3 --> B
    B --> B1[Type Validation]
    B --> B2[Range Checking]
    B --> B3[Path Sanitization]
    B --> B4[URL Validation]
    B1 --> C{Valid?}
    B2 --> C
    B3 --> C
    B4 --> C
    C -->|No| D[Reject with Error]
    C -->|Yes| E[Processing Layer]
    E --> E1[Content Fetching]
    E --> E2[Media Download]
    E --> E3[File Writing]
    E1 --> F[Privacy Layer]
    E2 --> F
    E3 --> F
    F --> F1[Username Anonymization]
    F --> F2[No Telemetry]
    F --> F3[Local-only Storage]
    F1 --> G[Safe Output]
    F2 --> G
    F3 --> G
    style B fill:#ffa500,stroke:#333,stroke-width:2px
    style F fill:#d75f00,stroke:#333,stroke-width:2px,color:#fff
    style G fill:#4ecdc4,stroke:#333,stroke-width:2px
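
The validation layer's four checks could look roughly like the following, using only the standard library. All function names and the 1 to 1000 range are assumptions made for this sketch; the project's real limits and helpers may differ.

```python
from pathlib import Path
from urllib.parse import urlparse


def validate_count(value: object, low: int = 1, high: int = 1000) -> int:
    """Type validation plus range checking for an article count."""
    # bool is a subclass of int, so it is rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"count must be an integer, got {type(value).__name__}")
    if not low <= value <= high:
        raise ValueError(f"count must be between {low} and {high}")
    return value


def validate_url(url: str) -> str:
    """Accept only absolute http(s) URLs that name a host."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a valid http(s) URL: {url!r}")
    return url


def safe_join(base_dir: str, user_path: str) -> Path:
    """Path sanitization: resolve `user_path` under `base_dir`.

    Rejects any result that escapes the base directory, for example
    through '..' segments or an absolute path.
    """
    base = Path(base_dir).resolve()
    target = (base / user_path).resolve()
    if target != base and base not in target.parents:
        raise ValueError(f"path escapes output directory: {user_path!r}")
    return target
```

Each function either returns the validated value or raises, which matches the diagram's two outcomes: continue to the processing layer, or reject with an error.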

Component Communication

graph TB
    subgraph Component A - CLI
        A1[Parse Arguments]
        A2[Validate Input]
        A3[Route Command]
    end
    subgraph Component B - Registry
        B1[Discover Sources]
        B2[Validate Configs]
        B3[Provide Instances]
    end
    subgraph Component C - Factory
        C1[Create Source]
        C2[Inject Dependencies]
        C3[Return Instance]
    end
    subgraph Component D - Source
        D1[Fetch Articles]
        D2[Extract Content]
        D3[Return Data]
    end
    subgraph Component E - Output
        E1[Generate Structure]
        E2[Write Files]
        E3[Report Success]
    end
    A3 -->|source_id| B3
    B3 -->|config| C1
    C1 -->|session| C2
    C2 -->|source| D1
    D1 -->|articles| E1
    E1 -->|paths| E2
    E2 -->|summary| E3
    E3 -->|result| A1
    style A1 fill:#ffa500,stroke:#333,stroke-width:2px
    style B3 fill:#4ecdc4,stroke:#333,stroke-width:2px
    style C2 fill:#4ecdc4,stroke:#333,stroke-width:2px
    style D1 fill:#d75f00,stroke:#333,stroke-width:2px,color:#fff

Scalability Architecture

graph LR
    subgraph Current Scale
        A1[17 Sources]
        A2[100s Articles/batch]
        A3[Single Machine]
    end
    subgraph Optimization Points
        B1[Parallel Processing<br/>ThreadPool]
        B2[Connection Reuse<br/>Session Pool]
        B3[Efficient I/O<br/>Bulk writes]
    end
    subgraph Future Scale Options
        C1[50+ Sources]
        C2[1000s Articles/batch]
        C3[Distributed Processing]
    end
    A1 --> B1
    A2 --> B2
    A3 --> B3
    B1 --> C1
    B2 --> C2
    B3 --> C3
    C1 --> D[Horizontal Scaling]
    C2 --> D
    C3 --> D
    D --> E1[Multi-machine]
    D --> E2[Queue-based]
    D --> E3[Cloud deployment]
    style B1 fill:#4ecdc4,stroke:#333,stroke-width:2px
    style B2 fill:#4ecdc4,stroke:#333,stroke-width:2px
    style D fill:#ffa500,stroke:#333,stroke-width:2px

Architecture Evolution

timeline
    title Architecture Evolution
    2020-2021 : Initial Architecture
              : Single source support
              : Synchronous processing
              : No abstraction
    2021-2022 : Multi-source Architecture
              : Multiple sources added
              : Parallel processing
              : Basic source interface
    2022-2023 : Registry Pattern
              : Auto-discovery
              : Dynamic loading
              : Validation system
    2023-2024 : Hybrid Architecture
              : Config-driven sources
              : Custom sources
              : Factory pattern
    2024-2025 : Current Architecture
              : Performance monitor
              : Advanced error handling
              : Enterprise features

Technology Stack

graph TB
    subgraph Core Technologies
        A1[Python 3.8+]
        A2[Standard Library]
    end
    subgraph HTTP & Networking
        B1[requests]
        B2[urllib3]
    end
    subgraph HTML & Content
        C1[BeautifulSoup4]
        C2[lxml]
        C3[markdownify]
    end
    subgraph CLI & UI
        D1[argparse]
        D2[questionary]
        D3[prompt_toolkit]
    end
    subgraph Configuration
        E1[PyYAML]
        E2[python-dotenv]
    end
    subgraph Testing
        F1[pytest]
        F2[pytest-cov]
        F3[pytest-mock]
    end
    A1 --> A2
    A2 --> B1
    B1 --> C1
    C1 --> D1
    D1 --> E1
    style A1 fill:#d75f00,stroke:#333,stroke-width:3px,color:#fff
