Data Flow Diagram
```mermaid
flowchart TD
Start([User Command]) --> Parse[Parse CLI Arguments]
Parse --> Validate[Validate Configuration]
Validate --> LoadSources[Load Source Configurations]
LoadSources --> SourceType{Source Type?}
SourceType -->|Config-Driven| LoadYAML[Load YAML Config]
SourceType -->|Custom| LoadPython[Load Python Module]
LoadYAML --> CreateSource1[Create Config-Driven Source]
LoadPython --> CreateSource2[Create Custom Source]
CreateSource1 --> FetchArticles[Fetch Article List]
CreateSource2 --> FetchArticles
FetchArticles --> ProcessParallel{Process in Parallel}
ProcessParallel --> FetchContent[Fetch Article Content]
ExtractMedia[Extract Media URLs] --> DownloadMedia[Download Media Files]
DownloadMedia --> ProcessImages[Process Images]
ProcessImages --> ConvertHTML[Convert HTML to Markdown]
ConvertHTML --> GenerateHTML[Generate HTML Output]
GenerateHTML --> SaveFiles[Save to File System]
SaveFiles --> UpdateProgress[Update Progress]
UpdateProgress --> CheckComplete{All Articles Done?}
CheckComplete -->|No| ProcessParallel
CheckComplete -->|Yes| Complete[Complete]
%% Error handling: every fetched article passes through the Error? check
FetchContent --> Error{Error?}
Error -->|Yes| LogError[Log Error]
Error -->|No| ExtractMedia
LogError --> CheckComplete
%% Styling
classDef startEnd fill:#4caf50,color:#fff
classDef process fill:#2196f3,color:#fff
classDef decision fill:#ff9800,color:#fff
classDef error fill:#f44336,color:#fff
class Start,Complete startEnd
class Parse,Validate,LoadSources,LoadYAML,LoadPython,CreateSource1,CreateSource2,FetchArticles,FetchContent,ExtractMedia,DownloadMedia,ProcessImages,ConvertHTML,GenerateHTML,SaveFiles,UpdateProgress process
class SourceType,ProcessParallel,CheckComplete,Error decision
class LogError error
```
To render the diagram, you can use the free diagramming tool draw.io: copy the Mermaid code above, select Arrange → Insert → Advanced → Mermaid from the menu, and paste the code into the dialog that opens.
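The parallel loop and its error branch can be sketched in Python. This is a minimal illustration, not the tool's implementation: `Article`, `process_article`, and `run_pipeline` are hypothetical names, `process_article` stands in for every step from Fetch Article Content to Save to File System, and a thread pool is assumed as the parallelism mechanism. The sketch catches an exception from any per-article step, which is slightly broader than the diagram's single error check after fetching.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")


@dataclass
class Article:
    # Hypothetical minimal article record; the tool's real fields are not documented here.
    url: str
    title: str


def process_article(article: Article) -> str:
    # Hypothetical stand-in for Fetch Content -> ... -> Save to File System.
    # It fails for any URL containing "bad" so the error branch can be exercised.
    if "bad" in article.url:
        raise ConnectionError(f"could not fetch {article.url}")
    return f"saved {article.title}"


def run_pipeline(articles: list[Article], max_workers: int = 4) -> dict:
    """Process articles in parallel, logging failures instead of aborting the run."""
    results = {"ok": [], "failed": []}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_article, a): a for a in articles}
        # as_completed yields futures as they finish, so progress follows completion order.
        for done, future in enumerate(as_completed(futures), start=1):
            article = futures[future]
            try:
                results["ok"].append(future.result())
            except Exception as exc:  # Error? -> Yes -> Log Error
                log.error("failed %s: %s", article.url, exc)
                results["failed"].append(article.url)
            log.info("progress: %d/%d", done, len(articles))  # Update Progress
    return results  # All Articles Done? -> Yes -> Complete


if __name__ == "__main__":
    demo = [Article("https://example.com/a", "A"), Article("https://example.com/bad", "B")]
    print(run_pipeline(demo))
```

Collecting failures rather than re-raising mirrors the diagram's Log Error → All Articles Done? edge: one bad article does not stop the run.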
Data Transformations
1. Input Processing
- CLI Arguments → Configuration Object
- Source Names → Source Instances
- URLs → Article Metadata
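A sketch of the three input transformations, assuming a Python implementation built on the standard library. The CLI options (`sources`, `--output-dir`, `--workers`), the `Config`, `ConfigDrivenSource` and `ArticleMeta` classes, and the in-memory registries that stand in for YAML files and Python modules are all hypothetical.

```python
import argparse
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlparse


@dataclass
class Config:
    # Hypothetical configuration object; the tool's actual options may differ.
    sources: List[str]
    output_dir: str = "output"
    workers: int = 4


def parse_args(argv: Optional[List[str]] = None) -> Config:
    """CLI Arguments -> Configuration Object, with a basic validation step."""
    parser = argparse.ArgumentParser(description="Scrape articles from configured sources")
    parser.add_argument("sources", nargs="+", help="names of the sources to scrape")
    parser.add_argument("--output-dir", default="output")
    parser.add_argument("--workers", type=int, default=4)
    args = parser.parse_args(argv)
    if args.workers < 1:
        parser.error("--workers must be at least 1")  # prints usage and exits
    return Config(args.sources, args.output_dir, args.workers)


class ConfigDrivenSource:
    """Source whose behavior comes entirely from declarative settings (e.g. a YAML file)."""

    def __init__(self, name: str, settings: dict):
        self.name = name
        self.settings = settings


# Hypothetical registries standing in for YAML files and Python modules on disk.
CONFIG_SETTINGS = {"example_blog": {"base_url": "https://example.com/blog"}}
CUSTOM_SOURCES: dict = {}  # name -> class, e.g. loaded with importlib.import_module


def create_source(name: str):
    """Source Names -> Source Instances, branching on the source type."""
    if name in CUSTOM_SOURCES:
        return CUSTOM_SOURCES[name]()
    if name in CONFIG_SETTINGS:
        return ConfigDrivenSource(name, CONFIG_SETTINGS[name])
    raise ValueError(f"unknown source: {name!r}")


@dataclass
class ArticleMeta:
    url: str
    domain: str
    slug: str


def to_metadata(url: str) -> ArticleMeta:
    """URLs -> Article Metadata: derive the domain and a filesystem-friendly slug."""
    parsed = urlparse(url)
    slug = parsed.path.rstrip("/").rsplit("/", 1)[-1] or "index"
    return ArticleMeta(url=url, domain=parsed.netloc, slug=slug)
```

Checking custom sources before config-driven ones is an arbitrary choice here; the diagram only says that the source type decides which path is taken.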
2. Content Processing
- HTML Content → Cleaned HTML
- Cleaned HTML → Markdown Text
- Media URLs → Local File Paths
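For the media row, a minimal standard-library sketch: collect `<img>` sources from the cleaned HTML, resolve them against the article URL, and map each one to a deterministic local path. The `MediaExtractor` class, the hash-based naming scheme, and the `media/` directory are assumptions for illustration; the HTML cleaning and HTML-to-Markdown steps are not shown.

```python
import hashlib
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class MediaExtractor(HTMLParser):
    """Collect absolute URLs of <img> sources (the Extract Media URLs step)."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths such as "/img/a.png" against the article URL.
                self.urls.append(urljoin(self.base_url, src))


def extract_media_urls(html: str, base_url: str) -> list[str]:
    parser = MediaExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls


def local_path_for(url: str, media_dir: str = "media") -> str:
    """Media URL -> local file path: a short hash of the URL plus its original extension."""
    ext = posixpath.splitext(urlparse(url).path)[1] or ".bin"
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return posixpath.join(media_dir, digest + ext)
```

Hashing the URL keeps file names stable across runs, so a media file that was already downloaded can be detected and skipped.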
3. Output Generation
- Article Data → Markdown Files
- Media Content → Organized File Structure
- Article + Metadata → HTML Pages
4. Error Handling
- Network Errors → Retry Logic
- Parse Errors → Fallback Processing
- File Errors → Alternative Paths