Processing Pipeline
graph LR
subgraph "Input Stage"
URLs[Article URLs]
Config[Processing Config]
OutputDir[Output Directory]
end
subgraph "Fetch Stage"
HttpRequest[HTTP Request]
RateLimit[Rate Limiting]
SessionPool[Session Pooling]
RetryLogic[Retry Logic]
ResponseValidation[Response Validation]
end
subgraph "Parse Stage"
HTMLParser[HTML Parser]
ContentExtraction[Content Extraction]
MetadataExtraction[Metadata Extraction]
LinkProcessing[Link Processing]
CleanupHTML[HTML Cleanup]
end
subgraph "Media Stage"
MediaDetection[Media Detection]
URLExtraction[URL Extraction]
TypeClassification[Type Classification]
MediaDownload[Media Download]
ImageProcessing[Image Processing]
FileOrganization[File Organization]
end
subgraph "Conversion Stage"
MarkdownConversion[Markdown Conversion]
LinkUpdating[Link Updating]
ImageEmbedding[Image Embedding]
ContentStructuring[Content Structuring]
MetadataInsertion[Metadata Insertion]
end
subgraph "HTML Generation Stage"
TemplateLoading[Template Loading]
ContentRendering[Content Rendering]
StyleApplication[Style Application]
NavigationGeneration[Navigation Generation]
AssetLinking[Asset Linking]
end
subgraph "Output Stage"
DirectoryCreation[Directory Creation]
FileWriting[File Writing]
PermissionSetting[Permission Setting]
ProgressTracking[Progress Tracking]
ErrorLogging[Error Logging]
end
%% Flow connections
URLs --> HttpRequest
Config --> RateLimit
OutputDir --> DirectoryCreation
HttpRequest --> RateLimit
RateLimit --> SessionPool
SessionPool --> RetryLogic
RetryLogic --> ResponseValidation
ResponseValidation --> HTMLParser
HTMLParser --> ContentExtraction
ContentExtraction --> MetadataExtraction
MetadataExtraction --> LinkProcessing
LinkProcessing --> CleanupHTML
CleanupHTML --> MediaDetection
MediaDetection --> URLExtraction
URLExtraction --> TypeClassification
TypeClassification --> MediaDownload
MediaDownload --> ImageProcessing
ImageProcessing --> FileOrganization
CleanupHTML --> MarkdownConversion
FileOrganization --> LinkUpdating
MarkdownConversion --> LinkUpdating
LinkUpdating --> ImageEmbedding
ImageEmbedding --> ContentStructuring
ContentStructuring --> MetadataInsertion
MetadataInsertion --> TemplateLoading
TemplateLoading --> ContentRendering
ContentRendering --> StyleApplication
StyleApplication --> NavigationGeneration
NavigationGeneration --> AssetLinking
MetadataInsertion --> DirectoryCreation
AssetLinking --> FileWriting
DirectoryCreation --> FileWriting
FileWriting --> PermissionSetting
PermissionSetting --> ProgressTracking
%% Error handling
RetryLogic -.-> ErrorLogging
MediaDownload -.-> ErrorLogging
FileWriting -.-> ErrorLogging
%% Styling
classDef input fill:#e3f2fd
classDef fetch fill:#e8f5e8
classDef parse fill:#fff3e0
classDef media fill:#fce4ec
classDef convert fill:#f3e5f5
classDef html fill:#f1f8e9
classDef output fill:#ffe0b2
class URLs,Config,OutputDir input
class HttpRequest,RateLimit,SessionPool,RetryLogic,ResponseValidation fetch
class HTMLParser,ContentExtraction,MetadataExtraction,LinkProcessing,CleanupHTML parse
class MediaDetection,URLExtraction,TypeClassification,MediaDownload,ImageProcessing,FileOrganization media
class MarkdownConversion,LinkUpdating,ImageEmbedding,ContentStructuring,MetadataInsertion convert
class TemplateLoading,ContentRendering,StyleApplication,NavigationGeneration,AssetLinking html
class DirectoryCreation,FileWriting,PermissionSetting,ProgressTracking,ErrorLogging output
To view the Mermaid diagram above, you can use the free tool Draw.io.
Copy the Mermaid code, then select from the menu: Arrange → Insert → Advanced → Mermaid.
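The stage-level dependencies in the diagram can also be sketched in code. The example below is only an illustration: the stage names and the `stage_order` helper are hypothetical labels derived from the subgraph titles, not identifiers from the project. It uses the standard-library `graphlib` module (Python 3.9+) to derive an order in which the stages could run.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages that must finish before it can start.
# Names are hypothetical labels for the diagram's subgraphs.
STAGE_DEPENDENCIES = {
    "fetch": {"input"},                      # URLs and Config feed the fetch stage
    "parse": {"fetch"},                      # ResponseValidation --> HTMLParser
    "media": {"parse"},                      # CleanupHTML --> MediaDetection
    "conversion": {"parse", "media"},        # LinkUpdating waits on both branches
    "html_generation": {"conversion"},       # MetadataInsertion --> TemplateLoading
    "output": {"input", "conversion", "html_generation"},
}


def stage_order(dependencies=STAGE_DEPENDENCIES):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(" -> ".join(stage_order()))
```

Note that the conversion stage depends on both the parse and media stages, which mirrors the two edges into `LinkUpdating` in the diagram.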
Pipeline Performance Characteristics
Parallel Processing
Article Fetching: Up to 8 concurrent requests
Media Download: Parallel image/media processing
File Operations: Concurrent file writing
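A minimal sketch of how the limit of 8 concurrent requests might be enforced, assuming a thread pool is used. The `fetch_all` function and the `fetch_one` callable are hypothetical names introduced for illustration; the project's actual fetching code may be organized differently.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_CONCURRENT_REQUESTS = 8  # the limit stated above


def fetch_all(urls, fetch_one, max_workers=MAX_CONCURRENT_REQUESTS):
    """Fetch every URL with at most `max_workers` requests in flight.

    `fetch_one` is a hypothetical callable (url -> body) supplied by the caller.
    Returns a (results, errors) pair of dicts keyed by URL.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_one, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # One failed article should not stop the rest of the batch.
                errors[url] = exc
    return results, errors
```

The pool size caps concurrency, so the same pattern could plausibly serve media downloads and file writes by passing a different callable.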
Error Handling
Network Errors: Exponential backoff retry
Parse Errors: Graceful degradation
File Errors: Alternative path resolution
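Exponential backoff for network errors could look roughly like the sketch below. The function name, the default attempt count, and the delay values are assumptions chosen for illustration; the source does not state the actual retry parameters.

```python
import time


def retry_with_backoff(operation, max_attempts=4, base_delay=0.5, max_delay=30.0,
                       retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call `operation` until it succeeds or `max_attempts` is exhausted.

    The wait doubles after each failure (base_delay, 2 * base_delay, ...),
    capped at `max_delay`. All defaults here are hypothetical.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(min(max_delay, base_delay * 2 ** attempt))
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays. Production code often adds random jitter to the delay so that many clients do not retry in lockstep.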
Resource Management
Memory: Streaming for large files
Network: Connection pooling and reuse
Disk: Efficient directory structures
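Streaming keeps memory use bounded by copying data in fixed-size chunks instead of loading a whole file at once. The sketch below assumes binary file-like objects; `stream_to_file` and the 64 KiB chunk size are hypothetical choices, not values taken from the project.

```python
CHUNK_SIZE = 64 * 1024  # hypothetical chunk size (64 KiB)


def stream_to_file(source, destination, chunk_size=CHUNK_SIZE):
    """Copy `source` to `destination` chunk by chunk.

    Both arguments are binary file-like objects. At most `chunk_size`
    bytes are held in memory at a time. Returns the number of bytes written.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read signals end of stream
            break
        destination.write(chunk)
        total += len(chunk)
    return total
```

The standard library's `shutil.copyfileobj` performs the same kind of chunked copy; the explicit loop is shown here because it also reports the byte count, which a size check could reuse.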
Quality Controls
Content Validation: HTML structure verification
Media Validation: File type and size checks
Output Validation: Markdown syntax verification
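As an illustration of the file type and size checks, the sketch below inspects a file's leading bytes and rejects oversized payloads. The 10 MiB limit, the set of accepted formats, and the `validate_media` name are assumptions made for this example; the source does not specify them.

```python
# Well-known file signatures for a few common image formats.
MAGIC_NUMBERS = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}

MAX_MEDIA_BYTES = 10 * 1024 * 1024  # hypothetical limit (10 MiB)


def validate_media(data, max_bytes=MAX_MEDIA_BYTES):
    """Return the detected MIME type, or raise ValueError if a check fails."""
    if not data:
        raise ValueError("empty file")
    if len(data) > max_bytes:
        raise ValueError("file exceeds %d bytes" % max_bytes)
    for magic, mime in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return mime
    raise ValueError("unrecognized media type")
```

Checking the leading bytes rather than the file extension or the server's `Content-Type` header guards against mislabeled downloads.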