Processing Pipeline
graph LR
subgraph "Input Stage"
URLs[Article URLs]
Config[Processing Config]
OutputDir[Output Directory]
end
subgraph "Fetch Stage"
HttpRequest[HTTP Request]
RateLimit[Rate Limiting]
SessionPool[Session Pooling]
RetryLogic[Retry Logic]
ResponseValidation[Response Validation]
end
subgraph "Parse Stage"
HTMLParser[HTML Parser]
ContentExtraction[Content Extraction]
MetadataExtraction[Metadata Extraction]
LinkProcessing[Link Processing]
CleanupHTML[HTML Cleanup]
end
subgraph "Media Stage (--media flag only)"
MediaDetection[Media Detection]
URLExtraction[URL Extraction]
TypeClassification[Type Classification]
MediaDownload[Media Download]
ImageProcessing[Image Processing]
FileOrganization[File Organization]
end
subgraph "Conversion Stage"
MarkdownConversion[Markdown Conversion]
LinkUpdating[Link Updating]
ImageEmbedding[Image Embedding]
ContentStructuring[Content Structuring]
MetadataInsertion[Metadata Insertion]
end
subgraph "HTML Generation Stage"
TemplateLoading[Template Loading]
ContentRendering[Content Rendering]
StyleApplication[Style Application]
NavigationGeneration[Navigation Generation]
AssetLinking[Asset Linking]
end
subgraph "Output Stage"
DirectoryCreation[Directory Creation]
FileWriting[File Writing]
PermissionSetting[Permission Setting]
ProgressTracking[Progress Tracking]
ErrorLogging[Error Logging]
end
%% Flow connections
URLs --> HttpRequest
Config --> RateLimit
OutputDir --> DirectoryCreation
HttpRequest --> RateLimit
RateLimit --> SessionPool
SessionPool --> RetryLogic
RetryLogic --> ResponseValidation
ResponseValidation --> HTMLParser
HTMLParser --> ContentExtraction
ContentExtraction --> MetadataExtraction
MetadataExtraction --> LinkProcessing
LinkProcessing --> CleanupHTML
CleanupHTML --> MediaDetection
MediaDetection --> URLExtraction
URLExtraction --> TypeClassification
TypeClassification --> MediaDownload
MediaDownload --> ImageProcessing
ImageProcessing --> FileOrganization
CleanupHTML --> MarkdownConversion
FileOrganization --> LinkUpdating
MarkdownConversion --> LinkUpdating
LinkUpdating --> ImageEmbedding
ImageEmbedding --> ContentStructuring
ContentStructuring --> MetadataInsertion
MetadataInsertion --> TemplateLoading
TemplateLoading --> ContentRendering
ContentRendering --> StyleApplication
StyleApplication --> NavigationGeneration
NavigationGeneration --> AssetLinking
MetadataInsertion --> DirectoryCreation
AssetLinking --> FileWriting
DirectoryCreation --> FileWriting
FileWriting --> PermissionSetting
PermissionSetting --> ProgressTracking
%% Error handling
RetryLogic -.-> ErrorLogging
MediaDownload -.-> ErrorLogging
FileWriting -.-> ErrorLogging
%% Styling
classDef input fill:#e3f2fd
classDef fetch fill:#e8f5e8
classDef parse fill:#fff3e0
classDef media fill:#fce4ec
classDef convert fill:#f3e5f5
classDef html fill:#f1f8e9
classDef output fill:#ffe0b2
class URLs,Config,OutputDir input
class HttpRequest,RateLimit,SessionPool,RetryLogic,ResponseValidation fetch
class HTMLParser,ContentExtraction,MetadataExtraction,LinkProcessing,CleanupHTML parse
class MediaDetection,URLExtraction,TypeClassification,MediaDownload,ImageProcessing,FileOrganization media
class MarkdownConversion,LinkUpdating,ImageEmbedding,ContentStructuring,MetadataInsertion convert
class TemplateLoading,ContentRendering,StyleApplication,NavigationGeneration,AssetLinking html
class DirectoryCreation,FileWriting,PermissionSetting,ProgressTracking,ErrorLogging output
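The stage ordering in the diagram above can be sketched as a list of stages in which the media stage is included only when media handling is requested. This is a minimal illustration, not the project's actual code: the stage names, `make_stage`, `build_pipeline`, and `run_pipeline` are all hypothetical, and the stages are placeholders that only record that they ran. The sketch also flattens the diagram into a linear order, whereas the diagram shows the media branch rejoining the conversion stage at link updating.

```python
from typing import Any, Callable, Dict, List, Tuple

Document = Dict[str, Any]
Stage = Callable[[Document], Document]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    def stage(doc: Document) -> Document:
        trace = list(doc.get("trace", []))
        trace.append(name)
        return {**doc, "trace": trace}
    return stage


def build_pipeline(media: bool) -> List[Tuple[str, Stage]]:
    """Return (name, stage) pairs in execution order.

    The media stage is added only when `media` is True, mirroring the
    "--media flag only" subgraph in the diagram.
    """
    names = ["fetch", "parse"]
    if media:
        names.append("media")
    names += ["convert", "html", "output"]
    return [(name, make_stage(name)) for name in names]


def run_pipeline(url: str, media: bool = False) -> Document:
    """Pass one article through every stage in order."""
    doc: Document = {"url": url, "trace": []}
    for _name, stage in build_pipeline(media):
        doc = stage(doc)
    return doc
```

Keeping each stage as a function from document to document makes it straightforward to skip or reorder stages without touching the others.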
Parallel Processing
- Article Fetching: Up to 8 concurrent requests
- Media Download: Images and other media are downloaded and processed in parallel
- File Operations: Concurrent file writing
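One way to cap article fetching at 8 concurrent requests is a bounded thread pool. The sketch below assumes the fetch operation is supplied by the caller; `fetch_all` and `MAX_CONCURRENT_FETCHES` are hypothetical names, and the project may well use a different concurrency mechanism.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable

MAX_CONCURRENT_FETCHES = 8  # matches the limit stated in the list above


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    max_workers: int = MAX_CONCURRENT_FETCHES,
) -> Dict[str, str]:
    """Fetch every URL with at most `max_workers` requests in flight."""
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so they line up with url_list.
        bodies = list(pool.map(fetch, url_list))
    return dict(zip(url_list, bodies))
```

Passing `fetch` as a parameter keeps the concurrency limit separate from the HTTP details and makes the function easy to exercise with a stub.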
Error Handling
- Network Errors: Retried with exponential backoff
- Parse Errors: Graceful degradation
- File Errors: Alternative path resolution
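The network-error strategy can be illustrated with a small retry helper that doubles the wait after each failed attempt. This is a sketch under stated assumptions: `retry_with_backoff`, its default attempt count, delays, and jitter are illustrative values, not the project's actual settings.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,      # hypothetical default
    base_delay: float = 0.5,    # seconds before the first retry (assumed)
    max_delay: float = 8.0,     # upper bound on any single wait (assumed)
    jitter: float = 0.1,        # extra random fraction of the delay
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying network failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each retry: base, 2*base, 4*base, ... capped.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * jitter))
```

The `sleep` parameter exists so the waits can be replaced with a recorder during testing; in normal use the default `time.sleep` applies.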
Resource Management
- Memory: Streaming for large files
- Network: Connection pooling and reuse
- Disk: Efficient directory structures
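Streaming keeps memory use bounded by copying a large file in fixed-size chunks instead of reading it whole. The sketch below assumes the source is any binary file-like object, such as a response body; `stream_to_file` and `CHUNK_SIZE` are hypothetical names and the chunk size is an arbitrary choice.

```python
import io
import tempfile
from pathlib import Path
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # assumed chunk size, not taken from the project


def stream_to_file(
    source: BinaryIO, destination: Path, chunk_size: int = CHUNK_SIZE
) -> int:
    """Copy `source` to `destination` chunk by chunk; return bytes written."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    written = 0
    with destination.open("wb") as out:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:  # empty read means end of stream
                break
            out.write(chunk)
            written += len(chunk)
    return written


if __name__ == "__main__":
    # Usage example with an in-memory stream standing in for a download.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "media" / "large.bin"
        size = stream_to_file(io.BytesIO(b"x" * 200_000), target)
        print(size, target.stat().st_size)
```

At no point does the function hold more than one chunk in memory, regardless of the total file size.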
Quality Controls
- Content Validation: HTML structure verification
- Media Validation: File type and size checks
- Output Validation: Markdown syntax verification
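The media validation step can be sketched as a check on payload size followed by a check on the file's leading signature bytes. The names `validate_media` and `detect_media_type`, the size limit, and the set of accepted formats are assumptions for illustration; the project's real limits and supported types are not stated here.

```python
from typing import Optional

MAX_MEDIA_BYTES = 10 * 1024 * 1024  # hypothetical size limit

# Leading signature bytes for a few common image formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}


def detect_media_type(data: bytes) -> Optional[str]:
    """Return a MIME type based on the payload's first bytes, if recognized."""
    for signature, mime in SIGNATURES.items():
        if data.startswith(signature):
            return mime
    return None


def validate_media(data: bytes, max_bytes: int = MAX_MEDIA_BYTES) -> Optional[str]:
    """Return the detected MIME type, or None if the payload is rejected.

    A payload is rejected when it is empty, exceeds `max_bytes`, or does
    not start with a known signature.
    """
    if not data or len(data) > max_bytes:
        return None
    return detect_media_type(data)
```

Checking the content itself rather than the URL's file extension guards against mislabeled downloads, such as an HTML error page served in place of an image.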