Quick Start Guide

Get Capcat running in 5 minutes with this streamlined setup guide.

Prerequisites

Python 3.8+ (recommended: 3.11+)
Virtual environment capability
Network access for downloading articles
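
The version requirement above can be checked before installing. This is a minimal sketch, not part of Capcat: the function name check_python and its three return labels are made up for illustration, and only the 3.8 and 3.11 thresholds come from this guide.

```python
import sys

MINIMUM = (3, 8)       # minimum version listed in the prerequisites
RECOMMENDED = (3, 11)  # recommended version listed in the prerequisites


def check_python(version=None):
    """Classify a (major, minor) tuple; defaults to the running interpreter."""
    major_minor = tuple((version or sys.version_info)[:2])
    if major_minor < MINIMUM:
        return "unsupported"
    if major_minor < RECOMMENDED:
        return "upgrade-suggested"
    return "ok"


if __name__ == "__main__":
    running = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {running}: {check_python()}")
```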

Installation

macOS

pip install capcat
capcat list sources

Windows

pip install capcat
capcat list sources

Linux (Ubuntu / Debian)

Ubuntu and Debian restrict system-wide pip installs. Use pipx:

sudo apt install pipx
pipx install capcat
capcat list sources

Other Linux

pip install capcat
capcat list sources

Development Install (Contributing)

Clone the repo and install in editable mode:

git clone https://github.com/stayukasabov/capcat.git
cd capcat
pip install -e ".[dev]"
capcat list sources

Verification

# List available sources
capcat list sources

# List available bundles
capcat list bundles

Basic Usage

Interactive Mode (Recommended for New Users)

Launch the interactive menu for guided workflows:

./capcat catch

Main Menu Options:

  What would you like me to do?

  > Catch articles from a bundle of sources
    Catch articles from a list of sources
    Catch from a single source
    Catch a single article by URL
    Manage Sources (add/remove/configure)
    Exit

Why Use Interactive Mode:

No command memorization required
Guided step-by-step workflows
Visual source selection
Built-in source management
Prevents common errors
Ideal for daily use

Quick Examples:

Fetch a News Bundle:

1. Run ./capcat catch
2. Select "Catch articles from a bundle of sources"
3. Choose a bundle (tech, news, science, etc.)
4. Select an HTML generation option
5. Confirm and execute

Add a New Source:

1. Run ./capcat catch
2. Select "Manage Sources"
3. Select "Add New Source from RSS Feed"
4. Enter the RSS feed URL
5. Follow the prompts

Test a Source:

1. Run ./capcat catch
2. Select "Manage Sources"
3. Select "Test a Source"
4. Choose the source to test
5. View the results

For comprehensive interactive mode documentation, see the Interactive Mode Guide.

CLI Mode (Advanced Users & Scripts)

Single Article Download

# Download a single article
./capcat single https://example.com/article

# With media files
./capcat single https://bbc.com/news/technology --media

Batch Downloads

# Tech news bundle (Hacker News + Lobsters + InfoQ)
./capcat bundle tech --count 10

# General news bundle (BBC + CNN + Reuters)
./capcat bundle news --count 15 --media

# Specific sources
./capcat fetch hn,bbc --count 20
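
When batch downloads are driven from a script, the commands above can be assembled as argument lists. This is a sketch under assumptions: fetch_command and bundle_command are hypothetical helper names, not part of Capcat, and only the subcommands and flags shown in this guide are used.

```python
def fetch_command(sources, count, executable="./capcat"):
    """Build e.g. ./capcat fetch hn,bbc --count 20 as an argument list."""
    if not sources:
        raise ValueError("at least one source id is required")
    return [executable, "fetch", ",".join(sources), "--count", str(count)]


def bundle_command(bundle, count, media=False, executable="./capcat"):
    """Build e.g. ./capcat bundle news --count 15 --media as an argument list."""
    command = [executable, "bundle", bundle, "--count", str(count)]
    if media:
        command.append("--media")
    return command


if __name__ == "__main__":
    # Print the commands instead of running them; pass a list to
    # subprocess.run(...) to execute one from a Capcat checkout.
    print(" ".join(fetch_command(["hn", "bbc"], 20)))
    print(" ".join(bundle_command("news", 15, media=True)))
```

Argument lists avoid shell quoting problems when a list is later passed to subprocess.run.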

Discovery Commands

# List all available sources
./capcat list sources

# List predefined bundles
./capcat list bundles

File Logging

# Save detailed logs to file (includes all debug information)
./capcat -L capcat.log bundle tech --count 10

# Verbose console output + file logging
./capcat -V -L debug.log fetch hn --count 15

# Timestamped log files
./capcat -L logs/news-$(date +%Y%m%d-%H%M%S).log bundle news --count 10
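
The timestamped file name built by the shell date call above can be produced the same way in Python, which may help on systems without a POSIX shell. This is a sketch: timestamped_log is a hypothetical helper, and the logs directory and news prefix are copied from the example command.

```python
from datetime import datetime
from pathlib import Path


def timestamped_log(prefix="news", directory="logs", now=None):
    """Return a path like logs/news-20240102-030405.log."""
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    return Path(directory) / f"{prefix}-{stamp}.log"


if __name__ == "__main__":
    log_path = timestamped_log()
    # Argument list equivalent to the shell example; it is printed, not run.
    print(["./capcat", "-L", str(log_path), "bundle", "news", "--count", "10"])
```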

Output Structure

../News/news_DD-MM-YYYY/              # Batch downloads
├── Hacker-News_DD-MM-YYYY/
│   └── 01_Article_Title/
│       ├── article.md
│       ├── comments.md
│       └── images/
└── BBC-News_DD-MM-YYYY/
    └── 01_Article_Title/
        ├── article.md
        └── images/

../Capcats/                           # Single articles
└── cc_DD-MM-YYYY-Article-Title/
    ├── article.md
    └── images/
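
To locate downloaded articles from a script, the naming pattern above can be reproduced. This is a sketch under assumptions: both helper names are hypothetical, and replacing spaces with underscores or hyphens is a guess based on the example names. Capcat's actual title sanitizing may differ.

```python
from datetime import date
from pathlib import PurePosixPath


def batch_article_dir(source_name, index, title, day, root="../News"):
    """Expected folder for article number `index` of a batch download."""
    stamp = day.strftime("%d-%m-%Y")  # DD-MM-YYYY, as in the layout above
    source_folder = f"{source_name.replace(' ', '-')}_{stamp}"
    article_folder = f"{index:02d}_{title.replace(' ', '_')}"
    return PurePosixPath(root) / f"news_{stamp}" / source_folder / article_folder


def single_article_dir(title, day, root="../Capcats"):
    """Expected folder for a single-article download."""
    stamp = day.strftime("%d-%m-%Y")
    return PurePosixPath(root) / f"cc_{stamp}-{title.replace(' ', '-')}"


if __name__ == "__main__":
    today = date.today()
    print(batch_article_dir("Hacker News", 1, "Article Title", today) / "article.md")
    print(single_article_dir("Article Title", today) / "article.md")
```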

Key Features Demo

1. Config-Driven Sources (Simple)

# These sources use YAML configuration (no coding required)
./capcat fetch iq,euronews,straitstimes --count 5

2. Custom Sources (Complex)

# These sources have custom Python implementations
./capcat fetch hn,bbc,techcrunch --count 5

3. Media Handling

# Images only (default)
./capcat bundle tech --count 5

# All media types (images + videos + documents)
./capcat bundle tech --count 5 --media

Verification

Test System Health

# Run comprehensive source test
python test_comprehensive_sources.py

# Quick individual test
./capcat fetch hn --count 3

Expected Results

Sources: 16+ sources discovered
Success Rate: ~90% (14-16 of 16+ sources working)
Performance: 4-6 seconds average per source
Output: Clean Markdown files with local images
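
The success rate quoted above is simple arithmetic over per-source results. This is a sketch with a hypothetical success_rate helper and made-up source ids. It does not parse Capcat's real test output, whose format is not described in this guide.

```python
def success_rate(results):
    """Percentage of sources that worked, given a {source_id: bool} mapping."""
    if not results:
        raise ValueError("no sources were tested")
    working = sum(1 for ok in results.values() if ok)
    return 100.0 * working / len(results)


if __name__ == "__main__":
    # Hypothetical run: 16 sources tested, 2 of them failing.
    outcome = {f"source{i}": i >= 2 for i in range(16)}
    print(f"{success_rate(outcome):.1f}% of sources working")
```

With 14 of 16 sources working the rate is 87.5%, and with all 16 working it is 100%, which brackets the approximate 90% figure.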

Source Policy

Paywall Exclusion: Sources with paywalls or subscription requirements are excluded
Recently Removed: Wired, The Verge (moved to paywall model)
Bot Protection: Sources with aggressive anti-bot measures are avoided
Use ./capcat list sources to see the currently available sources

Intelligent Protection System

Capcat includes automatic protection against problematic sites:

Download Limits (Automatic)

Normal Articles: 50 images, 20MB total
Suspicious Sites: 10 images, 5MB total
High Risk Sites: 5 images, 2MB total
Link Aggregators: 0 images, 0MB (blocked)
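
The tiers above amount to a small lookup table. This is a sketch of how such limits might be checked, not Capcat's implementation: the tier keys, the Limit class and the within_limits function are all hypothetical names, and only the numbers come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    max_images: int
    max_megabytes: int


# Numbers taken from the tier list above; the key names are illustrative.
LIMITS = {
    "normal": Limit(50, 20),
    "suspicious": Limit(10, 5),
    "high_risk": Limit(5, 2),
    "link_aggregator": Limit(0, 0),
}


def within_limits(tier, images, megabytes):
    """True if a page's image count and total size fit the tier's limits."""
    limit = LIMITS[tier]
    return images <= limit.max_images and megabytes <= limit.max_megabytes


if __name__ == "__main__":
    print(within_limits("normal", 12, 4.5))
    print(within_limits("link_aggregator", 471, 103))
```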

Real-World Protection

# Example: consumed.today attempted to download 471 images (103MB)
# Automatically blocked: "LINK_AGGREGATOR detected"
# Protection saved: 103MB of unwanted downloads

Media Flag Behavior

# Without --media flag: Standard protection limits apply
./capcat single https://example.com/article

# With --media flag: Bypass limits for legitimate sites (up to 500MB)
./capcat single https://example.com/article --media

# Note: the --media flag is ignored for blocked aggregator sites
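
The interaction between the flag and the protection tiers can be summarized in a few lines. This sketch rests on a labeled assumption: the guide says only that --media bypasses limits for "legitimate" sites and is ignored for aggregators, so treating just the normal tier as legitimate is a guess. size_budget_mb and the tier keys are hypothetical names.

```python
MEDIA_CAP_MB = 500  # upper bound quoted for --media
STANDARD_MB = {"normal": 20, "suspicious": 5, "high_risk": 2, "link_aggregator": 0}


def size_budget_mb(tier, media_flag):
    """Download budget in MB for a site tier, with or without --media."""
    if tier == "link_aggregator":
        return 0  # blocked regardless of the flag
    if media_flag and tier == "normal":
        return MEDIA_CAP_MB  # assumption: only the normal tier counts as legitimate
    return STANDARD_MB[tier]


if __name__ == "__main__":
    for tier in STANDARD_MB:
        print(tier, size_budget_mb(tier, False), size_budget_mb(tier, True))
```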

Wrapper System

Capcat uses a two-layer wrapper system for reliability:

Architecture

capcat - Lightweight 9-line bash script (executable shortcut)
run_capcat.py - Comprehensive Python wrapper (handles all logic)
capcat.py - Main application code

Benefits

Automatic Environment Management - No manual venv activation needed
Dependency Installation - Automatically installs requirements.txt
Error Handling - Clear messages for common issues
Cross-Platform - Works on macOS, Linux, Windows

Entry Points

# Primary method (recommended)
./capcat command args

# Alternative method (direct Python)
python3 run_capcat.py command args

# Manual method (requires venv activation)
source venv/bin/activate && python capcat.py command args
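
A script that calls Capcat can fall back through the three entry points in the order listed. This is a sketch assuming a repository checkout containing the files named above. pick_entry_point is a hypothetical helper, not something Capcat ships.

```python
import os
from pathlib import Path


def pick_entry_point(project_dir):
    """Return the argv prefix for the first usable entry point in project_dir."""
    root = Path(project_dir)
    bash_wrapper = root / "capcat"
    if bash_wrapper.is_file() and os.access(bash_wrapper, os.X_OK):
        return ["./capcat"]  # primary method
    if (root / "run_capcat.py").is_file():
        return ["python3", "run_capcat.py"]  # direct Python wrapper
    if (root / "capcat.py").is_file():
        # Manual method: the virtual environment must already be activated.
        return ["python", "capcat.py"]
    raise FileNotFoundError(f"no Capcat entry point found in {root}")


if __name__ == "__main__":
    try:
        print(pick_entry_point(".") + ["list", "sources"])
    except FileNotFoundError as error:
        print(error)
```

Run the resulting command with project_dir as the working directory, since the paths are relative to it.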

Common Issues

1. Wrapper System Issues

# If bash wrapper fails, use Python wrapper directly
python3 run_capcat.py list sources

# Check wrapper system health
./capcat --help

2. Module Not Found

# Let wrapper handle dependencies automatically
./capcat list sources

# Or manually activate environment (advanced users)
source venv/bin/activate

3. Virtual Environment Issues

# Remove and recreate (wrapper will rebuild)
rm -rf venv
./capcat list sources

4. Network Errors

# Some sources may have anti-bot protection (this is normal)
# A success rate of about 90% (14-16 of 16+ sources) is expected

Next Steps

Interactive Mode Guide - Complete guide to the interactive menu system
Source Management Menu - Detailed source management operations
Architecture Overview - Understand the system design
Source Development - Create new sources
Configuration Guide - Customize system behavior
Testing Guide - Run comprehensive tests

Pro Tips

Use bundles for related content: bundle tech, bundle news
Start small with --count 5 to test new sources
Monitor performance - check average processing times
Use --media sparingly - it significantly increases download time
Enable file logging with -L logfile.log for troubleshooting and debugging

You're now ready to use Capcat! For advanced usage, continue to the Architecture Overview.
