Control what Capcat collects and how it's organized.
# Fetch 10 articles
./capcat fetch hn --count 10
# Fetch 50 articles
./capcat bundle tech --count 50
Sources and bundles have defaults (usually 30).
Use defaults:
./capcat fetch hn # Uses the source default (usually 30)
Override:
./capcat fetch hn --count 15 # Uses 15
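If you script your fetches, the default-versus-override choice above can be wrapped in a small helper. This is a minimal sketch: `build_fetch_cmd` is a hypothetical function name, not part of Capcat, and it only prints the command so you can inspect it before running it.

```shell
# Hypothetical helper (not part of Capcat): print a fetch command,
# adding --count only when a count is supplied.
build_fetch_cmd() {
  source_name="$1"
  count="${2:-}"
  if [ -n "$count" ]; then
    printf './capcat fetch %s --count %s\n' "$source_name" "$count"
  else
    # No count given: let the source's own default apply
    printf './capcat fetch %s\n' "$source_name"
  fi
}
```

Calling `build_fetch_cmd hn` prints the default form, and `build_fetch_cmd hn 15` prints the override form.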
Markdown is great for archiving, but HTML is better for reading:
Add --html flag:
./capcat fetch hn --count 10 --html
./capcat bundle tech --html
./capcat single URL --html
With --html flag:
Open in browser:
open Article_Folder/html/article.html
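The `open` command above is the macOS launcher. On most Linux desktops the equivalent is `xdg-open`. A minimal sketch that picks one based on the operating system, assuming `xdg-open` is installed on non-macOS systems (`pick_opener` is a hypothetical helper name):

```shell
# Hypothetical helper: print the file-opening command for this OS.
# Assumes xdg-open is available on anything that is not macOS.
pick_opener() {
  case "$(uname -s)" in
    Darwin) echo open ;;
    *) echo xdg-open ;;
  esac
}
```

Usage: `"$(pick_opener)" Article_Folder/html/article.html`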
Images are always downloaded and embedded:
./capcat fetch hn --count 10
# Images automatically downloaded
Use --media flag for additional files:
./capcat fetch nature --count 10 --media
Downloads:
--media significantly increases:
../News/news_DD-MM-YYYY/Source_DD-MM-YYYY/
../Capcats/cc_DD-MM-YYYY-Article-Title/
Use --output flag:
# Custom directory
./capcat fetch hn --count 10 --output /path/to/output
# Home directory path (~ expands to your home folder)
./capcat single URL --output ~/Articles
# Current directory
./capcat fetch bbc --output .
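If you want custom output folders to carry a date, as the default folder names do, you can build the path in the shell before passing it to `--output`. This is a sketch only: `dated_output_dir` is a hypothetical helper, and the DD-MM-YYYY layout is copied from the default names shown above.

```shell
# Hypothetical helper: append today's date (DD-MM-YYYY) to a base directory.
dated_output_dir() {
  base="$1"
  printf '%s/%s\n' "$base" "$(date +%d-%m-%Y)"
}
```

Usage: `./capcat fetch hn --count 10 --output "$(dated_output_dir "$HOME/Articles")"`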
Re-fetch and overwrite existing articles:
./capcat fetch hn --count 10 --update
./capcat bundle tech --update
./capcat single URL --update
Edit capcat.yml:
vim capcat.yml
Common customizations:
processing:
  max_workers: 16 # Parallel downloads (faster)
  download_images: true # Always on
  download_videos: false # Off unless --media
network:
  connect_timeout: 15 # Connection timeout
  read_timeout: 45 # Read timeout
logging:
  default_level: "INFO" # Log verbosity
Save the file. The settings apply to all future fetches.
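Because these settings affect every later run, it can help to keep a copy of the file before editing it. A minimal sketch, where `backup_config` is a hypothetical helper and the `.bak` suffix is just a convention chosen here:

```shell
# Hypothetical helper: copy a config file to <name>.bak before editing.
backup_config() {
  config_file="$1"
  if [ ! -f "$config_file" ]; then
    echo "No such file: $config_file" >&2
    return 1
  fi
  cp "$config_file" "$config_file.bak"
}
```

Usage: `backup_config capcat.yml && vim capcat.yml`. To undo your changes, copy `capcat.yml.bak` back over `capcat.yml`.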
Edit source config:
vim sources/active/config_driven/configs/sourcename.yaml
Customize:
timeout: 15.0 # Source-specific timeout
rate_limit: 2.0 # Slower rate limiting
Fast, no frills:
./capcat bundle tech --count 15
# No --html (faster)
# No --media (smaller)
# Default output location
Full featured:
./capcat bundle science --count 40 --html --media
# HTML for browser reading
# Media for offline viewing
# Large count for weekend
Custom location, specific sources:
./capcat fetch nature,scientificamerican --count 30 \
--html --media --output ~/Research/Climate
Refresh existing collection:
./capcat bundle news --count 20 --update --html
# Updates existing articles
# Regenerates HTML
Save detailed logs:
./capcat --log-file capcat.log fetch hn --count 10
Log includes:
./capcat fetch hn --count 10
./capcat --verbose fetch hn --count 10
./capcat -V fetch hn --count 10
./capcat --quiet fetch hn --count 10
./capcat -q fetch hn --count 10
--output for topics
./capcat bundle tech --count 10
# ~1-2 minutes
./capcat bundle tech --count 20 --html
# ~3-5 minutes
./capcat bundle tech --count 50 --html --media
# ~10-15 minutes
Override defaults without editing config:
# More parallel workers
export CAPCAT_PROCESSING_MAX_WORKERS=16
./capcat bundle tech
# Custom timeout
export CAPCAT_NETWORK_CONNECT_TIMEOUT=20
./capcat fetch bbc --count 30
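`export` keeps the variable set for the rest of the shell session. To apply an override to a single command only, you can pass the variable through `env` instead. A minimal sketch, where `with_workers` is a hypothetical wrapper and the variable name is the one used above:

```shell
# Hypothetical wrapper: run one command with a temporary worker count.
# env sets the variable for that command only, so the session is unchanged.
with_workers() {
  workers="$1"
  shift
  env CAPCAT_PROCESSING_MAX_WORKERS="$workers" "$@"
}
```

Usage: `with_workers 16 ./capcat bundle tech`. Later commands in the same session still see whatever value, if any, was set before.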
# Article count
--count N
# Generate HTML
--html
# Download all media
--media
# Custom output
--output DIR
# Update existing
--update
# File logging
--log-file FILE
# Verbosity
--verbose / -V
--quiet / -q
# Combined example
./capcat bundle tech --count 20 --html --output ~/News