Control what Capcat collects and how it's organized.
# Fetch 10 articles
capcat fetch hn --count 10
# Fetch 50 articles
capcat bundle tech --count 50
Sources and bundles have defaults (usually 30).
Use defaults:
capcat fetch hn # Uses 30
Override:
capcat fetch hn --count 15 # Uses 15
Markdown is great for archiving, but HTML is better for reading:
Add --html flag:
capcat fetch hn --count 10 --html
capcat bundle tech --html
capcat single URL --html
With --html flag:
Open in browser:
open Article_Folder/html/article.html
Images are downloaded and embedded by default. No flag required:
capcat fetch hn --count 10
# Images automatically downloaded to images/
Disable image downloads globally in Config/Global-settings.yaml:
processing:
  download_images: false
Use --pdfs to download PDF files linked in articles, independently of other media:
capcat fetch nature --count 10 --pdfs
PDFs are downloaded asynchronously in the background. A progress line appears only when the count changes:
Downloading PDFs: 3 active, 2 queued, 8 completed
pdf:
  max_pdf_size_bytes: 20971520 # Skip PDFs larger than 20 MB
  max_pdf_per_article: 10 # Cap per article
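To make the two limits concrete, here is a minimal sketch of how a size cap and a per-article cap could be applied together. The function name `select_pdfs` and the `(url, size)` input shape are hypothetical and not taken from Capcat's code; only the two limit values come from the pdf: block above.

```python
# Hypothetical illustration; not Capcat's actual implementation.
MAX_PDF_SIZE_BYTES = 20 * 1024 * 1024  # 20971520, the value in the pdf: block
MAX_PDF_PER_ARTICLE = 10


def select_pdfs(candidates, max_size=MAX_PDF_SIZE_BYTES, max_count=MAX_PDF_PER_ARTICLE):
    """Return the URLs that pass both limits.

    candidates: list of (url, size_in_bytes) tuples, in page order.
    """
    kept = []
    for url, size in candidates:
        if size > max_size:
            continue  # skip PDFs larger than the size cap
        if len(kept) >= max_count:
            break  # per-article cap reached
        kept.append(url)
    return kept


links = [("a.pdf", 1_000_000), ("b.pdf", 30 * 1024 * 1024), ("c.pdf", 5_000_000)]
print(select_pdfs(links))
```

In this sketch the oversized b.pdf is skipped, and a file exactly at the cap is kept, matching the "larger than 20 MB" wording.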
Use --media to download everything — images, PDFs, videos, audio, and documents:
capcat fetch nature --count 10 --media
--media is a superset of --pdfs. Using --media enables PDF downloads automatically.
Some sources (Nature, IEEE, Scientific American) have PDF-heavy content. Their default PDF behaviour
is configured in their config.yaml via a media: block:
media:
  download_pdfs: true # Enable PDFs for this source by default
  max_pdf_size_mb: 10 # Source-level size cap (overrides global)
The resolution order from highest to lowest priority:
1. Command-line flags (--pdfs / --media / --no-pdfs)
2. media: block in config.yaml
3. Global-settings.yaml media.download_pdfs

News/news_DD-MM-YYYY/Source_DD-MM-YYYY/
Capcats/DD-MM-YYYY-Article-Title/
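The PDF resolution order described above (flags first, then the source's media: block, then the global setting) can be sketched as a first-explicit-value-wins lookup. Both function names are hypothetical, and the behaviour when --no-pdfs is combined with --pdfs or --media is an assumption; the source does not say which one wins.

```python
from typing import Optional


def cli_pdf_setting(pdfs: bool = False, media: bool = False,
                    no_pdfs: bool = False) -> Optional[bool]:
    """Collapse the three flags into True, False, or None (no flag given)."""
    if no_pdfs:
        return False  # assumption: --no-pdfs wins if combined with the others
    if pdfs or media:
        return True  # --media is a superset of --pdfs
    return None


def resolve_download_pdfs(cli: Optional[bool], source: Optional[bool],
                          global_default: bool) -> bool:
    """Return the first explicit setting, from highest to lowest priority."""
    for value in (cli, source):
        if value is not None:
            return value
    return global_default


# Source config enables PDFs, but --no-pdfs on the command line overrides it.
print(resolve_download_pdfs(cli_pdf_setting(no_pdfs=True), source=True, global_default=True))
```

Using None for "not set" keeps an explicit false at a higher level distinct from a missing value, which is what lets a lower-priority level fill in only when nothing above it was specified.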
Use --output flag:
# Custom directory
capcat fetch hn --count 10 --output /path/to/output
# Path under your home directory
capcat single URL --output ~/Articles
# Current directory
capcat fetch bbc --output .
Re-fetch and overwrite existing articles:
capcat fetch hn --count 10 --update
capcat bundle tech --update
capcat single URL --update
Global settings live in Config/Global-settings.yaml inside your vault. Generate it with:
capcat settings --force
Then edit directly:
vim Config/Global-settings.yaml
Common customizations:
processing:
  max_workers: 16 # Parallel downloads (faster)
  download_images: true # On by default
network:
  connect_timeout: 15 # Connection timeout
  read_timeout: 45 # Read timeout
logging:
  console_level: "INFO" # Log verbosity (DEBUG, INFO, WARNING, ERROR)
Save the file. The new settings apply to all future fetches, with no restart required.
capcat.yml at the vault root controls which sources run and how many articles each fetches:
sources:
  - name: hn
    article_count: 10
  - name: bbc
    article_count: 5
bundles: {}
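One way to read the capcat.yml example is as a list of per-source fetches. The sketch below turns the already-parsed structure into argument lists. The helper `planned_fetches` is hypothetical, and the idea that each entry behaves like `capcat fetch <name> --count <article_count>` is an assumption for illustration, not something the source confirms.

```python
# The capcat.yml example above, written as the dict a YAML parser would produce.
config = {
    "sources": [
        {"name": "hn", "article_count": 10},
        {"name": "bbc", "article_count": 5},
    ],
    "bundles": {},
}


def planned_fetches(cfg):
    """Return one argument list per configured source (assumed equivalence)."""
    return [
        ["capcat", "fetch", src["name"], "--count", str(src["article_count"])]
        for src in cfg.get("sources", [])
    ]


for args in planned_fetches(config):
    print(" ".join(args))
```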
Edit source config:
vim Config/sources/active/config_driven/configs/sourcename.yaml
Customize:
timeout: 15.0 # Source-specific timeout
rate_limit: 2.0 # Slower rate limiting
Fast, no frills:
capcat bundle tech --count 15
# No --html (faster)
# No --media (smaller)
# Default output location
Full featured:
capcat bundle science --count 40 --html --media
# HTML for browser reading
# Media for offline viewing
# Large count for weekend
Custom location, specific sources:
capcat fetch nature,scientificamerican --count 30 \
--html --media --output ~/Research/Climate
Refresh existing collection:
capcat bundle news --count 20 --update --html
# Updates existing articles
# Regenerates HTML
Save detailed logs:
capcat --log-file capcat.log fetch hn --count 10
Log includes:
capcat fetch hn --count 10
capcat --verbose fetch hn --count 10
capcat -V fetch hn --count 10
capcat --quiet fetch hn --count 10
capcat -q fetch hn --count 10
--output for topics
capcat bundle tech --count 10
# ~1-2 minutes
capcat bundle tech --count 20 --html
# ~3-5 minutes
capcat bundle tech --count 50 --html --media
# ~10-15 minutes
Override defaults without editing config:
# More parallel workers
export CAPCAT_PROCESSING_MAX_WORKERS=16
capcat bundle tech
# Custom timeout
export CAPCAT_NETWORK_CONNECT_TIMEOUT=20
capcat fetch bbc --count 30
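The two variables above suggest a naming pattern of CAPCAT_<SECTION>_<KEY>, mirroring the section and key names in Global-settings.yaml. The sketch below illustrates that inferred pattern. It is an assumption drawn from these two examples only, and both helper functions are hypothetical rather than part of Capcat.

```python
import copy


def env_var_name(section, key, prefix="CAPCAT"):
    """Build the variable name for a settings key, e.g. processing.max_workers."""
    return f"{prefix}_{section}_{key}".upper()


def apply_env_overrides(settings, environ):
    """Return a copy of settings with matching environment values applied."""
    result = copy.deepcopy(settings)
    for section, values in result.items():
        for key, current in values.items():
            raw = environ.get(env_var_name(section, key))
            if raw is None:
                continue  # no override for this key
            if isinstance(current, bool):
                values[key] = raw.strip().lower() in ("1", "true", "yes")
            else:
                values[key] = type(current)(raw)  # coerce to the existing type
    return result


defaults = {"processing": {"max_workers": 8}, "network": {"connect_timeout": 15}}
env = {"CAPCAT_PROCESSING_MAX_WORKERS": "16", "CAPCAT_NETWORK_CONNECT_TIMEOUT": "20"}
print(apply_env_overrides(defaults, env))
```

Passing `os.environ` as the second argument would apply the real environment. The default values 8 and 15 here are placeholders for the example.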
# Article count
--count N
# Generate HTML
--html
# Download all media
--media
# Custom output
--output DIR
# Update existing
--update
# File logging
--log-file FILE
# Verbosity
--verbose / -V
--quiet / -q
# Combined example
capcat bundle tech --count 20 --html --output ~/News
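For scripted runs, the flags listed above can be assembled programmatically. This is a minimal sketch of a hypothetical helper, `build_capcat_args`. It uses only flags shown in this guide and places --log-file, --verbose and --quiet before the subcommand, as in the logging examples.

```python
def build_capcat_args(command, target, count=None, html=False, media=False,
                      output=None, update=False, log_file=None,
                      verbose=False, quiet=False):
    """Return an argument list such as ['capcat', 'bundle', 'tech', ...]."""
    args = ["capcat"]
    # Logging and verbosity options precede the subcommand in the examples.
    if log_file:
        args += ["--log-file", log_file]
    if verbose:
        args.append("--verbose")
    if quiet:
        args.append("--quiet")
    args += [command, target]
    if count is not None:
        args += ["--count", str(count)]
    if html:
        args.append("--html")
    if media:
        args.append("--media")
    if output:
        args += ["--output", output]
    if update:
        args.append("--update")
    return args


print(build_capcat_args("bundle", "tech", count=20, html=True, output="~/News"))
```

The list can be passed to `subprocess.run`. Note that `~` is not expanded when no shell is involved, so a script would typically wrap the output path in `os.path.expanduser` first.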