capcat.core.streamlined_comment_processor
File: Application/capcat/core/streamlined_comment_processor.py
Description
Streamlined comment processor that simplifies handling of nested comment structures and reduces conversion time. It flattens complex comment hierarchies into a single list and supports inline comment display.
Classes
StreamlinedCommentProcessor
Comment processor that extracts comments with optional nesting depth preservation.
Methods
__init__
def __init__(self, max_comments: int = 100, max_links_per_comment: int = 5)
Parameters:
self
max_comments (int), optional
max_links_per_comment (int), optional
process_comments_flattened
def process_comments_flattened(self, soup: BeautifulSoup, comment_selector: str, user_selector: str = '.hnuser', comment_text_selector: str = '.comment', depth_fn: Optional[Callable[[Any], int]] = None, comment_permalink_fn: Optional[Callable[[str], str]] = None) -> List[Dict[str, Any]]
Process comments preserving nesting depth.
Args:
    soup: BeautifulSoup object of the comments page.
    comment_selector: CSS selector for comment elements.
    user_selector: CSS selector for user information.
    comment_text_selector: CSS selector for comment text.
    depth_fn: Optional callable(element) -> int returning nesting depth. If None, all comments get level=0.
    comment_permalink_fn: Optional callable(comment_id: str) -> str that generates a direct comment URL from the comment element's id attribute. No username is passed or stored. If None, user_link falls back to '#'.
Returns:
    List of comment dicts with a 'level' field reflecting nesting depth. 'user' is always 'Anonymous'. 'user_link' is a comment permalink (no username is stored anywhere in the output).
Parameters:
self
soup (BeautifulSoup)
comment_selector (str)
user_selector (str), optional
comment_text_selector (str), optional
depth_fn (Optional[Callable[[Any], int]]), optional
comment_permalink_fn (Optional[Callable[[str], str]]), optional
Returns: List[Dict[str, Any]]
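The two optional callables can be sketched as follows. This is a minimal illustration, not capcat's own code: the `data-depth` attribute is hypothetical, and the Hacker News item URL is an assumption suggested by the default `.hnuser` selector. The depth function relies only on `.get(key, default)`, which both BeautifulSoup tags and plain dicts provide.

```python
from typing import Any

# Assumed target site; swap in the URL pattern of the source being scraped.
HN_ITEM_URL = "https://news.ycombinator.com/item?id={comment_id}"


def hn_permalink(comment_id: str) -> str:
    """Build a direct comment URL from the comment element's id attribute."""
    return HN_ITEM_URL.format(comment_id=comment_id)


def depth_from_attribute(elem: Any) -> int:
    """Read nesting depth from a hypothetical `data-depth` attribute.

    Returns 0 when the attribute is missing, malformed, or negative.
    """
    try:
        return max(0, int(elem.get("data-depth", 0)))
    except (TypeError, ValueError):
        return 0
```

Functions shaped like these would be passed as `depth_fn=depth_from_attribute` and `comment_permalink_fn=hn_permalink`.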
_extract_comment_data_fast
def _extract_comment_data_fast(self, comment_elem, user_selector: str, comment_text_selector: str, index: int, depth_fn: Optional[Callable[[Any], int]] = None, comment_permalink_fn: Optional[Callable[[str], str]] = None) -> Optional[Dict[str, Any]]
Fast comment data extraction without deep processing. Username is never read or stored. user_link is derived from the comment element’s id attribute via comment_permalink_fn so readers can validate the comment on the source site without capcat retaining personal data. Falls back to ‘#’ when no permalink function is provided.
Parameters:
self
comment_elem
user_selector (str)
comment_text_selector (str)
index (int)
depth_fn (Optional[Callable[[Any], int]]), optional
comment_permalink_fn (Optional[Callable[[str], str]]), optional
Returns: Optional[Dict[str, Any]]
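The privacy behaviour described above can be illustrated with a small stand-alone sketch. The `user`, `user_link`, and `level` keys come from the documentation; the `text` key, the helper name `build_comment_record`, and the fallback to '#' when the element has no id are assumptions made for illustration only.

```python
from typing import Any, Callable, Dict, Optional


def build_comment_record(
    comment_id: Optional[str],
    text: str,
    level: int = 0,
    comment_permalink_fn: Optional[Callable[[str], str]] = None,
) -> Dict[str, Any]:
    """Assemble a comment dict without reading or storing a username."""
    if comment_permalink_fn is not None and comment_id:
        user_link = comment_permalink_fn(comment_id)
    else:
        # No permalink function (or, by assumption, no id): fall back to '#'.
        user_link = "#"
    return {
        "user": "Anonymous",  # constant placeholder, never the real username
        "user_link": user_link,
        "level": level,
        "text": text,  # hypothetical key name
    }
```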
_process_comment_text_streamlined
def _process_comment_text_streamlined(self, comment_elem) -> str
Streamlined comment text processing with minimal link handling.
Parameters:
self
comment_elem
Returns: str
⚠️ High complexity: 21
generate_inline_comments_markdown
def generate_inline_comments_markdown(self, comments: List[Dict[str, Any]], article_title: str, comment_url: str, article_folder_path: str = None, link_text: str = 'comment') -> str
Generate inline comments markdown with flattened structure.
If article_folder_path is provided, prepends and appends a ← [[article_stem|Article]] wikilink for Obsidian graph connectivity.
Parameters:
self
comments (List[Dict[str, Any]])
article_title (str)
comment_url (str)
article_folder_path (str), optional
link_text (str), optional
Returns: str
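The wikilink wrapping might look roughly like this. It is a sketch under stated assumptions: the helper names are hypothetical, and the article stem is assumed to be the final component of `article_folder_path`, which the documentation does not confirm.

```python
from pathlib import PurePosixPath
from typing import Optional


def article_wikilink(article_folder_path: Optional[str]) -> str:
    """Return an Obsidian-style backlink, or '' when no folder path is given."""
    if not article_folder_path:
        return ""
    # Assumption: the stem is the last path component of the article folder.
    stem = PurePosixPath(article_folder_path).name
    return f"← [[{stem}|Article]]"


def wrap_with_backlinks(body: str, article_folder_path: Optional[str] = None) -> str:
    """Prepend and append the backlink to the comments markdown body."""
    link = article_wikilink(article_folder_path)
    if not link:
        return body
    return f"{link}\n\n{body}\n\n{link}\n"
```

Placing the link at both ends keeps the article reachable from the top and bottom of a long comments page while still registering as one connection in the Obsidian graph.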
generate_inline_comments_html
def generate_inline_comments_html(self, comments: List[Dict[str, Any]], article_title: str, comment_url: str, link_text: str = 'comment') -> str
Generate inline comments HTML directly, skipping markdown conversion. Optimized for HTML post-processor performance.
Parameters:
self
comments (List[Dict[str, Any]])
article_title (str)
comment_url (str)
link_text (str), optional
Returns: str
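Rendering the flattened list straight to HTML can be sketched as below. The markup, class names, `indent_px` parameter, and `text` key are all assumptions for illustration; only the `level`, `user`, and `user_link` keys are taken from the documentation. The sketch also escapes comment text, which the real method may or may not do.

```python
from html import escape
from typing import Any, Dict, List


def render_comments_html(comments: List[Dict[str, Any]], indent_px: int = 20) -> str:
    """Render flattened comments as HTML, indenting each by its 'level'."""
    parts = ['<div class="comments">']
    for comment in comments:
        level = int(comment.get("level", 0))
        link = escape(str(comment.get("user_link", "#")), quote=True)
        user = escape(str(comment.get("user", "Anonymous")))
        text = escape(str(comment.get("text", "")))
        parts.append(
            f'<div class="comment" style="margin-left: {level * indent_px}px">'
            f'<a href="{link}">{user}</a>: {text}</div>'
        )
    parts.append("</div>")
    return "\n".join(parts)
```

Because the list is already flat, nesting is conveyed purely through the left margin, so no recursive tree walk is needed at render time.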
get_performance_metrics
def get_performance_metrics(self) -> Dict[str, Any]
Get performance metrics for monitoring.
Parameters:
self
Returns: Dict[str, Any]
Functions
create_optimized_comment_processor
def create_optimized_comment_processor(max_comments: int = 100) -> StreamlinedCommentProcessor
Factory function to create optimized comment processor.
Args: max_comments: Maximum number of comments to process
Returns: Configured StreamlinedCommentProcessor instance
Parameters:
max_comments(int) optional
Returns: StreamlinedCommentProcessor
_direct_text
def _direct_text(elem)
Get only the immediate text of an element, not from nested <p> children.
Parameters:
elem
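The idea of "immediate text only" can be shown with the standard library's ElementTree as a stand-in for BeautifulSoup. This is not the module's implementation: the function below is hypothetical and skips text inside every child element, not just nested <p> tags. With BeautifulSoup, iterating over `elem.find_all(string=True, recursive=False)` is one analogous approach.

```python
import xml.etree.ElementTree as ET


def direct_text(elem: ET.Element) -> str:
    """Collect text that belongs to elem itself, skipping text inside children."""
    pieces = [elem.text or ""]
    for child in elem:
        # child.tail is the text after the child's closing tag; it still
        # belongs to the parent element, so it counts as direct text.
        pieces.append(child.tail or "")
    return " ".join(p.strip() for p in pieces if p.strip())
```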