capcat.core.streamlined_comment_processor

File: Application/capcat/core/streamlined_comment_processor.py

Description

Streamlined comment processor that simplifies handling of nested comment structures and reduces conversion time. It flattens complex comment hierarchies into a single list, optionally recording each comment's nesting depth, and renders the comments inline.

Classes

StreamlinedCommentProcessor

Comment processor that extracts comments with optional nesting depth preservation.

Methods

__init__
def __init__(self, max_comments: int = 100, max_links_per_comment: int = 5)

Parameters:

  • self
  • max_comments (int) optional
  • max_links_per_comment (int) optional

process_comments_flattened
def process_comments_flattened(self, soup: BeautifulSoup, comment_selector: str, user_selector: str = '.hnuser', comment_text_selector: str = '.comment', depth_fn: Optional[Callable[[Any], int]] = None, comment_permalink_fn: Optional[Callable[[str], str]] = None) -> List[Dict[str, Any]]

Process comments preserving nesting depth.

Args:

  • soup: BeautifulSoup object of the comments page.
  • comment_selector: CSS selector for comment elements.
  • user_selector: CSS selector for user information.
  • comment_text_selector: CSS selector for comment text.
  • depth_fn: Optional callable(element) -> int returning nesting depth. If None, all comments get level=0.
  • comment_permalink_fn: Optional callable(comment_id: str) -> str that generates a direct comment URL from the comment element’s id attribute. No username is passed or stored. If None, user_link falls back to ‘#’.

Returns: List of comment dicts with ‘level’ field reflecting nesting depth. ‘user’ is always ‘Anonymous’. ‘user_link’ is a comment permalink (no username stored anywhere in the output).

Parameters:

  • self
  • soup (BeautifulSoup)
  • comment_selector (str)
  • user_selector (str) optional
  • comment_text_selector (str) optional
  • depth_fn (Optional[Callable[[Any], int]]) optional
  • comment_permalink_fn (Optional[Callable[[str], str]]) optional

Returns: List[Dict[str, Any]]
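The two optional callbacks are plain functions, so a minimal sketch may help show their expected shapes. The names `depth_from_attribute` and `hn_permalink` are hypothetical, as is the `data-depth` attribute; a real site will encode nesting differently, and the URL pattern assumes a Hacker News style source.

```python
from typing import Any


def depth_from_attribute(element: Any) -> int:
    """Return nesting depth read from a hypothetical data-depth attribute."""
    # BeautifulSoup Tag objects and plain dicts both expose .get(name, default),
    # which keeps this sketch independent of any HTML parser.
    raw = element.get("data-depth", 0)
    try:
        return max(0, int(raw))
    except (TypeError, ValueError):
        # Missing or malformed values are treated as top-level comments.
        return 0


def hn_permalink(comment_id: str) -> str:
    """Build a direct comment URL from the element's id (assumed HN-style)."""
    return f"https://news.ycombinator.com/item?id={comment_id}"
```

Under these assumptions the functions would be passed as depth_fn=depth_from_attribute and comment_permalink_fn=hn_permalink. Neither receives a username, which matches the contract described in the Args above.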

_extract_comment_data_fast
def _extract_comment_data_fast(self, comment_elem, user_selector: str, comment_text_selector: str, index: int, depth_fn: Optional[Callable[[Any], int]] = None, comment_permalink_fn: Optional[Callable[[str], str]] = None) -> Optional[Dict[str, Any]]

Fast comment data extraction without deep processing. The username is never read or stored. user_link is derived from the comment element’s id attribute via comment_permalink_fn, so readers can verify the comment on the source site without capcat retaining personal data. It falls back to ‘#’ when no permalink function is provided.

Parameters:

  • self
  • comment_elem
  • user_selector (str)
  • comment_text_selector (str)
  • index (int)
  • depth_fn (Optional[Callable[[Any], int]]) optional
  • comment_permalink_fn (Optional[Callable[[str], str]]) optional

Returns: Optional[Dict[str, Any]]
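The privacy rule described for this method can be illustrated with a small standalone sketch. It is not the module's implementation: `build_comment_record` is a hypothetical helper, the `text` key name is an assumption, and only `user`, `user_link`, and `level` are taken from the documented return value.

```python
from typing import Any, Callable, Dict, Optional


def build_comment_record(
    comment_id: str,
    text: str,
    level: int = 0,
    comment_permalink_fn: Optional[Callable[[str], str]] = None,
) -> Dict[str, Any]:
    """Assemble a comment dict that never contains a username."""
    if comment_permalink_fn is not None and comment_id:
        user_link = comment_permalink_fn(comment_id)
    else:
        # Documented fallback when no permalink function is given; using it
        # for a missing id as well is an assumption of this sketch.
        user_link = "#"
    return {
        "user": "Anonymous",     # fixed placeholder, never the real author
        "user_link": user_link,  # comment permalink, not a profile URL
        "level": level,
        "text": text,            # hypothetical key name
    }
```

The point of the design is that the only identifier kept is the comment's own id, which locates the comment on the source site without identifying its author.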

_process_comment_text_streamlined
def _process_comment_text_streamlined(self, comment_elem) -> str

Streamlined comment text processing with minimal link handling.

Parameters:

  • self
  • comment_elem

Returns: str

⚠️ High complexity: 21

generate_inline_comments_markdown
def generate_inline_comments_markdown(self, comments: List[Dict[str, Any]], article_title: str, comment_url: str, article_folder_path: str = None, link_text: str = 'comment') -> str

Generate inline comments markdown with flattened structure.

If article_folder_path is provided, prepends and appends a ← [[article_stem|Article]] wikilink for Obsidian graph connectivity.

Parameters:

  • self
  • comments (List[Dict[str, Any]])
  • article_title (str)
  • comment_url (str)
  • article_folder_path (str) optional
  • link_text (str) optional

Returns: str
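The wikilink behaviour can be sketched as follows. This is an illustration only: `wrap_with_backlink` is a hypothetical helper, and it takes the article stem directly because the way the method derives the stem from article_folder_path is not documented here. The blank-line spacing around the links is also an assumption.

```python
from typing import Optional


def wrap_with_backlink(body: str, article_stem: Optional[str] = None) -> str:
    """Prepend and append an Obsidian wikilink back to the article."""
    if not article_stem:
        # No article location known: return the comments markdown unchanged.
        return body
    backlink = f"← [[{article_stem}|Article]]"
    # The same link appears at the top and bottom of the comments page.
    return f"{backlink}\n\n{body}\n\n{backlink}"
```

Placing the link at both ends means a reader can return to the article from either end of a long comment thread, and Obsidian registers the connection in its graph view.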

generate_inline_comments_html
def generate_inline_comments_html(self, comments: List[Dict[str, Any]], article_title: str, comment_url: str, link_text: str = 'comment') -> str

Generate inline comments HTML directly, skipping markdown conversion. Optimized for HTML post-processor performance.

Parameters:

  • self
  • comments (List[Dict[str, Any]])
  • article_title (str)
  • comment_url (str)
  • link_text (str) optional

Returns: str

get_performance_metrics
def get_performance_metrics(self) -> Dict[str, Any]

Get performance metrics for monitoring.

Parameters:

  • self

Returns: Dict[str, Any]

Functions

create_optimized_comment_processor

def create_optimized_comment_processor(max_comments: int = 100) -> StreamlinedCommentProcessor

Factory function to create optimized comment processor.

Args: max_comments: Maximum number of comments to process

Returns: Configured StreamlinedCommentProcessor instance

Parameters:

  • max_comments (int) optional

Returns: StreamlinedCommentProcessor

_direct_text

def _direct_text(elem)

Return only the text that belongs directly to an element, excluding text inside nested <p> children.

Parameters:

  • elem