Filter
Antibugs
- pytexmd.filter.antibugs.raw_remove_comments(input: str) → str
Removes comments (everything from a % to the end of its line) from a raw string.
- Parameters:
input (str) – The input string to process.
- Returns:
The string with comments removed.
- Return type:
str
Example
>>> raw_remove_comments("Hello % comment\nWorld")
'Hello \nWorld'
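The behaviour in the example above can be reproduced with a single regular expression. This is a minimal sketch, not the pytexmd implementation: the name `strip_percent_comments` is hypothetical, and it assumes an escaped `\%` needs no special handling.

```python
import re


def strip_percent_comments(text: str) -> str:
    """Drop everything from a '%' up to (not including) the end of its line.

    Hypothetical stand-in for raw_remove_comments; an escaped "\\%" is
    treated like any other percent sign in this sketch.
    """
    # "." does not match a newline, so line breaks are preserved.
    return re.sub(r"%.*", "", text)


print(repr(strip_percent_comments("Hello % comment\nWorld")))
```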
- pytexmd.filter.antibugs.no_more_html_bugs(input: str) → str
Fixes HTML bugs by adding spaces around ‘<’ and ‘>’ characters.
- Parameters:
input (str) – The input string.
- Returns:
The processed string with spaces around ‘<’ and ‘>’.
- Return type:
str
Example
>>> no_more_html_bugs("<div>")
' < div > '
- pytexmd.filter.antibugs.no_more_dolar_bugs_begin(input: str) → str
Replaces escaped dollar signs (\$) with a placeholder.
- Parameters:
input (str) – The input string.
- Returns:
The string with ‘\$’ replaced by ‘BACKSLASHDOLLAR’.
- Return type:
str
Example
>>> no_more_dolar_bugs_begin(r"Price is \$5")
'Price is BACKSLASHDOLLAR5'
- pytexmd.filter.antibugs.no_more_dolar_bugs_end(input: str) → str
Restores dollar signs by replacing the placeholder with ‘$’.
- Parameters:
input (str) – The input string.
- Returns:
The string with ‘BACKSLASHDOLLAR’ replaced by ‘$’.
- Return type:
str
Example
>>> no_more_dolar_bugs_end("Price is BACKSLASHDOLLAR5")
'Price is $5'
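The two dollar functions form a protect/restore pair: escaped dollars are hidden behind a placeholder so that later splitting on `$` only sees math delimiters, and the placeholder is turned back into a dollar sign at the end. The sketch below illustrates that round trip with hypothetical helpers (`protect_dollars`, `restore_dollars`); it is an assumption about the idea, not the library's code.

```python
PLACEHOLDER = "BACKSLASHDOLLAR"  # placeholder name taken from the documentation


def protect_dollars(text: str) -> str:
    """Hide escaped dollars so they are not mistaken for math delimiters."""
    return text.replace("\\$", PLACEHOLDER)


def restore_dollars(text: str) -> str:
    """Turn the placeholder back into a literal dollar sign."""
    return text.replace(PLACEHOLDER, "$")


source = r"Price is \$5, math is $x$"
protected = protect_dollars(source)
# Only the two math delimiters are left as '$' characters.
math_delimiters = protected.count("$")
restored = restore_dollars(protected)
print(protected)
print(restored)
```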
- pytexmd.filter.antibugs.no_more_textup_bugs_begin(input: str) → str
Removes ‘\textup’ from the input string.
- Parameters:
input (str) – The input string.
- Returns:
The string with ‘\textup’ removed.
- Return type:
str
Example
>>> no_more_textup_bugs_begin(r"This is \textup{important}")
'This is {important}'
- pytexmd.filter.antibugs.remove_empty_at_begin(input: str) → str
Removes leading spaces and newlines from the input string.
- Parameters:
input (str) – The input string.
- Returns:
The string with leading spaces and newlines removed.
- Return type:
str
Example
>>> remove_empty_at_begin(" \nHello")
'Hello'
- pytexmd.filter.antibugs.only_two_breaks(input: str) → str
Ensures that there are at most two consecutive line breaks in the input string.
- Parameters:
input (str) – The input string.
- Returns:
The processed string with at most two consecutive line breaks.
- Return type:
str
Example
>>> only_two_breaks("a<br><br><br>b")
'a<br><br>b'
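One way to cap runs of line breaks, as in the example above, is to replace any run of three or more break tokens with exactly two. The helper below is a hypothetical sketch; the assumption that the break token is the literal `<br>` comes only from the documented example.

```python
import re


def collapse_breaks(text: str, token: str = "<br>") -> str:
    """Replace every run of three or more `token`s with exactly two."""
    run_of_three_or_more = "(?:" + re.escape(token) + "){3,}"
    # A function is used as the replacement so the token is inserted literally.
    return re.sub(run_of_three_or_more, lambda _match: token * 2, text)


print(collapse_breaks("a<br><br><br>b"))
```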
- pytexmd.filter.antibugs.no_more_bugs_begin(input: str) → str
Applies a series of bug fixes to the input string at the beginning of processing.
- Parameters:
input (str) – The input string.
- Returns:
The processed string after applying bug fixes.
- Return type:
str
Example
>>> no_more_bugs_begin(r"Some \$text <div> \textup{here}")
'Some BACKSLASHDOLLARtext < div > {here}'
- pytexmd.filter.antibugs.no_more_bugs_end(input: str) → str
Applies a series of bug fixes to the input string at the end of processing.
- Parameters:
input (str) – The input string.
- Returns:
The processed string after applying bug fixes.
- Return type:
str
Example
>>> no_more_bugs_end("Some BACKSLASHDOLLARtext")
'Some $text'
Core
Core filter classes and utilities for pytexmd.
This module provides the main classes and functions for parsing and processing LaTeX content, including tree elements, searchers, and helpers for Markdown/MyST conversion.
- class pytexmd.filter.core.Element(modifiable_content: str, parent: Element | None)
Bases:
object
Base class for LaTeX tree elements.
- _modifiable_content
Content to be processed.
- Type:
str
Example
>>> elem = Element("some content", None)
>>> elem._modifiable_content
'some content'
- hasattr(string: str) → bool
Check if the element has a given attribute.
- Parameters:
string (str) – Attribute name.
- Returns:
True if attribute exists, False otherwise.
- Return type:
bool
Example
>>> class Dummy(Element): pass
>>> d = Dummy("x", None)
>>> d.hasattr("_modifiable_content")
True
- search_attribute_holder(string: str) → Element | None
Find the nearest element with the given attribute, starting with this element and moving up through its ancestors.
- Parameters:
string (str) – Attribute name.
- Returns:
Element holding the attribute, or None.
- Return type:
Optional[Element]
Example
>>> class Dummy(Element): pass
>>> d = Dummy("x", None)
>>> d.search_attribute_holder("_modifiable_content") is d
True
- all_childs() → List[Element]
Recursively collect all child elements.
- Returns:
List of all child elements including self.
- Return type:
List[Element]
Example
>>> e = Element("abc", None)
>>> e.all_childs()[0] is e
True
- search_on_func(function: Callable[[Element], bool]) → Element | None
Search ancestors using a predicate function.
- Parameters:
function (Callable[[Element], bool]) – Predicate function.
- Returns:
First matching ancestor or None.
- Return type:
Optional[Element]
Example
>>> e = Element("abc", None)
>>> e.search_on_func(lambda x: True) is e
True
- search_class(searcher: type) → Element | None
Search ancestors for a specific class type.
- Parameters:
searcher (type) – Class type to search for.
- Returns:
First matching ancestor or None.
- Return type:
Optional[Element]
Example
>>> class Dummy(Element): pass
>>> d = Dummy("x", None)
>>> d.search_class(Dummy) is d
True
- search_up_on_func(function: Callable[[Element], bool]) → Element | None
Search upwards in the tree for an element matching a predicate.
- Parameters:
function (Callable[[Element], bool]) – Predicate function.
- Returns:
First matching element or None.
- Return type:
Optional[Element]
Example
>>> class Dummy(Element): pass
>>> d = Dummy("x", None)
>>> d.search_up_on_func(lambda x: True) is d
True
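The search methods above all walk a parent-linked tree. The following sketch shows the general pattern with a hypothetical `Node` class; it is not the pytexmd `Element`, and the `children` list and method names are assumptions made for illustration. As in the documented examples, the upward search starts with the node itself.

```python
from typing import Callable, List, Optional


class Node:
    """Hypothetical stand-in for a tree element with a parent link."""

    def __init__(self, content: str, parent: Optional["Node"]) -> None:
        self.content = content
        self.parent = parent
        self.children: List["Node"] = []
        if parent is not None:
            parent.children.append(self)

    def search_up(self, predicate: Callable[["Node"], bool]) -> Optional["Node"]:
        """Return the first node on the path to the root that satisfies predicate."""
        node: Optional["Node"] = self
        while node is not None:
            if predicate(node):
                return node
            node = node.parent
        return None

    def all_nodes(self) -> List["Node"]:
        """Collect this node and all of its descendants, depth first."""
        result = [self]
        for child in self.children:
            result.extend(child.all_nodes())
        return result


root = Node("document", None)
root.section_number = 3  # attribute held only by the root
section = Node("section", root)
leaf = Node("text", section)

# Find the nearest node, starting at the leaf, that carries the attribute.
holder = leaf.search_up(lambda n: hasattr(n, "section_number"))
print(holder.content)
```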
- class pytexmd.filter.core.Document(modifiable_content: str, parent: Element)
Bases:
StructureMaker
Element representing a LaTeX document.
- class pytexmd.filter.core.Undefined(modifiable_content: str, parent: Element)
Bases:
StructureMaker
Element for undefined LaTeX content.
- class pytexmd.filter.core.RawText(string: str, parent: Element)
Bases:
Element
Element for raw text content.
- class pytexmd.filter.core.JunkSearcher(junk_name: str, save_split: bool = True)
Bases:
Searcher
Searcher for junk LaTeX commands.
- class pytexmd.filter.core.ReplaceSearcher(junk_name: str, replacement: str, save_split: bool = True)
Bases:
Searcher
Searcher for replacing LaTeX commands.
- class pytexmd.filter.core.GuardianSearcher(name: str, save_split: bool = True)
Bases:
Searcher
Searcher for guarding LaTeX commands.
- class pytexmd.filter.core.OneArgumentJunkSearcher(command_name: str, begin_brace: str = '{', end_brace: str = '}')
Bases:
Searcher
Searcher for junk commands with one argument.
- class pytexmd.filter.core.OneArgumentCommandSearcher(command_name: str, begin: str, end: str)
Bases:
Searcher
Searcher for commands with one argument.
- pytexmd.filter.core.find_nearest_classes(string: str, all_classes: List[Element]) → List[Element]
Find nearest matching element classes in a string.
- Parameters:
string (str) – Input string.
all_classes (List[Element]) – List of element classes.
- Returns:
List of nearest matching classes.
- Return type:
List[Element]
Example
>>> class Dummy:
...     @staticmethod
...     def position(s): return s.find("x")
>>> find_nearest_classes("abcxdef", [Dummy])
[Dummy]
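The example suggests that each candidate class reports where it first matches through a static `position` method, and the candidates matching earliest are returned. The sketch below implements that reading with hypothetical classes and a hypothetical `nearest` helper; treating `-1` as "no match" and returning ties together are assumptions, not documented behaviour.

```python
from typing import List


class FindX:
    @staticmethod
    def position(s: str) -> int:
        return s.find("x")


class FindD:
    @staticmethod
    def position(s: str) -> int:
        return s.find("d")


class FindZ:
    @staticmethod
    def position(s: str) -> int:
        return s.find("z")


def nearest(string: str, candidates: List[type]) -> List[type]:
    """Return the candidates whose match starts earliest in `string`."""
    hits = [(c.position(string), c) for c in candidates]
    hits = [(pos, c) for pos, c in hits if pos >= 0]  # assumed: -1 means no match
    if not hits:
        return []
    best = min(pos for pos, _ in hits)
    return [c for pos, c in hits if pos == best]


winners = nearest("abcxdef", [FindD, FindX, FindZ])
print([c.__name__ for c in winners])
```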
- pytexmd.filter.core.has_value_equal(instance: Element, attribute_name: str, value) → bool
Check if an element’s attribute equals a value.
- Parameters:
instance (Element) – Element instance.
attribute_name (str) – Attribute name.
value – Value to compare.
- Returns:
True if attribute equals value, False otherwise.
- Return type:
bool
Example
>>> class Dummy(Element): pass
>>> d = Dummy("x", None)
>>> d.section_number = 5
>>> has_value_equal(d, "section_number", 5)
True
- pytexmd.filter.core.get_number_within_equation(string: str) → str
Extract equation numbering context from LaTeX string.
- Parameters:
string (str) – LaTeX string.
- Returns:
Numbering context or “document”.
- Return type:
str
Example
>>> get_number_within_equation(r"abc\numberwithin{equation}{section}")
'section'
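A lookup like the one above can be sketched with a regular expression that captures the second argument of `\numberwithin{equation}{...}` and falls back to "document" otherwise. The function name `numbering_context` is hypothetical and the pattern is an assumption; the library may parse the command differently. The input is a raw string, because in a normal string literal `"\n"` would be read as a newline.

```python
import re


def numbering_context(latex: str) -> str:
    """Return the counter that equations are numbered within, or "document"."""
    match = re.search(r"\\numberwithin\{equation\}\{(\w+)\}", latex)
    return match.group(1) if match else "document"


print(numbering_context(r"abc\numberwithin{equation}{section}"))
```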
- class pytexmd.filter.core.Searcher
Bases:
object
Base class for searchers to find LaTeX constructs.
Example
>>> class DummySearcher(Searcher):
...     def position(self, s): return s.find("x")
...     def split_and_create(self, s, p): return "", Element("x", p), ""
>>> ds = DummySearcher()
>>> ds.position("abcxdef")
3
- class pytexmd.filter.core.BeginEndSearcher(command_name: str, element_type: type, save_split: bool = True)
Bases:
Searcher
Searcher for LaTeX environments delimited by \begin and \end.
- name
Environment name.
- Type:
str
- save_split
Whether to save the split command.
- Type:
bool
Example
>>> searcher = BeginEndSearcher("itemize")
>>> searcher.name
'itemize'
- class pytexmd.filter.core.SectionLikeSearcher(command_name: str)
Bases:
Searcher
Searcher for LaTeX commands.
- name
Command name.
- Type:
str
- save_split
Whether to save the split command.
- Type:
bool
- class pytexmd.filter.core.SectionLike(modifiable_content: str, parent, command_name: str, name: str)
Bases:
StructureMaker
Element for section-like LaTeX commands.
- class pytexmd.filter.core.LabelType(*values)
Bases:
Enum
- REF = 'ref'
- NUMREF = 'numref'
- SECTION_LIKE = 'section_like'
- DOC = 'doc'
- EQ = 'eq'
- PRF_REF = 'prf:ref'
- ENUMERATION_ITEM = 'enumeration_item'
- class pytexmd.filter.core.BackMatter(save_split: bool = True)
Bases:
Searcher
Searcher for back matter LaTeX commands.
Enumitem
- class pytexmd.filter.enumitem.Itemize(modifiable_content: str, parent: Element)
Bases:
Element
Represents a LaTeX itemize environment.
Example
>>> itemize = Itemize("content", None)
>>> isinstance(itemize.to_string(), str)
True
- current_index = 0
- to_string() → str
Converts the itemize to a formatted string.
- Returns:
The formatted itemize string.
- Return type:
str
Example
>>> itemize = Itemize("abc", None)
>>> isinstance(itemize.to_string(), str)
True
- static position(string: str) → int
Finds the position of ‘\begin{itemize}’ in the string.
- Parameters:
string (str) – The input string.
- Returns:
The position index.
- Return type:
int
Example
>>> Itemize.position(r"\begin{itemize}abc")
0
- static split_and_create(string: str, parent: Element) → tuple
Splits the string on itemize environment and creates an Itemize.
- Parameters:
string (str) – The input string.
parent (Element) – The parent element.
- Returns:
(pre, Itemize, post)
- Return type:
tuple
Example
>>> pre, itemize, post = Itemize.split_and_create(r"\begin{itemize}abc\end{itemize}", None)
>>> isinstance(itemize, Itemize)
True
- class pytexmd.filter.enumitem.ItemizeItem(modifiable_content: str, parent: Element, enum_item: str = '*')
Bases:
Element
Represents an item in a LaTeX itemize environment.
Example
>>> item = ItemizeItem("First item", None)
>>> print(item.to_string())
• First item
- label_name() → str
Returns the label of the item.
- Returns:
The label.
- Return type:
str
Example
>>> item = ItemizeItem("abc", None)
>>> item.label_name()
'•'
- to_string() → str
Converts the item to a formatted string.
- Returns:
The formatted item string.
- Return type:
str
Example
>>> item = ItemizeItem("abc", None)
>>> isinstance(item.to_string(), str)
True
- static position(string: str) → int
Finds the position of ‘\item’ in the string.
- Parameters:
string (str) – The input string.
- Returns:
The position index.
- Return type:
int
Example
>>> ItemizeItem.position(r"\item abc")
0
- static split_and_create(string: str, parent: Element) → tuple
Splits the string on ‘\item’ and creates an ItemizeItem.
- Parameters:
string (str) – The input string.
parent (Element) – The parent element.
- Returns:
(pre, ItemizeItem, post)
- Return type:
tuple
Example
>>> pre, item, post = ItemizeItem.split_and_create(r"\item abc", None)
>>> isinstance(item, ItemizeItem)
True
- class pytexmd.filter.enumitem.Enumeration(modifiable_content: str, parent: Element, start, label_part)
Bases:
Element
Represents a LaTeX enumerate environment.
Example
>>> enum = Enumeration("content", None, enum_style_arabic, "(", ")")
>>> isinstance(enum.to_string(), str)
True
- to_string() → str
Converts the enumerate to a formatted string.
- Returns:
The formatted enumerate string.
- Return type:
str
Example
>>> enum = Enumeration("abc", None, enum_style_arabic, "(", ")")
>>> isinstance(enum.to_string(), str)
True
- static position(string: str) → int
Finds the position of ‘\begin{enumerate}’ in the string.
- Parameters:
string (str) – The input string.
- Returns:
The position index.
- Return type:
int
Example
>>> Enumeration.position(r"\begin{enumerate}abc")
0
- static split_and_create(string: str, parent: Element) → tuple
Splits the string on enumerate environment and creates an Enumeration.
- Parameters:
string (str) – The input string.
parent (Element) – The parent element.
- Returns:
(pre, Enumeration, post)
- Return type:
tuple
Example
>>> pre, enum, post = Enumeration.split_and_create(r"\begin{enumerate}abc\end{enumerate}", None)
>>> isinstance(enum, Enumeration)
True
- class pytexmd.filter.enumitem.EnumerationItem(modifiable_content: str, parent: Element, enum_item: str = None)
Bases:
Element
Represents an item in a LaTeX enumerate environment.
Example
>>> enum_item = EnumerationItem("First", None)
>>> isinstance(enum_item.to_string(), str)
True
- to_string() → str
Output Markdown/MyST string.
- Returns:
Markdown/MyST representation.
- Return type:
str
Example
>>> class Dummy(Element):
...     def to_string(self): return "dummy"
>>> Dummy("abc", None).to_string()
'dummy'
- static position(string: str) → int
Finds the position of ‘\item’ in the string.
- Parameters:
string (str) – The input string.
- Returns:
The position index.
- Return type:
int
Example
>>> EnumerationItem.position(r"\item abc")
0
- static split_and_create(string: str, parent: Element) → tuple
Splits the string on ‘\item’ and creates an EnumerationItem.
- Parameters:
string (str) – The input string.
parent (Element) – The parent element.
- Returns:
(pre, EnumerationItem, post)
- Return type:
tuple
Example
>>> pre, enum_item, post = EnumerationItem.split_and_create(r"\item abc", None)
>>> isinstance(enum_item, EnumerationItem)
True
Equations
Equation filter classes and utilities for pytexmd.
This module provides classes and functions for parsing and processing LaTeX equations, environments, and math for Markdown/MyST conversion.
- pytexmd.filter.equations.apply_latex_protection(string: Element) → Element
Expands and protects LaTeX environments and commands in the given element.
- class pytexmd.filter.equations.InlineLatex(modifiable_content: str, parent: Element)
Bases:
Element
Represents inline LaTeX math ($…$).
Example
>>> inline = InlineLatex("x^2", None)
>>> isinstance(inline.to_string(), str)
True
- static split_and_create(string: str, parent: Element) → Tuple[str, InlineLatex, str]
Split string and create InlineLatex element.
- Parameters:
string (str) – Input string.
parent (Element) – Parent element.
- Returns:
Pre-content, InlineLatex, post-content.
- Return type:
Tuple[str, InlineLatex, str]
- class pytexmd.filter.equations.LatexText(modifiable_content: str, parent: Element)
Bases:
Element
Represents LaTeX text command.
Example
>>> text = LatexText("hello", None)
>>> isinstance(text.to_string(), str)
True
- class pytexmd.filter.equations.Cases(modifiable_content: str, parent: Element)
Bases:
Element
Represents LaTeX cases environment.
Example
>>> cases = Cases(r"x & y \\ z & w", None)
>>> isinstance(cases.to_string(), str)
True
- class pytexmd.filter.equations.DoubleDolarLatex(modifiable_content: str, parent: Element)
Bases:
Element
Represents display math ($$…$$).
Example
>>> dbl = DoubleDolarLatex("x^2", None)
>>> isinstance(dbl, DoubleDolarLatex)
True
- prio_elem = True
File Maker
- pytexmd.filter.file_maker.string_to_tree(string: str) → Document
Converts a string to a document tree structure.
- Parameters:
string (str) – The input string to process.
- Returns:
The processed document tree.
- Return type:
Document
Example
>>> latex = r"""\section{Intro}\begin{equation}E=mc^2\end{equation}"""
>>> doc = string_to_tree(latex)
>>> print(doc.to_string())
- pytexmd.filter.file_maker.process_string(output_folder: str, string: str, depth=2, output_suffix: str = '.md', verify=True)
Processes a LaTeX string and writes the document to hierarchical MyST files.
This function converts LaTeX to a document tree, then splits it into multiple files based on section hierarchy with automatic content verification.
- Parameters:
output_folder (str) – The output folder path.
string (str) – The input LaTeX string.
depth (int, optional) – Splitting depth (0=no split, 1=chapter, 2=section, etc.). Defaults to 2.
output_suffix (str, optional) – The file suffix. Defaults to “.md”.
verify (bool, optional) – Verify content integrity after parsing. Defaults to True.
- Returns:
Root structure with child_files tracking for all sections
- Return type:
dict
Example
>>> # Process a LaTeX string and split into hierarchical files
>>> latex = r"""\chapter{Intro}\section{Background}\subsection{Details}"""
>>> structure = process_string("output", latex, depth=2)
>>> # Creates: output/intro.md with toctree to output/background.md
- pytexmd.filter.file_maker.element_to_file_whole(element: SectionLike, output_folder: str, file_name: str, output_suffix: str = '.md')
Writes the whole element to a file.
- Parameters:
element (SectionLike) – The element to write.
output_folder (str) – The output folder path.
file_name (str) – The file name.
output_suffix (str, optional) – The file suffix. Defaults to “.md”.
- Returns:
None
Example
>>> # Save the entire document as 'output/index.md'
>>> doc = string_to_tree(r"\section{Intro}")
>>> element_to_file_whole(doc, "output", "index")
- pytexmd.filter.file_maker.element_to_file_only_begin(element: SectionLike, output_folder: str, file_name: str, file_names: List[str], output_suffix: str = '.md')
Writes only the beginning part of the element to a file, with a toctree.
- Parameters:
element (SectionLike) – The element to write.
output_folder (str) – The output folder path.
file_name (str) – The file name.
file_names (List[str]) – The file names to list in the toctree.
output_suffix (str, optional) – The file suffix. Defaults to “.md”.
- Returns:
None
Example
>>> # Save only the introduction and generate a toctree for subsections
>>> doc = string_to_tree(r"\section{Intro}\section{Background}")
>>> element_to_file_only_begin(doc, "output", "index", ["intro", "background"])
- pytexmd.filter.file_maker.split_document_to_files(document_md, output_folder, depth=2, output_suffix='.md', verify=True)
Main function to split document tree into hierarchical MyST files.
Each section file will know its child files through the structure.
- Parameters:
document_md – Document tree object (from string_to_tree)
output_folder (str) – Output directory path
depth (int) – Splitting depth (0=no split, 1=chapter, 2=section, etc.)
output_suffix (str) – File extension
verify (bool) – Verify content integrity after parsing
- Returns:
Root structure with child_files tracking for all sections
- Return type:
dict
Example
>>> # Convert and split a document
>>> doc = string_to_tree(latex_string)
>>> structure = split_document_to_files(doc, "./output", depth=2, verify=True)
>>> # Each section in structure has 'child_files' list
- pytexmd.filter.file_maker.split_by_sections(content_string, max_depth=2)
Split document string into hierarchical sections based on MyST comment markers.
- Parameters:
content_string (str) – The full document string with MyST markers
max_depth (int) – Maximum depth for splitting (0=part, 1=chapter, 2=section, etc.)
- Returns:
Hierarchical structure of sections with content and children tracking
- Return type:
dict
- pytexmd.filter.file_maker.verify_content_integrity(original_content, structure)
Verify that the split structure contains all original content.
- Parameters:
original_content (str) – Original document string
structure (dict) – Parsed section structure
- Returns:
(is_valid, message, stats)
- Return type:
tuple
Notworking Preprocessor
- pytexmd.filter.notworking_preprocessor.do_newenvironment(string: str) → str
Processes all LaTeX newenvironment definitions and applies them.
- Parameters:
string (str) – The input string.
- Returns:
The string with environments expanded.
- Return type:
str
Example
>>> s = r"\newenvironment{foo}[2]{<b>#1 #2>}{</b>} \begin{foo}{a}{b}content\end{foo}"
>>> do_newenvironment(s)
'<b>a b>content</b>'
Preprocessor
- pytexmd.filter.preprocessor.do_commands(string: str) → str
Processes all LaTeX newcommand definitions and applies them.
- Parameters:
string (str) – The input string.
- Returns:
The string with commands expanded.
- Return type:
str
Example
>>> s = r"\newcommand{\foo}[2]{#1+#2} \foo{a}{b}"
>>> do_commands(s)
'a+b'
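The core of expanding a `\newcommand` is substituting the call's arguments for the `#1`, `#2`, ... slots in the definition body. The sketch below covers only that substitution step with a hypothetical `expand_template` helper; locating definitions and calls in the source is left out, and nothing here is taken from the pytexmd implementation.

```python
import re
from typing import List


def expand_template(template: str, args: List[str]) -> str:
    """Substitute #1..#9 in `template` with the corresponding entries of `args`."""

    def substitute(match: "re.Match[str]") -> str:
        # LaTeX parameters are 1-based; raises IndexError if an argument is missing.
        return args[int(match.group(1)) - 1]

    # A single pass, so text inserted for one slot is never re-expanded.
    return re.sub(r"#([1-9])", substitute, template)


print(expand_template("#1+#2", ["a", "b"]))
```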
- pytexmd.filter.preprocessor.do_newenvironment(string: str) → str
Processes all LaTeX newenvironment definitions and applies them.
- Parameters:
string (str) – The input string.
- Returns:
The string with environments expanded.
- Return type:
str
Example
>>> s = r"\newenvironment{foo}[2]{<b>#1 #2>}{</b>} \begin{foo}{a}{b}content\end{foo}"
>>> do_newenvironment(s)
'<b>a b>content</b>'
Splitting
- pytexmd.filter.splitting.get_all_allchars_no_abc() → str
Returns a string of non-alphabetic ASCII characters.
- Returns:
String containing non-alphabetic ASCII characters.
- Return type:
str
Example
>>> chars = get_all_allchars_no_abc()
>>> isinstance(chars, str)
True
- pytexmd.filter.splitting.save_command_split(string: str, split_on: str) → List[str]
Splits a string on a given substring, preserving certain patterns.
- Parameters:
string (str) – The input string to split.
split_on (str) – The substring to split on.
- Returns:
List of split string segments.
- Return type:
List[str]
- Raises:
ValueError – If input types are incorrect.
Example
>>> parts = save_command_split("foo$bar$baz", "$")
>>> parts
['foo', 'bar', 'baz']
- pytexmd.filter.splitting.first_char_brace(string: str, begin_brace: str = '{') → bool
Checks if the first non-whitespace character of a string is a given brace.
- Parameters:
string (str) – The input string.
begin_brace (str, optional) – The brace character to check. Defaults to “{”.
- Returns:
True if first character is the brace, False otherwise.
- Return type:
bool
- Raises:
ValueError – If input types are incorrect.
Example
>>> is_brace = first_char_brace(" {foo}")
>>> is_brace
True
- pytexmd.filter.splitting.split_on_first_brace(string: str, begin_brace='{', end_brace='}', error_replacement='brace_error') → Tuple[str, str]
Splits a string on the first matching pair of braces.
- Parameters:
string (str) – The input string.
begin_brace (str, optional) – The opening brace. Defaults to “{”.
end_brace (str, optional) – The closing brace. Defaults to “}”.
error_replacement (str, optional) – Replacement string if brace not found. Defaults to “brace_error”.
- Returns:
Content inside braces, and the remaining string.
- Return type:
Tuple[str, str]
- Raises:
ValueError – If input types are incorrect.
Example
>>> inside, rest = split_on_first_brace("{foo}bar")
>>> inside
'foo'
>>> rest
'bar'
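Splitting on the first matching pair of braces usually means tracking nesting depth, so that an inner `{...}` does not end the group early. The sketch below shows that technique with a hypothetical `split_first_group` function. It assumes single-character braces and raises `ValueError` on unbalanced input, whereas the documented function takes an `error_replacement` string instead; how that string is used is not described here.

```python
from typing import Tuple


def split_first_group(text: str, open_b: str = "{", close_b: str = "}") -> Tuple[str, str]:
    """Return (content of the first balanced group, text after it)."""
    start = text.find(open_b)
    if start == -1:
        raise ValueError("no opening brace found")
    depth = 0
    for index in range(start, len(text)):
        char = text[index]
        if char == open_b:
            depth += 1
        elif char == close_b:
            depth -= 1
            if depth == 0:
                # The group closes here: slice out the inside and the remainder.
                return text[start + 1:index], text[index + 1:]
    raise ValueError("unbalanced braces")


print(split_first_group("{a{b}c}d"))
```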
- pytexmd.filter.splitting.split_rename(string: str) → Tuple[str, str] | None
Splits the input string into a name and the remaining string if the first character is a ‘[’.
- Parameters:
string (str) – The input string.
- Returns:
A tuple containing the name and the remaining string, or None if the first character is not ‘[’.
- Return type:
Optional[Tuple[str, str]]
- Raises:
ValueError – If input is not a string.
Example
>>> name, rest = split_rename("[foo]bar")
>>> name
'foo'
>>> rest
'bar'
- pytexmd.filter.splitting.split_on_next(string: str, split_on: str, save_split: bool = True) → Tuple[str, str]
Splits a string on the next occurrence of a substring.
- Parameters:
string (str) – The input string.
split_on (str) – The substring to split on.
save_split (bool, optional) – Whether to use save_command_split. Defaults to True.
- Returns:
The part before and after the split.
- Return type:
Tuple[str, str]
- Raises:
ValueError – If input types are incorrect.
Example
>>> before, after = split_on_next("foo$bar$baz", "$")
>>> before
'foo'
>>> after
'bar$baz'
- pytexmd.filter.splitting.begin_end_split(string: str, begin_name: str, end_name: str, save_split: bool = False) → Tuple[str, str, str]
Splits a string into three parts: before, between, and after given begin and end substrings.
- Parameters:
string (str) – The input string.
begin_name (str) – The substring marking the beginning.
end_name (str) – The substring marking the end.
save_split (bool, optional) – Whether to use save_command_split. Defaults to False.
- Returns:
The parts before, between, and after the delimiters.
- Return type:
Tuple[str, str, str]
- Raises:
ValueError – If input types are incorrect.
Example
>>> pre, mid, post = begin_end_split(r"a\begin{env}b\end{env}c", r"\begin{env}", r"\end{env}")
>>> pre
'a'
>>> mid
'b'
>>> post
'c'
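For the simple, non-nested case, a three-way split like the one above can be sketched with two calls to `str.partition`. The helper `split_between` is hypothetical; it ignores nested environments and the `save_split` option, and raising `ValueError` for a missing delimiter is an assumption. Raw strings are used so that `\b` in `\begin` is not read as a backspace escape.

```python
from typing import Tuple


def split_between(text: str, begin: str, end: str) -> Tuple[str, str, str]:
    """Split text into (before begin, between begin and end, after end)."""
    pre, found_begin, rest = text.partition(begin)
    if not found_begin:
        raise ValueError("begin delimiter not found")
    middle, found_end, post = rest.partition(end)
    if not found_end:
        raise ValueError("end delimiter not found")
    return pre, middle, post


parts = split_between(r"a\begin{env}b\end{env}c", r"\begin{env}", r"\end{env}")
print(parts)
```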
- pytexmd.filter.splitting.position_of(string: str, begin_name: str, save_split: bool = True) → int
Finds the position of a substring in a string.
- Parameters:
string (str) – The input string.
begin_name (str) – The substring to find.
save_split (bool, optional) – Whether to use save_command_split. Defaults to True.
- Returns:
The position index, or -1 if not found.
- Return type:
int
- Raises:
ValueError – If input types are incorrect.
Example
>>> pos = position_of("foo$bar", "$")
>>> pos
3
Text
Section and theorem filter classes and utilities for pytexmd.
This module provides classes and functions for parsing and processing LaTeX sections, theorems, references, and formatting for Markdown/MyST conversion.
- class pytexmd.filter.text.Ref(modifiable_content: str, parent: Element, label_ref: str)
Bases:
Element
Element for LaTeX ref reference.
Example
>>> ref = Ref("content", None, "mylabel")
>>> isinstance(ref, Ref)
True
- class pytexmd.filter.text.EqRef(modifiable_content: str, parent: Element, label_ref: str)
Bases:
Element
Element for LaTeX eqref reference.
Example
>>> eqref = EqRef("content", None, "eq1")
>>> isinstance(eqref, EqRef)
True
- class pytexmd.filter.text.Proof(modifiable_content: str, parent: Element)
Bases:
Element
Element for LaTeX proof environment.
Example
>>> proof = Proof("content", None)
>>> isinstance(proof, Proof)
True
- class pytexmd.filter.text.Textbf(modifiable_content: str, parent: Element)
Bases:
Element
Element for LaTeX textbf command.
Example
>>> bold = Textbf("content", None)
>>> isinstance(bold, Textbf)
True
- class pytexmd.filter.text.Emph(modifiable_content: str, parent: Element)
Bases:
Element
Element for LaTeX emph command.
Example
>>> emph = Emph("content", None)
>>> isinstance(emph, Emph)
True
- class pytexmd.filter.text.Cite(modifiable_content: str, parent: Element, citations: list[str], rename: str)
Bases:
Element
Element for LaTeX cite command.
Example
>>> cite = Cite("content", None, ["ref1", "ref2"])
>>> isinstance(cite, Cite)
True
- pytexmd.filter.text.get_all_filters() → list
Returns all section-related filter classes/searchers.
- Returns:
List of filter classes/searchers.
- Return type:
list
Example
>>> filters = get_all_filters()
>>> isinstance(filters, list)
True
- pytexmd.filter.text.get_number_within_equation(input: str) → str
Extract equation numbering context from LaTeX string.
- Parameters:
input (str) – LaTeX string.
- Returns:
Numbering context or “document”.
- Return type:
str
Example
>>> get_number_within_equation(r"abc\numberwithin{equation}{section}")
'section'
- pytexmd.filter.text.get_theoremSearchers(input: str) → list
Extract theorem searchers from LaTeX preamble.
- Parameters:
input (str) – LaTeX preamble string.
- Returns:
List of TheoremSearcher instances.
- Return type:
list
Example
>>> result = get_theoremSearchers(r"\newtheorem{theorem}{Theorem}")
>>> isinstance(result, list)
True
- class pytexmd.filter.text.Textit(modifiable_content: str, parent: Element)
Bases:
Element
Element for LaTeX textit command.
Example
>>> textit = Textit("content", None)
>>> isinstance(textit, Textit)
True
- class pytexmd.filter.text.ProofLabel(modifiable_content: str, parent: Element, label_ref: str)
Bases:
Element
Element for MyST label.
- Parameters:
modifiable_content (str) – Content to process.
parent (Element) – Parent element.
label_ref (str) – Label reference.
Example
>>> label = ProofLabel("content", None, "mylabel")
>>> isinstance(label, ProofLabel)
True
- static split_and_create(string: str, parent: Element) → Tuple[str, ProofLabel, str]
Module Contents
- pytexmd.filter.string_to_tree(string: str) → Document
Converts a string to a document tree structure.
- Parameters:
string (str) – The input string to process.
- Returns:
The processed document tree.
- Return type:
Document
Example
>>> latex = r"""\section{Intro}\begin{equation}E=mc^2\end{equation}"""
>>> doc = string_to_tree(latex)
>>> print(doc.to_string())
- pytexmd.filter.process_string(output_folder: str, string: str, depth=2, output_suffix: str = '.md', verify=True)
Processes a LaTeX string and writes the document to hierarchical MyST files.
This function converts LaTeX to a document tree, then splits it into multiple files based on section hierarchy with automatic content verification.
- Parameters:
output_folder (str) – The output folder path.
string (str) – The input LaTeX string.
depth (int, optional) – Splitting depth (0=no split, 1=chapter, 2=section, etc.). Defaults to 2.
output_suffix (str, optional) – The file suffix. Defaults to “.md”.
verify (bool, optional) – Verify content integrity after parsing. Defaults to True.
- Returns:
Root structure with child_files tracking for all sections
- Return type:
dict
Example
>>> # Process a LaTeX string and split into hierarchical files
>>> latex = r"""\chapter{Intro}\section{Background}\subsection{Details}"""
>>> structure = process_string("output", latex, depth=2)
>>> # Creates: output/intro.md with toctree to output/background.md
- pytexmd.filter.element_to_file_whole(element: SectionLike, output_folder: str, file_name: str, output_suffix: str = '.md')
Writes the whole element to a file.
- Parameters:
element (SectionLike) – The element to write.
output_folder (str) – The output folder path.
file_name (str) – The file name.
output_suffix (str, optional) – The file suffix. Defaults to “.md”.
- Returns:
None
Example
>>> # Save the entire document as 'output/index.md'
>>> doc = string_to_tree(r"\section{Intro}")
>>> element_to_file_whole(doc, "output", "index")
- pytexmd.filter.element_to_file_only_begin(element: SectionLike, output_folder: str, file_name: str, file_names: List[str], output_suffix: str = '.md')
Writes only the beginning part of the element to a file, with a toctree.
- Parameters:
element (SectionLike) – The element to write.
output_folder (str) – The output folder path.
file_name (str) – The file name.
file_names (List[str]) – The file names to list in the toctree.
output_suffix (str, optional) – The file suffix. Defaults to “.md”.
- Returns:
None
Example
>>> # Save only the introduction and generate a toctree for subsections
>>> doc = string_to_tree(r"\section{Intro}\section{Background}")
>>> element_to_file_only_begin(doc, "output", "index", ["intro", "background"])
- pytexmd.filter.split_document_to_files(document_md, output_folder, depth=2, output_suffix='.md', verify=True)
Main function to split document tree into hierarchical MyST files.
Each section file will know its child files through the structure.
- Parameters:
document_md – Document tree object (from string_to_tree)
output_folder (str) – Output directory path
depth (int) – Splitting depth (0=no split, 1=chapter, 2=section, etc.)
output_suffix (str) – File extension
verify (bool) – Verify content integrity after parsing
- Returns:
Root structure with child_files tracking for all sections
- Return type:
dict
Example
>>> # Convert and split a document
>>> doc = string_to_tree(latex_string)
>>> structure = split_document_to_files(doc, "./output", depth=2, verify=True)
>>> # Each section in structure has 'child_files' list
- pytexmd.filter.split_by_sections(content_string, max_depth=2)
Split document string into hierarchical sections based on MyST comment markers.
- Parameters:
content_string (str) – The full document string with MyST markers
max_depth (int) – Maximum depth for splitting (0=part, 1=chapter, 2=section, etc.)
- Returns:
Hierarchical structure of sections with content and children tracking
- Return type:
dict
- pytexmd.filter.verify_content_integrity(original_content, structure)
Verify that the split structure contains all original content.
- Parameters:
original_content (str) – Original document string
structure (dict) – Parsed section structure
- Returns:
(is_valid, message, stats)
- Return type:
tuple