Filter

Antibugs

pytexmd.filter.antibugs.raw_remove_comments(input: str) -> str

Removes comments (everything from a % to the end of the line) from a raw string.

Parameters:

input (str) – The input string to process.

Returns:

The string with comments removed.

Return type:

str

Example

>>> raw_remove_comments("Hello % comment\nWorld")
'Hello \nWorld'
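
The behaviour shown in the example can be approximated with one regular expression. This is a minimal sketch, not the pytexmd implementation: `strip_percent_comments` is a hypothetical name, and leaving an escaped `\%` untouched is an assumption rather than documented behaviour.

```python
import re

# Hypothetical helper, not part of pytexmd: drop everything from an
# unescaped % up to (but not including) the end of the line.
_COMMENT = re.compile(r"(?<!\\)%.*")


def strip_percent_comments(text: str) -> str:
    # "." does not match "\n", so line breaks are preserved.
    return _COMMENT.sub("", text)


print(repr(strip_percent_comments("Hello % comment\nWorld")))  # 'Hello \nWorld'
```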
pytexmd.filter.antibugs.no_more_html_bugs(input: str) -> str

Fixes HTML bugs by adding spaces around ‘<’ and ‘>’ characters.

Parameters:

input (str) – The input string.

Returns:

The processed string with spaces around ‘<’ and ‘>’.

Return type:

str

Example

>>> no_more_html_bugs("<div>")
' < div > '
pytexmd.filter.antibugs.no_more_dolar_bugs_begin(input: str) -> str

Replaces escaped dollar signs (\$) with a placeholder.

Parameters:

input (str) – The input string.

Returns:

The string with ‘\$’ replaced by ‘BACKSLASHDOLLAR’.

Return type:

str

Example

>>> no_more_dolar_bugs_begin(r"Price is \$5")
'Price is BACKSLASHDOLLAR5'
pytexmd.filter.antibugs.no_more_dolar_bugs_end(input: str) -> str

Restores dollar signs by replacing the placeholder with ‘$’.

Parameters:

input (str) – The input string.

Returns:

The string with ‘BACKSLASHDOLLAR’ replaced by ‘$’.

Return type:

str

Example

>>> no_more_dolar_bugs_end("Price is BACKSLASHDOLLAR5")
'Price is $5'
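
The two functions above form a protect/restore pair. The sketch below shows the round trip with plain `str.replace`; `protect_dollars` and `restore_dollars` are hypothetical names, and the stated reason for the placeholder (keeping escaped dollars away from math parsing) is an assumption, not something the source confirms.

```python
PLACEHOLDER = "BACKSLASHDOLLAR"  # placeholder text taken from the examples above


def protect_dollars(text: str) -> str:
    # Hide escaped dollars (assumed purpose: keep them out of math parsing).
    return text.replace(r"\$", PLACEHOLDER)


def restore_dollars(text: str) -> str:
    # Put a literal dollar sign back in place of the placeholder.
    return text.replace(PLACEHOLDER, "$")


hidden = protect_dollars(r"Price is \$5 and $x^2$")
print(hidden)                   # Price is BACKSLASHDOLLAR5 and $x^2$
print(restore_dollars(hidden))  # Price is $5 and $x^2$
```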
pytexmd.filter.antibugs.no_more_textup_bugs_begin(input: str) -> str

Removes ‘\textup’ from the input string.

Parameters:

input (str) – The input string.

Returns:

The string with ‘\textup’ removed.

Return type:

str

Example

>>> no_more_textup_bugs_begin(r"This is \textup{important}")
'This is {important}'
pytexmd.filter.antibugs.remove_empty_at_begin(input: str) -> str

Removes leading spaces and newlines from the input string.

Parameters:

input (str) – The input string.

Returns:

The string with leading spaces and newlines removed.

Return type:

str

Example

>>> remove_empty_at_begin("   \nHello")
'Hello'
pytexmd.filter.antibugs.only_two_breaks(input: str) -> str

Ensures that there are at most two consecutive line breaks in the input string.

Parameters:

input (str) – The input string.

Returns:

The processed string with at most two consecutive line breaks.

Return type:

str

Example

>>> only_two_breaks("a<br><br><br>b")
'a<br><br>b'
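
One way to get the behaviour shown in the example is to collapse any longer run of break markers down to two. This is a sketch under assumptions: `cap_breaks` and its `marker` and `limit` parameters are hypothetical, and the marker `<br>` is taken from the example rather than from the library source.

```python
import re


def cap_breaks(text: str, marker: str = "<br>", limit: int = 2) -> str:
    # Match limit+1 or more consecutive markers and shrink them to exactly `limit`.
    pattern = re.compile("(?:%s){%d,}" % (re.escape(marker), limit + 1))
    return pattern.sub(marker * limit, text)


print(cap_breaks("a<br><br><br>b"))  # a<br><br>b
print(cap_breaks("a<br>b"))          # a<br>b
```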
pytexmd.filter.antibugs.no_more_bugs_begin(input: str) -> str

Applies a series of bug fixes to the input string at the beginning of processing.

Parameters:

input (str) – The input string.

Returns:

The processed string after applying bug fixes.

Return type:

str

Example

>>> no_more_bugs_begin(r"Some \$text <div> \textup{here}")
'Some BACKSLASHDOLLARtext  < div >  {here}'
pytexmd.filter.antibugs.no_more_bugs_end(input: str) -> str

Applies a series of bug fixes to the input string at the end of processing.

Parameters:

input (str) – The input string.

Returns:

The processed string after applying bug fixes.

Return type:

str

Example

>>> no_more_bugs_end("Some BACKSLASHDOLLARtext")
'Some $text'

Core

Core filter classes and utilities for pytexmd.

This module provides the main classes and functions for parsing and processing LaTeX content, including tree elements, searchers, and helpers for Markdown/MyST conversion.

class pytexmd.filter.core.Element(modifiable_content: str, parent: Element | None)

Bases: object

Base class for LaTeX tree elements.

children

Child elements.

Type:

List[Element] | None

_modifiable_content

Content to be processed.

Type:

str

parent

Parent element.

Type:

Element | None

Example

>>> elem = Element("some content", None)
>>> elem._modifiable_content
'some content'
hasattr(string: str) -> bool

Check if the element has a given attribute.

Parameters:

string (str) – Attribute name.

Returns:

True if attribute exists, False otherwise.

Return type:

bool

Example

>>> class Dummy(Element): pass
>>> d = Dummy("x", None)
>>> d.hasattr("_modifiable_content")
True
search_attribute_holder(string: str) -> Element | None

Find the nearest ancestor with the given attribute.

Parameters:

string (str) – Attribute name.

Returns:

Element holding the attribute, or None.

Return type:

Optional[Element]

Example

>>> class Dummy(Element): pass
>>> d = Dummy("x", None)
>>> d.search_attribute_holder("_modifiable_content") is d
True
all_childs() -> List[Element]

Recursively collect all child elements.

Returns:

List of all child elements including self.

Return type:

List[Element]

Example

>>> e = Element("abc", None)
>>> e.all_childs()[0] is e
True
search_on_func(function: Callable[[Element], bool]) -> Element | None

Search ancestors using a predicate function.

Parameters:

function (Callable[[Element], bool]) – Predicate function.

Returns:

First matching ancestor or None.

Return type:

Optional[Element]

Example

>>> e = Element("abc", None)
>>> e.search_on_func(lambda x: True) is e
True
search_class(searcher: type) -> Element | None

Search ancestors for a specific class type.

Parameters:

searcher (type) – Class type to search for.

Returns:

First matching ancestor or None.

Return type:

Optional[Element]

Example

>>> class Dummy(Element): pass
>>> d = Dummy("x", None)
>>> d.search_class(Dummy) is d
True
search_up_on_func(function: Callable[[Element], bool]) -> Element | None

Search upwards in the tree for an element matching a predicate.

Parameters:

function (Callable[[Element], bool]) – Predicate function.

Returns:

First matching element or None.

Return type:

Optional[Element]

Example

>>> class Dummy(Element): pass
>>> d = Dummy("x", None)
>>> d.search_up_on_func(lambda x: True) is d
True
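
The upward searches documented above all follow parent links until a test succeeds. The sketch below illustrates that pattern on a hypothetical `Node` class standing in for `Element`; it is not the library's code, and testing the starting node before its ancestors is an assumption drawn from the examples, where a parentless element finds itself.

```python
from typing import Callable, Optional


class Node:
    """Hypothetical stand-in for Element: holds content and a parent link."""

    def __init__(self, content: str, parent: Optional["Node"] = None) -> None:
        self.content = content
        self.parent = parent

    def search_up(self, predicate: Callable[["Node"], bool]) -> Optional["Node"]:
        # Test the node itself first, then each ancestor in turn.
        node: Optional[Node] = self
        while node is not None:
            if predicate(node):
                return node
            node = node.parent
        return None


root = Node("document")
section = Node("section", root)
item = Node("item", section)

print(item.search_up(lambda n: n.content == "section") is section)  # True
print(item.search_up(lambda n: n.content == "missing"))             # None
```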
expand(all_classes: List[Element]) -> None

Expand the element tree by processing children.

Parameters:

all_classes (List[Element]) – List of element classes.

Example

>>> class Dummy(Element): pass
>>> e = Element("abc", None)
>>> e.expand([Dummy])
to_string() -> str

Output Markdown/MyST string.

Returns:

Markdown/MyST representation.

Return type:

str

Example

>>> class Dummy(Element):
...     def to_string(self): return "dummy"
>>> Dummy("abc", None).to_string()
'dummy'
class pytexmd.filter.core.Document(modifiable_content: str, parent: Element)

Bases: StructureMaker

Element representing a LaTeX document.

static position(string: str) -> int
static split_and_create(string: str, parent: Element) -> Tuple[str, Document, str]
to_string() -> str

Output Markdown/MyST string.

Returns:

Markdown/MyST representation.

Return type:

str

Example

>>> class Dummy(Element):
...     def to_string(self): return "dummy"
>>> Dummy("abc", None).to_string()
'dummy'
get_structures() -> List[SectionStructure]
class pytexmd.filter.core.Undefined(modifiable_content: str, parent: Element)

Bases: StructureMaker

Element for undefined LaTeX content.

to_string() -> str

Output Markdown/MyST string.

Returns:

Markdown/MyST representation.

Return type:

str

Example

>>> class Dummy(Element):
...     def to_string(self): return "dummy"
>>> Dummy("abc", None).to_string()
'dummy'
get_structures() -> List[SectionStructure]
class pytexmd.filter.core.RawText(string: str, parent: Element)

Bases: Element

Element for raw text content.

to_string() -> str

Output Markdown/MyST string.

Returns:

Markdown/MyST representation.

Return type:

str

Example

>>> class Dummy(Element):
...     def to_string(self): return "dummy"
>>> Dummy("abc", None).to_string()
'dummy'
class pytexmd.filter.core.JunkSearcher(junk_name: str, save_split: bool = True)

Bases: Searcher

Searcher for junk LaTeX commands.

position(string: str) -> int

Find position of construct in string.

Parameters:

string (str) – Input string.

Returns:

Position index, or -1 if not found.

Return type:

int

split_and_create(string: str, parent: Element) -> Tuple[str, Undefined, str]

Split string and create element for construct.

Parameters:
  • string (str) – Input string.

  • parent (Element) – Parent element.

Returns:

Pre-content, created element, post-content.

Return type:

Tuple[str, Element, str]

class pytexmd.filter.core.ReplaceSearcher(junk_name: str, replacement: str, save_split: bool = True)

Bases: Searcher

Searcher for replacing LaTeX commands.

position(string: str) -> int

Find position of construct in string.

Parameters:

string (str) – Input string.

Returns:

Position index, or -1 if not found.

Return type:

int

split_and_create(string: str, parent: Element) -> Tuple[str, Undefined, str]

Split string and create element for construct.

Parameters:
  • string (str) – Input string.

  • parent (Element) – Parent element.

Returns:

Pre-content, created element, post-content.

Return type:

Tuple[str, Element, str]

class pytexmd.filter.core.GuardianSearcher(name: str, save_split: bool = True)

Bases: Searcher

Searcher for guarding LaTeX commands.

position(string: str) -> int

Find position of construct in string.

Parameters:

string (str) – Input string.

Returns:

Position index, or -1 if not found.

Return type:

int

split_and_create(string: str, parent: Element) -> Tuple[str, RawText, str]

Split string and create element for construct.

Parameters:
  • string (str) – Input string.

  • parent (Element) – Parent element.

Returns:

Pre-content, created element, post-content.

Return type:

Tuple[str, Element, str]

class pytexmd.filter.core.OneArgumentJunkSearcher(command_name: str, begin_brace: str = '{', end_brace: str = '}')

Bases: Searcher

Searcher for junk commands with one argument.

position(string: str) -> int

Find position of construct in string.

Parameters:

string (str) – Input string.

Returns:

Position index, or -1 if not found.

Return type:

int

split_and_create(string: str, parent: Element) -> Tuple[str, Undefined, str]

Split string and create element for construct.

Parameters:
  • string (str) – Input string.

  • parent (Element) – Parent element.

Returns:

Pre-content, created element, post-content.

Return type:

Tuple[str, Element, str]

class pytexmd.filter.core.OneArgumentCommandSearcher(command_name: str, begin: str, end: str)

Bases: Searcher

Searcher for commands with one argument.

position(string: str) -> int

Find position of construct in string.

Parameters:

string (str) – Input string.

Returns:

Position index, or -1 if not found.

Return type:

int

split_and_create(string: str, parent: Element) -> Tuple[str, Undefined, str]

Split string and create element for construct.

Parameters:
  • string (str) – Input string.

  • parent (Element) – Parent element.

Returns:

Pre-content, created element, post-content.

Return type:

Tuple[str, Element, str]

pytexmd.filter.core.find_nearest_classes(string: str, all_classes: List[Element]) -> List[Element]

Find nearest matching element classes in a string.

Parameters:
  • string (str) – Input string.

  • all_classes (List[Element]) – List of element classes.

Returns:

List of nearest matching classes.

Return type:

List[Element]

Example

>>> class Dummy:
...     @staticmethod
...     def position(s): return s.find("x")
>>> find_nearest_classes("abcxdef", [Dummy])
[Dummy]
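
Selecting the "nearest" classes can be read as: ask each class for its position in the string and keep those with the smallest index that is not -1. The sketch below follows that reading; `nearest_by_position`, `FindsX` and `FindsD` are hypothetical, and returning every class tied for the earliest index is an assumption about why the result is a list.

```python
from typing import List


def nearest_by_position(string: str, candidates: List[type]) -> List[type]:
    # Keep only candidates that occur in the string (position != -1).
    found = [(c.position(string), c) for c in candidates]
    found = [(pos, c) for pos, c in found if pos != -1]
    if not found:
        return []
    # Several candidates may start at the same earliest index.
    best = min(pos for pos, _ in found)
    return [c for pos, c in found if pos == best]


class FindsX:
    @staticmethod
    def position(s: str) -> int:
        return s.find("x")


class FindsD:
    @staticmethod
    def position(s: str) -> int:
        return s.find("d")


# "x" sits at index 3 and "d" at index 4, so only FindsX is returned.
print(nearest_by_position("abcxdef", [FindsX, FindsD]) == [FindsX])  # True
```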
pytexmd.filter.core.has_value_equal(instance: Element, attribute_name: str, value) -> bool

Check if an element’s attribute equals a value.

Parameters:
  • instance (Element) – Element instance.

  • attribute_name (str) – Attribute name.

  • value – Value to compare.

Returns:

True if attribute equals value, False otherwise.

Return type:

bool

Example

>>> class Dummy(Element): pass
>>> d = Dummy("x", None)
>>> d.section_number = 5
>>> has_value_equal(d, "section_number", 5)
True
pytexmd.filter.core.get_number_within_equation(string: str) -> str

Extract equation numbering context from LaTeX string.

Parameters:

string (str) – LaTeX string.

Returns:

Numbering context or “document”.

Return type:

str

Example

>>> get_number_within_equation(r"abc\numberwithin{equation}{section}")
'section'
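
Reading the numbering context amounts to finding a `\numberwithin{equation}{...}` directive and returning its second argument, with “document” as the fallback. The sketch below does this with a regular expression; `numbering_context` is a hypothetical name and the pattern is an assumption about the accepted syntax, not the library's parser.

```python
import re

# Second brace group of \numberwithin{equation}{...} is the numbering context.
_NUMBERWITHIN = re.compile(r"\\numberwithin\s*\{equation\}\s*\{([^}]*)\}")


def numbering_context(latex: str, default: str = "document") -> str:
    # Return the counter named in the directive, else the default.
    match = _NUMBERWITHIN.search(latex)
    return match.group(1) if match else default


print(numbering_context(r"abc\numberwithin{equation}{section}"))  # section
print(numbering_context("no directive here"))                     # document
```

Note the raw string in the first call: without the `r` prefix, Python would read `\n` in `\numberwithin` as a newline.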
class pytexmd.filter.core.Searcher

Bases: object

Base class for searchers to find LaTeX constructs.

Example

>>> class DummySearcher(Searcher):
...     def position(self, s): return s.find("x")
...     def split_and_create(self, s, p): return "", Element("x", p), ""
>>> ds = DummySearcher()
>>> ds.position("abcxdef")
3
position(string: str) -> int

Find position of construct in string.

Parameters:

string (str) – Input string.

Returns:

Position index, or -1 if not found.

Return type:

int

split_and_create(string: str, parent: Element) -> Tuple[str, Element, str]

Split string and create element for construct.

Parameters:
  • string (str) – Input string.

  • parent (Element) – Parent element.

Returns:

Pre-content, created element, post-content.

Return type:

Tuple[str, Element, str]

class pytexmd.filter.core.BeginEndSearcher(command_name: str, element_type: type, save_split: bool = True)

Bases: Searcher

Searcher for LaTeX environments delimited by \begin and \end.

name

Environment name.

Type:

str

save_split

Whether to save the split command.

Type:

bool

Example

>>> searcher = BeginEndSearcher("itemize", Element)
>>> searcher.name
'itemize'
position(string: str) -> int

Find position of construct in string.

Parameters:

string (str) – Input string.

Returns:

Position index, or -1 if not found.

Return type:

int

split_and_create(string: str, parent: Element) -> Tuple[str, Element, str]

Split string and create element for construct.

Parameters:
  • string (str) – Input string.

  • parent (Element) – Parent element.

Returns:

Pre-content, created element, post-content.

Return type:

Tuple[str, Element, str]

class pytexmd.filter.core.SectionLikeSearcher(command_name: str)

Bases: Searcher

Searcher for section-like LaTeX commands.

name

Command name.

Type:

str

save_split

Whether to save the split command.

Type:

bool

position(string: str) -> int

Find position of construct in string.

Parameters:

string (str) – Input string.

Returns:

Position index, or -1 if not found.

Return type:

int

split_and_create(input: str, parent: Element) -> Tuple[str, Element, str]

Split string and create element for construct.

Parameters:
  • input (str) – Input string.

  • parent (Element) – Parent element.

Returns:

Pre-content, created element, post-content.

Return type:

Tuple[str, Element, str]

class pytexmd.filter.core.SectionLike(modifiable_content: str, parent, command_name: str, name: str)

Bases: StructureMaker

Element for section-like LaTeX commands.

to_string() -> str

Output Markdown/MyST string.

Returns:

Markdown/MyST representation.

Return type:

str

Example

>>> class Dummy(Element):
...     def to_string(self): return "dummy"
>>> Dummy("abc", None).to_string()
'dummy'
get_content() -> str
get_structures() -> List[SectionStructure]
pytexmd.filter.core.label_call(org: str, label_type: LabelType, rename: str = '') -> str
pytexmd.filter.core.ref_call(org: str) -> str
class pytexmd.filter.core.LabelType(*values)

Bases: Enum

REF = 'ref'
NUMREF = 'numref'
SECTION_LIKE = 'section_like'
DOC = 'doc'
EQ = 'eq'
PRF_REF = 'prf:ref'
ENUMERATION_ITEM = 'enumeration_item'
class pytexmd.filter.core.BackMatter(save_split: bool = True)

Bases: Searcher

Searcher for back matter LaTeX commands.

position(string: str) -> int

Find position of construct in string.

Parameters:

string (str) – Input string.

Returns:

Position index, or -1 if not found.

Return type:

int

split_and_create(string: str, parent: Element) -> Tuple[str, Undefined, str]

Split string and create element for construct.

Parameters:
  • string (str) – Input string.

  • parent (Element) – Parent element.

Returns:

Pre-content, created element, post-content.

Return type:

Tuple[str, Element, str]

Enumitem

class pytexmd.filter.enumitem.Itemize(modifiable_content: str, parent: Element)

Bases: Element

Represents a LaTeX itemize environment.

Example

>>> itemize = Itemize("content", None)
>>> isinstance(itemize.to_string(), str)
True
current_index = 0
to_string() -> str

Converts the itemize to a formatted string.

Returns:

The formatted itemize string.

Return type:

str

Example

>>> itemize = Itemize("abc", None)
>>> isinstance(itemize.to_string(), str)
True
static position(string: str) -> int

Finds the position of ‘\begin{itemize}’ in the string.

Parameters:

string (str) – The input string.

Returns:

The position index.

Return type:

int

Example

>>> Itemize.position(r"\begin{itemize}abc")
0
static split_and_create(string: str, parent: Element) -> tuple

Splits the string on itemize environment and creates an Itemize.

Parameters:
  • string (str) – The input string.

  • parent (Element) – The parent element.

Returns:

(pre, Itemize, post)

Return type:

tuple

Example

>>> pre, itemize, post = Itemize.split_and_create(r"\begin{itemize}abc\end{itemize}", None)
>>> isinstance(itemize, Itemize)
True
class pytexmd.filter.enumitem.ItemizeItem(modifiable_content: str, parent: Element, enum_item: str = '*')

Bases: Element

Represents an item in a LaTeX itemize environment.

Example

>>> item = ItemizeItem("First item", None)
>>> print(item.to_string())
•  First item
label_name() -> str

Returns the label of the item.

Returns:

The label.

Return type:

str

Example

>>> item = ItemizeItem("abc", None)
>>> item.label_name()
'•'
to_string() -> str

Converts the item to a formatted string.

Returns:

The formatted item string.

Return type:

str

Example

>>> item = ItemizeItem("abc", None)
>>> isinstance(item.to_string(), str)
True
static position(string: str) -> int

Finds the position of ‘\item’ in the string.

Parameters:

string (str) – The input string.

Returns:

The position index.

Return type:

int

Example

>>> ItemizeItem.position(r"\item abc")
0
static split_and_create(string: str, parent: Element) -> tuple

Splits the string on ‘\item’ and creates an ItemizeItem.

Parameters:
  • string (str) – The input string.

  • parent (Element) – The parent element.

Returns:

(pre, ItemizeItem, post)

Return type:

tuple

Example

>>> pre, item, post = ItemizeItem.split_and_create(r"\item abc", None)
>>> isinstance(item, ItemizeItem)
True
class pytexmd.filter.enumitem.Enumeration(modifiable_content: str, parent: Element, start, label_part)

Bases: Element

Represents a LaTeX enumerate environment.

Example

>>> enum = Enumeration("content", None, enum_style_arabic, "(", ")")
>>> isinstance(enum.to_string(), str)
True
generate_enum_item() -> str
to_string() -> str

Converts the enumerate to a formatted string.

Returns:

The formatted enumerate string.

Return type:

str

Example

>>> enum = Enumeration("abc", None, enum_style_arabic, "(", ")")
>>> isinstance(enum.to_string(), str)
True
static position(string: str) -> int

Finds the position of ‘\begin{enumerate}’ in the string.

Parameters:

string (str) – The input string.

Returns:

The position index.

Return type:

int

Example

>>> Enumeration.position(r"\begin{enumerate}abc")
0
static split_and_create(string: str, parent: Element) -> tuple

Splits the string on enumerate environment and creates an Enumeration.

Parameters:
  • string (str) – The input string.

  • parent (Element) – The parent element.

Returns:

(pre, Enumeration, post)

Return type:

tuple

Example

>>> pre, enum, post = Enumeration.split_and_create(r"\begin{enumerate}abc\end{enumerate}", None)
>>> isinstance(enum, Enumeration)
True
class pytexmd.filter.enumitem.EnumerationItem(modifiable_content: str, parent: Element, enum_item: str = None)

Bases: Element

Represents an item in a LaTeX enumerate environment.

Example

>>> enum_item = EnumerationItem("First", None)
>>> isinstance(enum_item.to_string(), str)
True
to_string() -> str

Output Markdown/MyST string.

Returns:

Markdown/MyST representation.

Return type:

str

Example

>>> class Dummy(Element):
...     def to_string(self): return "dummy"
>>> Dummy("abc", None).to_string()
'dummy'
static position(string: str) -> int

Finds the position of ‘\item’ in the string.

Parameters:

string (str) – The input string.

Returns:

The position index.

Return type:

int

Example

>>> EnumerationItem.position(r"\item abc")
0
static split_and_create(string: str, parent: Element) -> tuple

Splits the string on ‘\item’ and creates an EnumerationItem.

Parameters:
  • string (str) – The input string.

  • parent (Element) – The parent element.

Returns:

(pre, EnumerationItem, post)

Return type:

tuple

Example

>>> pre, enum_item, post = EnumerationItem.split_and_create(r"\item abc", None)
>>> isinstance(enum_item, EnumerationItem)
True

Equations

Equation filter classes and utilities for pytexmd.

This module provides classes and functions for parsing and processing LaTeX equations, environments, and math for Markdown/MyST conversion.

pytexmd.filter.equations.apply_latex_protection(string: Element) -> Element

Expands and protects LaTeX environments and commands in the given element.

Parameters:

string (Element) – The element to process.

Returns:

The processed element.

Return type:

Element

class pytexmd.filter.equations.InlineLatex(modifiable_content: str, parent: Element)

Bases: Element

Represents inline LaTeX math ($…$).

Example

>>> inline = InlineLatex("x^2", None)
>>> isinstance(inline.to_string(), str)
True
static position(string: str) -> int
static split_and_create(string: str, parent: Element) -> Tuple[str, InlineLatex, str]

Split string and create InlineLatex element.

Parameters:
  • string (str) – Input string.

  • parent (Element) – Parent element.

Returns:

Pre-content, InlineLatex, post-content.

Return type:

Tuple[str, InlineLatex, str]

to_string() -> str

Output Markdown/MyST string.

Returns:

Markdown/MyST representation.

Return type:

str

Example

>>> class Dummy(Element):
...     def to_string(self): return "dummy"
>>> Dummy("abc", None).to_string()
'dummy'
class pytexmd.filter.equations.LatexText(modifiable_content: str, parent: Element)

Bases: Element

Represents LaTeX text command.

Example

>>> text = LatexText("hello", None)
>>> isinstance(text.to_string(), str)
True
static position(string: str) -> int
static split_and_create(string: str, parent: Element) -> Tuple[str, LatexText, str]
to_string() -> str

Output Markdown/MyST string.

Returns:

Markdown/MyST representation.

Return type:

str

Example

>>> class Dummy(Element):
...     def to_string(self): return "dummy"
>>> Dummy("abc", None).to_string()
'dummy'
class pytexmd.filter.equations.Cases(modifiable_content: str, parent: Element)

Bases: Element

Represents LaTeX cases environment.

Example

>>> cases = Cases(r"x & y \\ z & w", None)
>>> isinstance(cases.to_string(), str)
True
static position(string: str) -> int
static split_and_create(string: str, parent: Element) -> Tuple[str, Cases, str]
to_string() -> str

Output Markdown/MyST string.

Returns:

Markdown/MyST representation.

Return type:

str

Example

>>> class Dummy(Element):
...     def to_string(self): return "dummy"
>>> Dummy("abc", None).to_string()
'dummy'
class pytexmd.filter.equations.DoubleDolarLatex(modifiable_content: str, parent: Element)

Bases: Element

Represents display math ($$…$$).

Example

>>> dbl = DoubleDolarLatex("x^2", None)
>>> isinstance(dbl, DoubleDolarLatex)
True
prio_elem = True
add_label(label: str)
to_string() -> str

Output Markdown/MyST string.

Returns:

Markdown/MyST representation.

Return type:

str

Example

>>> class Dummy(Element):
...     def to_string(self): return "dummy"
>>> Dummy("abc", None).to_string()
'dummy'
static position(string: str) -> int
static split_and_create(string: str, parent: Element) -> Tuple[str, Undefined, str]
pytexmd.filter.equations.get_all_filters() -> list

Returns all equation-related filter classes/searchers.

Returns:

List of filter classes/searchers.

Return type:

list

Example

>>> filters = get_all_filters()
>>> isinstance(filters, list)
True

File Maker

pytexmd.filter.file_maker.string_to_tree(string: str) -> Document

Converts a string to a document tree structure.

Parameters:

string (str) – The input string to process.

Returns:

The processed document tree.

Return type:

Document

Example

>>> latex = r"""\section{Intro}\begin{equation}E=mc^2\end{equation}"""
>>> doc = string_to_tree(latex)
>>> print(doc.to_string())

pytexmd.filter.file_maker.process_string(output_folder: str, string: str, depth=2, output_suffix: str = '.md', verify=True)

Processes a LaTeX string and writes the document to hierarchical MyST files.

This function converts LaTeX to a document tree, then splits it into multiple files based on section hierarchy with automatic content verification.

Parameters:
  • output_folder (str) – The output folder path.

  • string (str) – The input LaTeX string.

  • depth (int, optional) – Splitting depth (0=no split, 1=chapter, 2=section, etc.). Defaults to 2.

  • output_suffix (str, optional) – The file suffix. Defaults to “.md”.

  • verify (bool, optional) – Verify content integrity after parsing. Defaults to True.

Returns:

Root structure with child_files tracking for all sections

Return type:

dict

Example

>>> # Process a LaTeX string and split into hierarchical files
>>> latex = r"""\chapter{Intro}\section{Background}\subsection{Details}"""
>>> structure = process_string("output", latex, depth=2)
>>> # Creates: output/intro.md with toctree to output/background.md

pytexmd.filter.file_maker.element_to_file_whole(element: SectionLike, output_folder: str, file_name: str, output_suffix: str = '.md')

Writes the whole element to a file.

Parameters:
  • element (SectionLike) – The element to write.

  • output_folder (str) – The output folder path.

  • file_name (str) – The file name.

  • output_suffix (str, optional) – The file suffix. Defaults to “.md”.

Returns:

None

Example

>>> # Save the entire document as 'output/index.md'
>>> doc = string_to_tree(r"\section{Intro}")
>>> element_to_file_whole(doc, "output", "index")

pytexmd.filter.file_maker.element_to_file_only_begin(element: SectionLike, output_folder: str, file_name: str, file_names: List[str], output_suffix: str = '.md')

Writes only the beginning part of the element to a file, with a toctree.

Parameters:
  • element (SectionLike) – The element to write.

  • output_folder (str) – The output folder path.

  • file_name (str) – The file name.

  • file_names (List[str]) – The file names to list in the toctree.

  • output_suffix (str, optional) – The file suffix. Defaults to “.md”.

Returns:

None

Example

>>> # Save only the introduction and generate a toctree for subsections
>>> doc = string_to_tree(r"\section{Intro}\section{Background}")
>>> element_to_file_only_begin(doc, "output", "index", ["background"])

pytexmd.filter.file_maker.split_document_to_files(document_md, output_folder, depth=2, output_suffix='.md', verify=True)

Main function to split document tree into hierarchical MyST files.

Each section file will know its child files through the structure.

Parameters:
  • document_md – Document tree object (from string_to_tree)

  • output_folder (str) – Output directory path

  • depth (int) – Splitting depth (0=no split, 1=chapter, 2=section, etc.)

  • output_suffix (str) – File extension

  • verify (bool) – Verify content integrity after parsing

Returns:

Root structure with child_files tracking for all sections

Return type:

dict

Example

>>> # Convert and split a document
>>> doc = string_to_tree(latex_string)
>>> structure = split_document_to_files(doc, "./output", depth=2, verify=True)
>>> # Each section in structure has 'child_files' list

pytexmd.filter.file_maker.split_by_sections(content_string, max_depth=2)

Split document string into hierarchical sections based on MyST comment markers.

Parameters:
  • content_string (str) – The full document string with MyST markers

  • max_depth (int) – Maximum depth for splitting (0=part, 1=chapter, 2=section, etc.)

Returns:

Hierarchical structure of sections with content and children tracking

Return type:

dict

pytexmd.filter.file_maker.verify_content_integrity(original_content, structure)

Verify that the split structure contains all original content.

Parameters:
  • original_content (str) – Original document string

  • structure (dict) – Parsed section structure

Returns:

(is_valid, message, stats)

Return type:

tuple

pytexmd.filter.file_maker.string_to_filename(name)

Convert section name to valid filename.

Parameters:

name (str) – Section name to convert

Returns:

Sanitized filename

Return type:

str

Notworking Preprocessor

pytexmd.filter.notworking_preprocessor.do_commands(string: str) -> str
pytexmd.filter.notworking_preprocessor.do_newenvironment(string: str) -> str

Processes all LaTeX newenvironment definitions and applies them.

Parameters:

string (str) – The input string.

Returns:

The string with environments expanded.

Return type:

str

Example

>>> s = r"\newenvironment{foo}[2]{<b>#1 #2>}{</b>} \begin{foo}{a}{b}content\end{foo}"
>>> do_newenvironment(s)
'<b>a b>content</b>'

Preprocessor

pytexmd.filter.preprocessor.do_commands(string: str) -> str

Processes all LaTeX newcommand definitions and applies them.

Parameters:

string (str) – The input string.

Returns:

The string with commands expanded.

Return type:

str

Example

>>> s = r"\newcommand{\foo}[2]{#1+#2} \foo{a}{b}"
>>> do_commands(s)
'a+b'
pytexmd.filter.preprocessor.do_newenvironment(string: str) -> str

Processes all LaTeX newenvironment definitions and applies them.

Parameters:

string (str) – The input string.

Returns:

The string with environments expanded.

Return type:

str

Example

>>> s = r"\newenvironment{foo}[2]{<b>#1 #2>}{</b>} \begin{foo}{a}{b}content\end{foo}"
>>> do_newenvironment(s)
'<b>a b>content</b>'

Splitting

pytexmd.filter.splitting.get_all_allchars_no_abc() -> str

Returns a string of non-alphabetic ASCII characters.

Returns:

String containing non-alphabetic ASCII characters.

Return type:

str

Example

>>> chars = get_all_allchars_no_abc()
>>> isinstance(chars, str)
True
pytexmd.filter.splitting.save_command_split(string: str, split_on: str) -> List[str]

Splits a string on a given substring, preserving certain patterns.

Parameters:
  • string (str) – The input string to split.

  • split_on (str) – The substring to split on.

Returns:

List of split string segments.

Return type:

List[str]

Raises:

ValueError – If input types are incorrect.

Example

>>> parts = save_command_split("foo$bar$baz", "$")
>>> parts
['foo', 'bar', 'baz']
pytexmd.filter.splitting.first_char_brace(string: str, begin_brace: str = '{') -> bool

Checks if the first non-whitespace character of a string is a given brace.

Parameters:
  • string (str) – The input string.

  • begin_brace (str, optional) – The brace character to check. Defaults to “{”.

Returns:

True if the first non-whitespace character is the brace, False otherwise.

Return type:

bool

Raises:

ValueError – If input types are incorrect.

Example

>>> is_brace = first_char_brace(" {foo}")
>>> is_brace
True
pytexmd.filter.splitting.split_on_first_brace(string: str, begin_brace='{', end_brace='}', error_replacement='brace_error') -> Tuple[str, str]

Splits a string on the first matching pair of braces.

Parameters:
  • string (str) – The input string.

  • begin_brace (str, optional) – The opening brace. Defaults to “{”.

  • end_brace (str, optional) – The closing brace. Defaults to “}”.

  • error_replacement (str, optional) – Replacement string if brace not found. Defaults to “brace_error”.

Returns:

Content inside braces, and the remaining string.

Return type:

Tuple[str, str]

Raises:

ValueError – If input types are incorrect.

Example

>>> inside, rest = split_on_first_brace("{foo}bar")
>>> inside
'foo'
>>> rest
'bar'
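
The following is a minimal sketch of how a split on the first matching brace pair can work, not pytexmd's actual implementation. The function name split_first_brace_sketch is hypothetical, braces are assumed to be single characters, and returning error_replacement together with the unchanged input when no balanced group exists is an assumption.

```python
from typing import Tuple


def split_first_brace_sketch(
    text: str,
    begin_brace: str = "{",
    end_brace: str = "}",
    error_replacement: str = "brace_error",
) -> Tuple[str, str]:
    """Return (content of the first balanced group, text after that group)."""
    start = text.find(begin_brace)
    if start == -1:
        # Assumption: a missing opening brace yields the fallback value.
        return error_replacement, text
    depth = 0
    for index in range(start, len(text)):
        char = text[index]
        if char == begin_brace:
            depth += 1
        elif char == end_brace:
            depth -= 1
            if depth == 0:
                # The group that opened at `start` closes here.
                return text[start + 1:index], text[index + 1:]
    # Assumption: an unbalanced group is treated like a missing brace.
    return error_replacement, text


print(split_first_brace_sketch("{foo}bar"))       # ('foo', 'bar')
print(split_first_brace_sketch(r"{a{b}c}\rest"))  # ('a{b}c', '\\rest')
```

Tracking the nesting depth is what keeps an inner group such as {b} from ending the outer group early.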
pytexmd.filter.splitting.split_rename(string: str) Tuple[str, str] | None[source]

Splits the input string into a name and the remaining string if the first character is a ‘[’.

Parameters:

string (str) – The input string.

Returns:

A tuple containing the name and the remaining string, or None if the first character is not ‘[’.

Return type:

Optional[Tuple[str, str]]

Raises:

ValueError – If input is not a string.

Example

>>> name, rest = split_rename("[foo]bar")
>>> name
'foo'
>>> rest
'bar'
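
Because the documented return type is optional, unpacking the result directly fails when the input does not start with ‘[’. The sketch below imitates the documented behaviour with a hypothetical stand-in, split_rename_sketch, and shows a guarded call. Returning None for an unterminated bracket is an assumption.

```python
from typing import Optional, Tuple


def split_rename_sketch(text: str) -> Optional[Tuple[str, str]]:
    """Return (name, rest) for input like '[name]rest', else None."""
    if not text.startswith("["):
        return None
    name, closing, rest = text[1:].partition("]")
    if not closing:
        # Assumption: a bracket that never closes is treated as "no name".
        return None
    return name, rest


for candidate in ("[foo]bar", "bar"):
    result = split_rename_sketch(candidate)
    if result is None:
        print(f"{candidate!r}: no optional name")
    else:
        name, rest = result
        print(f"{candidate!r}: name={name!r}, rest={rest!r}")
```

Checking for None before unpacking avoids a TypeError on inputs without a leading bracket.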
pytexmd.filter.splitting.split_on_next(string: str, split_on: str, save_split: bool = True) Tuple[str, str][source]

Splits a string on the next occurrence of a substring.

Parameters:
  • string (str) – The input string.

  • split_on (str) – The substring to split on.

  • save_split (bool, optional) – Whether to use save_command_split. Defaults to True.

Returns:

The part before and after the split.

Return type:

Tuple[str, str]

Raises:

ValueError – If input types are incorrect.

Example

>>> before, after = split_on_next("foo$bar$baz", "$")
>>> before
'foo'
>>> after
'bar$baz'
pytexmd.filter.splitting.begin_end_split(string: str, begin_name: str, end_name: str, save_split: bool = False) Tuple[str, str, str][source]

Splits a string into three parts: before, between, and after given begin and end substrings.

Parameters:
  • string (str) – The input string.

  • begin_name (str) – The substring marking the beginning.

  • end_name (str) – The substring marking the end.

  • save_split (bool, optional) – Whether to use save_command_split. Defaults to False.

Returns:

The parts before, between, and after the delimiters.

Return type:

Tuple[str, str, str]

Raises:

ValueError – If input types are incorrect.

Example

>>> pre, mid, post = begin_end_split(r"a\begin{env}b\end{env}c", r"\begin{env}", r"\end{env}")
>>> pre
'a'
>>> mid
'b'
>>> post
'c'
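
As a rough illustration of the three-way split, the sketch below uses str.partition twice. The name begin_end_split_sketch is hypothetical, the sketch ignores the save_split option, and it does not handle nested environments. Raw strings matter here: in an ordinary Python string literal, "\b" is a backspace character rather than a backslash followed by "b".

```python
from typing import Tuple


def begin_end_split_sketch(text: str, begin_name: str, end_name: str) -> Tuple[str, str, str]:
    """Split text into (before, between, after) around the first begin/end pair."""
    before, _, remainder = text.partition(begin_name)
    between, _, after = remainder.partition(end_name)
    return before, between, after


# Raw strings keep the backslashes of the LaTeX commands literal.
source = r"a\begin{env}b\end{env}c"
print(begin_end_split_sketch(source, r"\begin{env}", r"\end{env}"))  # ('a', 'b', 'c')

# Without the r prefix the delimiter would start with a backspace character.
print("\b" == "\x08")  # True
```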
pytexmd.filter.splitting.position_of(string: str, begin_name: str, save_split: bool = True) int[source]

Finds the position of a substring in a string.

Parameters:
  • string (str) – The input string.

  • begin_name (str) – The substring to find.

  • save_split (bool, optional) – Whether to use save_command_split. Defaults to True.

Returns:

The position index, or -1 if not found.

Return type:

int

Raises:

ValueError – If input types are incorrect.

Example

>>> pos = position_of("foo$bar", "$")
>>> pos
3

Text

Section and theorem filter classes and utilities for pytexmd.

This module provides classes and functions for parsing and processing LaTeX sections, theorems, references, and formatting for Markdown/MyST conversion.

class pytexmd.filter.text.Ref(modifiable_content: str, parent: Element, label_ref: str)[source]

Bases: Element

Element for LaTeX ref reference.

Example

>>> ref = Ref("content", None, "mylabel")
>>> isinstance(ref, Ref)
True
static position(input: str) int[source]
static split_and_create(input: str, parent: Element) Tuple[str, Ref, str][source]
to_string() str[source]

Output Markdown/MyST string.

Returns:

Markdown/MyST representation.

Return type:

str

Example

>>> class Dummy(Element):
...     def to_string(self): return "dummy"
>>> Dummy("abc", None).to_string()
'dummy'
class pytexmd.filter.text.EqRef(modifiable_content: str, parent: Element, label_ref: str)[source]

Bases: Element

Element for LaTeX eqref reference.

Example

>>> eqref = EqRef("content", None, "eq1")
>>> isinstance(eqref, EqRef)
True
static position(input: str) int[source]
static split_and_create(input: str, parent: Element) Tuple[str, Ref, str][source]
to_string() str[source]

Output Markdown/MyST string.

Returns:

Markdown/MyST representation.

Return type:

str

Example

>>> class Dummy(Element):
...     def to_string(self): return "dummy"
>>> Dummy("abc", None).to_string()
'dummy'
class pytexmd.filter.text.Proof(modifiable_content: str, parent: Element)[source]

Bases: Element

Element for LaTeX proof environment.

Example

>>> proof = Proof("content", None)
>>> isinstance(proof, Proof)
True
static position(input: str) int[source]
static split_and_create(input: str, parent: Element) Tuple[str, Proof, str][source]
to_string() str[source]

Output Markdown/MyST string.

Returns:

Markdown/MyST representation.

Return type:

str

Example

>>> class Dummy(Element):
...     def to_string(self): return "dummy"
>>> Dummy("abc", None).to_string()
'dummy'
class pytexmd.filter.text.Textbf(modifiable_content: str, parent: Element)[source]

Bases: Element

Element for LaTeX textbf command.

Example

>>> bold = Textbf("content", None)
>>> isinstance(bold, Textbf)
True
static position(input: str) int[source]
static split_and_create(input: str, parent: Element) Tuple[str, Textbf, str][source]
to_string() str[source]

Output Markdown/MyST string.

Returns:

Markdown/MyST representation.

Return type:

str

Example

>>> class Dummy(Element):
...     def to_string(self): return "dummy"
>>> Dummy("abc", None).to_string()
'dummy'
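
The element classes in this module share three methods: position, split_and_create, and to_string. The sketch below shows one way such an element could behave for the textbf command. BoldSketch is a hypothetical stand-in that does not inherit from Element, and the ** output format is an assumption about the Markdown/MyST rendering.

```python
from typing import Optional, Tuple

COMMAND = r"\textbf"


class BoldSketch:
    """Hypothetical stand-in for an element such as Textbf."""

    def __init__(self, content: str, parent: Optional[object] = None) -> None:
        self.content = content
        self.parent = parent

    @staticmethod
    def position(text: str) -> int:
        """Index of the next command occurrence, or -1 if there is none."""
        return text.find(COMMAND + "{")

    @staticmethod
    def split_and_create(text: str, parent: Optional[object] = None) -> Tuple[str, "BoldSketch", str]:
        """Return (text before, new element, text after)."""
        start = BoldSketch.position(text)
        if start == -1:
            raise ValueError("command not found")
        body_start = start + len(COMMAND) + 1  # skip the command and its "{"
        depth = 1
        for index in range(body_start, len(text)):
            if text[index] == "{":
                depth += 1
            elif text[index] == "}":
                depth -= 1
                if depth == 0:
                    element = BoldSketch(text[body_start:index], parent)
                    return text[:start], element, text[index + 1:]
        raise ValueError("unbalanced braces")

    def to_string(self) -> str:
        # Assumption: bold text is written as **...** in Markdown/MyST.
        return f"**{self.content}**"


before, element, after = BoldSketch.split_and_create(r"see \textbf{this} now")
print(before + element.to_string() + after)  # see **this** now
```

Returning the text before and after the element lets a caller keep searching the remainder for further commands.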
class pytexmd.filter.text.Emph(modifiable_content: str, parent: Element)[source]

Bases: Element

Element for LaTeX emph command.

Example

>>> emph = Emph("content", None)
>>> isinstance(emph, Emph)
True
static position(input: str) int[source]
static split_and_create(input: str, parent: Element) Tuple[str, Emph, str][source]
to_string() str[source]

Output Markdown/MyST string.

Returns:

Markdown/MyST representation.

Return type:

str

Example

>>> class Dummy(Element):
...     def to_string(self): return "dummy"
>>> Dummy("abc", None).to_string()
'dummy'
class pytexmd.filter.text.Cite(modifiable_content: str, parent: Element, citations: list[str], rename: str)[source]

Bases: Element

Element for LaTeX cite command.

Example

>>> cite = Cite("content", None, ["ref1", "ref2"], "")
>>> isinstance(cite, Cite)
True
static position(input: str) int[source]
static split_and_create(input: str, parent: Element) Tuple[str, Cite, str][source]
to_string() str[source]

Output Markdown/MyST string.

Returns:

Markdown/MyST representation.

Return type:

str

Example

>>> class Dummy(Element):
...     def to_string(self): return "dummy"
>>> Dummy("abc", None).to_string()
'dummy'
pytexmd.filter.text.get_all_filters() list[source]

Returns all section-related filter classes/searchers.

Returns:

List of filter classes/searchers.

Return type:

list

Example

>>> filters = get_all_filters()
>>> isinstance(filters, list)
True
pytexmd.filter.text.get_number_within_equation(input: str) str[source]

Extract equation numbering context from LaTeX string.

Parameters:

input (str) – LaTeX string.

Returns:

Numbering context or “document”.

Return type:

str

Example

>>> get_number_within_equation(r"abc\numberwithin{equation}{section}")
'section'
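
One plausible way to extract the numbering context is a regular expression over the source, sketched below. The function name number_within_equation_sketch and the pattern are assumptions, not the library's code. Only the "document" fallback is taken from the description above. The input is written as a raw string because "\n" in an ordinary string literal is a newline.

```python
import re

# Matches \numberwithin{equation}{<context>} and captures <context>.
NUMBERWITHIN = re.compile(r"\\numberwithin\s*\{equation\}\s*\{([^{}]+)\}")


def number_within_equation_sketch(latex: str) -> str:
    """Return the numbering context, or 'document' when none is declared."""
    match = NUMBERWITHIN.search(latex)
    return match.group(1).strip() if match else "document"


print(number_within_equation_sketch(r"abc\numberwithin{equation}{section}"))  # section
print(number_within_equation_sketch(r"\documentclass{article}"))               # document
```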
pytexmd.filter.text.get_theoremSearchers(input: str) list[source]

Extract theorem searchers from LaTeX preamble.

Parameters:

input (str) – LaTeX preamble string.

Returns:

List of TheoremSearcher instances.

Return type:

list

Example

>>> result = get_theoremSearchers(r"\newtheorem{theorem}{Theorem}")
>>> isinstance(result, list)
True
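
To make the idea of scanning a preamble concrete, the sketch below collects newtheorem declarations with a regular expression. It returns plain tuples rather than TheoremSearcher instances, and the names TheoremDeclaration and find_theorem_declarations are hypothetical. How pytexmd actually parses the preamble is not shown in this reference.

```python
import re
from typing import List, NamedTuple, Optional


class TheoremDeclaration(NamedTuple):
    name: str                      # environment name, e.g. "lemma"
    title: str                     # printed title, e.g. "Lemma"
    shared_counter: Optional[str]  # counter shared with another environment


NEWTHEOREM = re.compile(
    r"\\newtheorem\*?\s*\{(?P<name>[^{}]+)\}"
    r"(?:\s*\[(?P<counter>[^\[\]]+)\])?"
    r"\s*\{(?P<title>[^{}]+)\}"
)


def find_theorem_declarations(preamble: str) -> List[TheoremDeclaration]:
    """Collect every newtheorem declaration found in the preamble."""
    return [
        TheoremDeclaration(m.group("name"), m.group("title"), m.group("counter"))
        for m in NEWTHEOREM.finditer(preamble)
    ]


preamble = r"""
\newtheorem{theorem}{Theorem}[section]
\newtheorem{lemma}[theorem]{Lemma}
"""
for declaration in find_theorem_declarations(preamble):
    print(declaration)
```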
class pytexmd.filter.text.Textit(modifiable_content: str, parent: Element)[source]

Bases: Element

Element for LaTeX textit command.

Example

>>> textit = Textit("content", None)
>>> isinstance(textit, Textit)
True
static position(input: str) int[source]
static split_and_create(input: str, parent: Element) Tuple[str, Emph, str][source]
to_string() str[source]

Output Markdown/MyST string.

Returns:

Markdown/MyST representation.

Return type:

str

Example

>>> class Dummy(Element):
...     def to_string(self): return "dummy"
>>> Dummy("abc", None).to_string()
'dummy'
class pytexmd.filter.text.ProofLabel(modifiable_content: str, parent: Element, label_ref: str)[source]

Bases: Element

Element for MyST label.

Parameters:
  • modifiable_content (str) – Content to process.

  • parent (Element) – Parent element.

  • label_ref (str) – Label reference.

Example

>>> label = ProofLabel("content", None, "mylabel")
>>> isinstance(label, ProofLabel)
True
static position(string: str) int[source]
static split_and_create(string: str, parent: Element) Tuple[str, ProofLabel, str][source]
to_string() str[source]

Output Markdown/MyST string.

Returns:

Markdown/MyST representation.

Return type:

str

Example

>>> class Dummy(Element):
...     def to_string(self): return "dummy"
>>> Dummy("abc", None).to_string()
'dummy'

Module Contents

pytexmd.filter.string_to_tree(string: str) Document[source]

Converts a string to a document tree structure.

Parameters:

string (str) – The input string to process.

Returns:

The processed document tree.

Return type:

Document

Example

>>> latex = r"""\section{Intro}\begin{equation}E=mc^2\end{equation}"""
>>> doc = string_to_tree(latex)
>>> print(doc.to_string())

pytexmd.filter.process_string(output_folder: str, string: str, depth=2, output_suffix: str = '.md', verify=True)[source]

Processes a LaTeX string and writes the document to hierarchical MyST files.

This function converts LaTeX to a document tree, then splits it into multiple files based on section hierarchy with automatic content verification.

Parameters:
  • output_folder (str) – The output folder path.

  • string (str) – The input LaTeX string.

  • depth (int, optional) – Splitting depth (0=no split, 1=chapter, 2=section, etc.). Defaults to 2.

  • output_suffix (str, optional) – The file suffix. Defaults to “.md”.

  • verify (bool, optional) – Verify content integrity after parsing. Defaults to True.

Returns:

Root structure with child_files tracking for all sections

Return type:

dict

Example

>>> # Process a LaTeX string and split into hierarchical files
>>> latex = r"""\chapter{Intro}\section{Background}\subsection{Details}"""
>>> structure = process_string("output", latex, depth=2)
>>> # Creates: output/intro.md with toctree to output/background.md

pytexmd.filter.element_to_file_whole(element: SectionLike, output_folder: str, file_name: str, output_suffix: str = '.md')[source]

Writes the whole element to a file.

Parameters:
  • element (SectionLike) – The element to write.

  • output_folder (str) – The output folder path.

  • file_name (str) – The file name.

  • output_suffix (str, optional) – The file suffix. Defaults to “.md”.

Returns:

None

Example

>>> # Save the entire document as 'output/index.md'
>>> doc = string_to_tree(r"\section{Intro}")
>>> element_to_file_whole(doc, "output", "index")

pytexmd.filter.element_to_file_only_begin(element: SectionLike, output_folder: str, file_name: str, file_names: List[str], output_suffix: str = '.md')[source]

Writes only the beginning part of the element to a file, with a toctree.

Parameters:
  • element (SectionLike) – The element to write.

  • output_folder (str) – The output folder path.

  • file_name (str) – The file name.

  • file_names (List[str]) – The file names to list in the toctree.

  • output_suffix (str, optional) – The file suffix. Defaults to “.md”.

Returns:

None

Example

>>> # Save only the introduction and generate a toctree for subsections
>>> doc = string_to_tree(r"\section{Intro}\section{Background}")
>>> # file_names is a required argument; the value here is illustrative
>>> element_to_file_only_begin(doc, "output", "index", ["background"])

pytexmd.filter.split_document_to_files(document_md, output_folder, depth=2, output_suffix='.md', verify=True)[source]

Main function to split document tree into hierarchical MyST files.

Each section file will know its child files through the structure.

Parameters:
  • document_md – Document tree object (from string_to_tree)

  • output_folder (str) – Output directory path

  • depth (int) – Splitting depth (0=no split, 1=chapter, 2=section, etc.)

  • output_suffix (str) – File extension

  • verify (bool) – Verify content integrity after parsing

Returns:

Root structure with child_files tracking for all sections

Return type:

dict

Example

>>> # Convert and split a document
>>> doc = string_to_tree(latex_string)
>>> structure = split_document_to_files(doc, "./output", depth=2, verify=True)
>>> # Each section in structure has a 'child_files' list

pytexmd.filter.split_by_sections(content_string, max_depth=2)[source]

Split document string into hierarchical sections based on MyST comment markers.

Parameters:
  • content_string (str) – The full document string with MyST markers

  • max_depth (int) – Maximum depth for splitting (0=part, 1=chapter, 2=section, etc.)

Returns:

Hierarchical structure of sections with content and children tracking

Return type:

dict

pytexmd.filter.verify_content_integrity(original_content, structure)[source]

Verify that the split structure contains all original content.

Parameters:
  • original_content (str) – Original document string

  • structure (dict) – Parsed section structure

Returns:

(is_valid, message, stats)

Return type:

tuple
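
The sketch below illustrates the kind of check such a function might perform: rebuild the text from the nested structure and compare it with the original. The dictionary keys "content" and "children", the exact-equality criterion, and the names collect_content and verify_sketch are all assumptions made for illustration. The real structure layout is not documented here.

```python
from typing import Any, Dict, List, Tuple

Section = Dict[str, Any]


def collect_content(section: Section) -> List[str]:
    """Gather the content of a section and all of its descendants, in order."""
    parts = [section.get("content", "")]
    for child in section.get("children", []):
        parts.extend(collect_content(child))
    return parts


def verify_sketch(original: str, structure: Section) -> Tuple[bool, str, Dict[str, int]]:
    """Return (is_valid, message, stats) for a split structure."""
    rebuilt = "".join(collect_content(structure))
    stats = {"original_chars": len(original), "rebuilt_chars": len(rebuilt)}
    is_valid = rebuilt == original
    message = "content preserved" if is_valid else "content mismatch"
    return is_valid, message, stats


original = "# A\nintro\n## B\nbody\n"
structure = {
    "content": "# A\nintro\n",
    "children": [{"content": "## B\nbody\n", "children": []}],
}
is_valid, message, stats = verify_sketch(original, structure)
print(is_valid, message)  # True content preserved
```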

pytexmd.filter.string_to_filename(name)[source]

Convert section name to valid filename.

Parameters:

name (str) – Section name to convert

Returns:

Sanitized filename

Return type:

str
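
A sanitizer of this kind usually lowercases the name and replaces characters that are unsafe in file names. The sketch below is one possible approach, not the library's rule set. The name filename_sketch, the underscore separator, the ASCII folding, and the "untitled" fallback are assumptions.

```python
import re
import unicodedata


def filename_sketch(name: str) -> str:
    """Turn a section title into a conservative, lowercase file name."""
    # Fold accented characters to their closest ASCII form.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of other characters into a single underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", ascii_name.lower()).strip("_")
    # Assumption: an empty result falls back to a fixed placeholder.
    return slug or "untitled"


print(filename_sketch("Intro"))                    # intro
print(filename_sketch("Background & Motivation"))  # background_motivation
```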