fix: escape square brackets in link text and image alt #1822

Open
Brumbelow wants to merge 1 commit into microsoft:main from Brumbelow:fix/link-text-bracket-escaping

@Brumbelow

Summary

Link text and image alt text containing [ or ] were emitted unescaped inside [...](...) and ![...](...), producing nested-bracket output that CommonMark parses as a link only when the inner brackets are balanced.

Problem

Repro on main:

```python
from io import BytesIO
from markitdown import MarkItDown
html = b'<html><body>See <a href="https://x.com/">Learn [GPT]</a></body></html>'
print(MarkItDown().convert_stream(BytesIO(html), file_extension='.html').text_content)
# => 'See [Learn [GPT]](https://x.com/)'
```
  • GitHub's renderer truncates the link text at the first inner ]; other parsers (markdown-it, pandoc) may mis-parse the link or fall back to literal text, especially when the brackets are unbalanced.
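
To make the truncation concrete, here is a rough sketch of a scanner that does not track bracket nesting. The function name `link_text_without_nesting` is hypothetical and this is not how any particular renderer is implemented; it only illustrates why an unescaped inner `]` can end the link text early, and why backslash escapes avoid that.

```python
def link_text_without_nesting(markdown: str) -> str:
    """Return the text between the first unescaped '[' and the next
    unescaped ']', ignoring nesting (hypothetical, illustrative only)."""
    start = None
    i = 0
    while i < len(markdown):
        ch = markdown[i]
        if ch == "\\":
            # A backslash escapes the next character, so skip both.
            i += 2
            continue
        if ch == "[" and start is None:
            start = i + 1
        elif ch == "]" and start is not None:
            return markdown[start:i]
        i += 1
    raise ValueError("no bracketed span found")


# Unescaped inner brackets: the scan stops at the first ']'.
print(link_text_without_nesting("See [Learn [GPT]](https://x.com/)"))
# Learn [GPT

# Escaped inner brackets: the scan reaches the real closing bracket.
print(link_text_without_nesting(r"See [Learn \[GPT\]](https://x.com/)"))
# Learn \[GPT\]
```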

Fix

  • Escape [ and ] in the two _CustomMarkdownify overrides that build Markdown link syntax (convert_a and convert_img). Keeping the escape local avoids enabling markdownify's escape_misc option, which would also escape ] \ & < ` [ > ~ = + | and change other converters' output.
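
The local escape can be sketched roughly as follows. This is a standalone illustration, not the code in the PR: `escape_brackets`, `make_link`, and `make_image` are hypothetical names, and the actual change lives inside the `convert_a` and `convert_img` overrides, whose signatures are not reproduced here. The sketch also assumes the incoming text has not already been escaped.

```python
def escape_brackets(text: str) -> str:
    """Backslash-escape square brackets so they are treated as literal
    characters inside link text or image alt text (hypothetical helper)."""
    return text.replace("[", "\\[").replace("]", "\\]")


def make_link(text: str, href: str) -> str:
    # Only the link text is escaped; the destination is left untouched.
    return f"[{escape_brackets(text)}]({href})"


def make_image(alt: str, src: str) -> str:
    # Image alt text sits in the same bracketed position as link text.
    return f"![{escape_brackets(alt)}]({src})"


print(make_link("Learn [GPT]", "https://x.com/"))
# [Learn \[GPT\]](https://x.com/)
print(make_image("Fig [1]", "https://x.com/a.png"))
# ![Fig \[1\]](https://x.com/a.png)
```

Because the helper is applied only where the bracketed syntax is assembled, text outside links and images is left as markdownify produced it.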

Changes

  • packages/markitdown/src/markitdown/converters/_markdownify.py: escape brackets in link text (convert_a) and image alt (convert_img).
  • packages/markitdown/tests/test_module_misc.py: new test_html_link_text_bracket_escaping covering both cases.

Testing

  • Vector suite green.
  • black (pinned 23.7.0 per .pre-commit-config.yaml) clean.

Fixes #1302

Brumbelow changed the title from "fix: escape square brackets in link text and image alt (#1302)" to "fix: escape square brackets in link text and image alt" on Apr 22, 2026

Development

Successfully merging this pull request may close these issues.

Bug: Failure to Escape Square Brackets [/] in Link Text Causes Markdown Parsing Errors