Skip to content

docs: correct image converter description in README (EXIF + LLM, not OCR)#1837

Open
sjhddh wants to merge 1 commit intomicrosoft:mainfrom
sjhddh:fix/readme-image-ocr-wording
Open

docs: correct image converter description in README (EXIF + LLM, not OCR)#1837
sjhddh wants to merge 1 commit intomicrosoft:mainfrom
sjhddh:fix/readme-image-ocr-wording

Conversation

@sjhddh
Copy link
Copy Markdown

@sjhddh sjhddh commented Apr 24, 2026

What

The supported-formats list in README.md claims:

  • Images (EXIF metadata and OCR)

But the built-in ImageConverter does not perform OCR. Per the converter's own docstring:

Converts images to markdown via extraction of metadata (if exiftool is installed), and description via a multimodal LLM (if an llm_client is configured).

OCR is only available through the separate Azure Document Intelligence converter ([az-doc-intel] optional dependency), which is already documented in its own section of the README.

Why

This one-word misstatement has caused recurring user confusion. Recent examples:

Users install markitdown[all], feed a JPEG, and expect OCR output — but what they get is EXIF-only (no LLM client configured) or an LLM-generated description (not OCR).

Change

One line in README.md:

- - Images (EXIF metadata and OCR)
+ - Images (EXIF metadata and LLM-based description)

This matches ImageConverter's own docstring and the pattern used elsewhere in the list (e.g. "Audio (EXIF metadata and speech transcription)" — conditional, documented).

Related

Aware of #1608 — overlapping intent (clarify OCR availability), but that PR points users at a markitdown-ocr plugin that is not in this repo. This change is minimal and factual: describe what the in-tree converter actually does. Happy to defer if maintainers prefer #1608's framing, or to fold this into a broader docs rewrite.

…OCR)

The supported-formats list claims "Images (EXIF metadata and OCR)",
but the built-in `ImageConverter` does not perform OCR. Per the
converter's own docstring, it extracts EXIF metadata (when exiftool
is available) and generates a description via a multimodal LLM when
an `llm_client` is configured. OCR is only available through the
separate Azure Document Intelligence converter (`[az-doc-intel]`
optional dependency), which is documented elsewhere in the README.

This mislabeling has caused recurring user confusion, visible in
issues microsoft#1601, microsoft#1344, microsoft#1170, and microsoft#255 where users expected OCR
to work out of the box on images and scanned PDFs.

The one-word change brings the README in line with the actual
behavior of `ImageConverter`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant