feat: add disabled_converters parameter, converters property, and CLI flags (closes #1665) #1813
Open
RhythrosaLabs wants to merge 1 commit into microsoft:main from
Closes microsoft#1665

Resolves two friction points raised in issue microsoft#1665:

1. disabled_converters -- a new keyword-only parameter on MarkItDown.__init__() that accepts any iterable of DocumentConverter subclasses. Each listed type is skipped during enable_builtins(), so it is never registered and therefore never invoked. This enables:
   * Security hardening (e.g. disabling ZipConverter on a service that handles untrusted uploads, or AudioConverter to avoid implicit Whisper API network calls).
   * Leaner deployments where optional extras are not installed and the converter would raise MissingDependencyException anyway.

   Example usage:

       from markitdown import MarkItDown
       from markitdown.converters import ZipConverter, AudioConverter

       md = MarkItDown(disabled_converters=[ZipConverter, AudioConverter])

2. converters property -- a read-only tuple snapshot of all currently registered ConverterRegistration objects, sorted by priority, so callers can inspect what is active without touching the private _converters list.
3. disabled_converters property -- mirrors the frozenset passed at construction time for easy introspection / logging.
4. CLI: --disable-converter / --list-converters -- command-line equivalents for shell scripts and container entry-points. Unknown names exit non-zero with a clear message.
5. ConverterRegistration is now exported from the top-level markitdown package so users can type-annotate against it.

Implementation notes:
* Validation is eager: TypeError raised at construction if any element of disabled_converters is not a DocumentConverter subclass.
* The frozenset is immutable, preventing accidental post-construction mutation.
* enable_builtins() respects _disabled_converter_types, so the MarkItDown(enable_builtins=False) + later enable_builtins() pattern also honours the disabled set.
* No changes to existing converter internals; fully backward-compatible (disabled_converters defaults to None / empty frozenset).
* 22 new unit + integration tests added in test_disabled_converters.py.
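The implementation notes above can be illustrated with a small self-contained sketch. It does not import markitdown and is not the PR's code: `MiniMarkItDown`, the `_BUILTINS` table, its priorities, and the stand-in converter classes are hypothetical, and matching disabled types by exact class (rather than by subclass) is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


class DocumentConverter:
    """Stand-in for markitdown's converter base class (hypothetical)."""


class ZipConverter(DocumentConverter): ...
class AudioConverter(DocumentConverter): ...
class PlainTextConverter(DocumentConverter): ...


@dataclass(frozen=True)
class ConverterRegistration:
    """Stand-in registration record: a converter instance plus its priority."""
    converter: DocumentConverter
    priority: float


# Hypothetical built-in table; lower priority values sort first in this sketch.
_BUILTINS = [(PlainTextConverter, 10.0), (ZipConverter, 0.0), (AudioConverter, 0.0)]


class MiniMarkItDown:
    def __init__(self, *, enable_builtins: bool = True,
                 disabled_converters: Optional[Iterable[type]] = None):
        disabled = frozenset(disabled_converters or ())
        # Eager validation: reject anything that is not a converter class.
        for item in disabled:
            if not (isinstance(item, type) and issubclass(item, DocumentConverter)):
                raise TypeError(f"not a DocumentConverter subclass: {item!r}")
        self._disabled_converter_types = disabled
        self._converters = []
        if enable_builtins:
            self.enable_builtins()

    def enable_builtins(self) -> None:
        # Disabled types are skipped here, so they are never registered,
        # whether this runs from __init__ or is called later by hand.
        for cls, priority in _BUILTINS:
            if cls in self._disabled_converter_types:
                continue
            self._converters.append(ConverterRegistration(cls(), priority))

    @property
    def disabled_converters(self) -> frozenset:
        return self._disabled_converter_types

    @property
    def converters(self) -> tuple:
        # Immutable snapshot sorted by priority; the private list stays private.
        return tuple(sorted(self._converters, key=lambda r: r.priority))
```

Because the skip lives in `enable_builtins()` rather than in `__init__`, constructing with `enable_builtins=False` and calling `enable_builtins()` later still leaves the disabled types unregistered in this sketch.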
Summary
Closes #1665
Resolves the two friction points raised in #1665 — no way to disable specific built-in converters without subclassing — with a clean, backward-compatible API addition.
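Problem
When MarkItDown() is constructed, all built-in converters are registered unconditionally. There is currently no public surface to:
* Disable specific converters for security hardening (e.g. ZipConverter for zip-bomb protection, AudioConverter to prevent implicit Whisper API network calls on untrusted input).
* Skip converters whose optional extras are not installed and that would otherwise raise MissingDependencyException.
* Inspect the registered converters without reaching into the private _converters list.

Changes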
packages/markitdown/src/markitdown/_markitdown.py
* MarkItDown.__init__(disabled_converters=…): keyword-only parameter accepting any iterable of DocumentConverter subclasses. Each listed type is skipped during enable_builtins(). Validation is eager: raises TypeError if any element is not a DocumentConverter subclass.
* MarkItDown.disabled_converters property: frozenset of types that were excluded at construction time.
* MarkItDown.converters property: read-only tuple[ConverterRegistration, …] snapshot sorted by priority, so there is no more reaching into _converters.

packages/markitdown/src/markitdown/__main__.py
* --disable-converter CONVERTER: disables the named converter class from markitdown.converters. Unknown names exit non-zero with a clear message.
* --list-converters: command-line equivalent for listing converters, for shell scripts and container entry-points.

packages/markitdown/src/markitdown/__init__.py
* ConverterRegistration is now exported from the top-level package so callers can type-annotate against it.

packages/markitdown/tests/test_disabled_converters.py (new file)
22 tests covering:
* disabled_converters and converters properties
* converters
* enable_builtins=False + manual enable_builtins() pattern
* CLI: --list-converters, --disable-converter, unknown name detection, multi-flag

Usage examples
Python API
CLI
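The flag behaviour described for `--disable-converter` and `--list-converters` can be sketched in Python with `argparse`. This is a hypothetical illustration, not the PR's `__main__.py`: the `KNOWN_CONVERTERS` set stands in for the names exported by `markitdown.converters`, the repeatable flag (`action="append"`) is an assumption based on the "multi-flag" test description, and exit code 2 is just one possible non-zero value.

```python
import argparse
import sys

# Hypothetical stand-in for the class names exported by markitdown.converters.
KNOWN_CONVERTERS = {"ZipConverter", "AudioConverter", "PlainTextConverter"}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="markitdown-sketch")
    # Assumed to be repeatable: each occurrence appends one converter name.
    parser.add_argument("--disable-converter", metavar="CONVERTER",
                        action="append", default=[])
    parser.add_argument("--list-converters", action="store_true")
    return parser


def resolve_disabled(names):
    """Validate names up front; unknown names end the run with a non-zero code."""
    unknown = sorted(set(names) - KNOWN_CONVERTERS)
    if unknown:
        print(f"unknown converter(s): {', '.join(unknown)}", file=sys.stderr)
        raise SystemExit(2)
    return frozenset(names)


def main(argv):
    args = build_parser().parse_args(argv)
    disabled = resolve_disabled(args.disable_converter)
    active = sorted(KNOWN_CONVERTERS - disabled)
    if args.list_converters:
        for name in active:
            print(name)
    return active
```

Validating every name before doing any work mirrors the eager `TypeError` check on the Python side: a typo in a container entry-point fails immediately instead of silently leaving a converter enabled.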
Before / After
* Before: subclass MarkItDown, override enable_builtins(). After: MarkItDown(disabled_converters=[ZipConverter]).
* Before: md._converters (private). After: md.converters (public, sorted tuple).
* Also available after this change: md.disabled_converters (frozenset), --disable-converter ZipConverter, --list-converters.

Checklist
* disabled_converters defaults to None / empty frozenset; all existing code works unchanged
* enable_builtins=False + later enable_builtins() pattern also respects the disabled set
* Invalid entries raise TypeError
* Tests pass (the test_speech_transcription failure is unrelated; it requires local Whisper setup)