
docs: remove trailing whitespace from README.md#1838

Open
srpatcha wants to merge 4 commits into microsoft:main from srpatcha:docs/fix-readme-whitespace

@srpatcha

Changes

docs: remove trailing whitespace from README.md

Signed-off with GPG.

srpatcha and others added 4 commits April 24, 2026 19:20
Add RTF to Markdown converter with:
- RTF control word parsing and tokenization
- Text extraction from nested RTF groups
- Bold, italic, underline style handling
- Table parsing with Markdown table output
- Unicode escape handling (\uN)
- Hex escape decoding with charset support
- Font charset mapping
- Skip destination groups (fonttbl, colortbl, etc.)

Signed-off-by: Srikanth Patchava <spatchava@meta.com>
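The tokenization and escape handling listed in this commit can be illustrated with a minimal, self-contained Python sketch. It is not the PR's converter: `rtf_to_text`, `_TOKEN_RE` and `SKIP_DESTINATIONS` are hypothetical names, hex escapes are decoded with a fixed cp1252 code page instead of the font charset, and bold/italic/underline and table output are left out. It shows the group stack (each `{` saves state, each `}` restores it), skipping of destinations such as `\fonttbl` and `{\*...}`, and the `\ucN` rule that drops the fallback characters following each `\uN`. Non-BMP characters encoded as surrogate pairs are not recombined.

```python
import re
from typing import List, Tuple

# One alternative per RTF token kind. Order matters: \'hh must be tried
# before the generic control-symbol branch.
_TOKEN_RE = re.compile(
    r"(?P<open>\{)"
    r"|(?P<close>\})"
    r"|\\'(?P<hex>[0-9a-fA-F]{2})"
    r"|\\(?P<word>[a-zA-Z]+)(?P<param>-?\d+)? ?"  # one trailing space is a delimiter
    r"|\\(?P<symbol>[^a-zA-Z])"
    r"|(?P<text>[^\\{}\r\n]+)"
    r"|[\r\n]+"  # raw line breaks in RTF source carry no meaning
)

# Hypothetical subset of destinations whose content is not document text.
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}


def rtf_to_text(rtf: str) -> str:
    """Extract plain text from an RTF string (sketch, not a full parser)."""
    out: List[str] = []
    stack: List[Tuple[bool, int]] = []  # (skipping, uc) saved at each "{"
    skipping = False  # inside an ignorable destination group
    uc = 1            # \ucN: number of fallback chars after each \uN
    to_drop = 0       # fallback chars still to discard
    group_start = False

    def emit(piece: str, droppable: bool = True) -> None:
        nonlocal to_drop
        if skipping:
            return
        if droppable and to_drop:
            n = min(to_drop, len(piece))
            piece, to_drop = piece[n:], to_drop - n
        out.append(piece)

    for m in _TOKEN_RE.finditer(rtf):
        at_group_start, group_start = group_start, False
        if m.group("open"):
            stack.append((skipping, uc))
            group_start = True
        elif m.group("close"):
            if stack:
                skipping, uc = stack.pop()
        elif m.group("hex") is not None:
            # Single byte, decoded with an assumed default ANSI code page.
            emit(bytes([int(m.group("hex"), 16)]).decode("cp1252", errors="replace"))
        elif m.group("word") is not None:
            word, param = m.group("word"), m.group("param")
            if at_group_start and word in SKIP_DESTINATIONS:
                skipping = True
            elif word in ("par", "line"):
                emit("\n", droppable=False)
            elif word == "tab":
                emit("\t", droppable=False)
            elif word == "uc" and param is not None:
                uc = int(param)
            elif word == "u" and param is not None:
                code = int(param)  # RTF writes values above 32767 as negatives
                emit(chr(code + 65536 if code < 0 else code), droppable=False)
                if not skipping:
                    to_drop = uc
        elif m.group("symbol") is not None:
            sym = m.group("symbol")
            if sym == "*" and at_group_start:
                skipping = True  # {\*\dest ...} marks an ignorable destination
            elif sym in "{}\\":
                emit(sym)  # escaped literal brace or backslash
            elif sym == "~":
                emit("\u00a0")  # non-breaking space
            elif sym in "\r\n":
                emit("\n", droppable=False)  # backslash-newline acts as \par
        elif m.group("text") is not None:
            emit(m.group("text"))
    return "".join(out)
```

For example, `rtf_to_text(r"{\rtf1\ansi Caf\'e9 na\u239?ve\par x}")` returns `"Café naïve\nx"`: the `?` after `\u239` is dropped as the fallback character.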
Wrap BeautifulSoup parsing with try/except for UnicodeDecodeError and
LookupError. When the declared charset (or default utf-8) fails, fall
back to letting BeautifulSoup auto-detect the encoding from raw bytes.
This prevents crashes on HTML files with incorrect or missing charset
declarations.

Signed-off-by: Srikanth Patchava <spatchava@meta.com>
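The try/except described in this commit can be sketched with the standard library alone. In the PR the fallback hands the raw bytes to BeautifulSoup for encoding detection; here a hypothetical `_detect_fallback` helper (try UTF-8, then cp1252, then Latin-1) stands in for that step so the example runs without third-party packages. `decode_html` is likewise an illustrative name, not part of markitdown.

```python
from typing import Callable, Optional


def _detect_fallback(raw: bytes) -> str:
    """Hypothetical stand-in for BeautifulSoup's encoding auto-detection."""
    for enc in ("utf-8", "cp1252"):
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    return raw.decode("latin-1")  # Latin-1 maps every byte, so it never fails


def decode_html(
    raw: bytes,
    charset: Optional[str] = None,
    fallback: Callable[[bytes], str] = _detect_fallback,
) -> str:
    """Decode with the declared charset (default utf-8); on failure, fall back."""
    try:
        return raw.decode(charset or "utf-8")
    except (UnicodeDecodeError, LookupError):
        # UnicodeDecodeError: the bytes are invalid for the declared charset.
        # LookupError: the declared charset name is unknown to Python.
        return fallback(raw)
```

Catching `LookupError` matters because an unknown charset name raises it before any bytes are decoded, so catching only `UnicodeDecodeError` would still crash on files with a bogus `charset=` declaration.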
Add tests covering:
- Acceptance tests (extension, mimetype matching)
- Conversion: plain text, bold, italic, underline, bold+italic
- Paragraph breaks, Unicode escapes, hex escapes
- Table conversion, empty RTF, skip fonttbl/colortbl groups
- Latin-1 fallback encoding, charset in stream_info
- Nested groups, non-breaking space, tab handling
- Low-level _rtf_to_markdown function tests

Signed-off-by: Srikanth Patchava <spatchava@meta.com>
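The hex-escape and Latin-1 fallback cases listed above depend on mapping each font's `\fcharsetN` value to a code page. A minimal sketch, assuming a hypothetical `decode_hex_escapes(fragment, fcharset)` helper and a common subset of the charset values from the RTF specification: consecutive `\'hh` escapes are joined into one byte run before decoding, so double-byte code pages such as Shift-JIS (cp932) decode correctly, and bytes the chosen code page cannot decode fall back to Latin-1, which accepts every byte. This is an illustration of the technique, not the PR's implementation.

```python
import re
from typing import Dict, Optional

# RTF \fcharsetN values mapped to Python codec names (common subset).
FCHARSET_TO_CODEC: Dict[int, str] = {
    0: "cp1252",    # ANSI
    128: "cp932",   # Shift-JIS
    129: "cp949",   # Hangul
    134: "gbk",     # GB2312 / Simplified Chinese
    136: "cp950",   # Big5 / Traditional Chinese
    161: "cp1253",  # Greek
    162: "cp1254",  # Turkish
    177: "cp1255",  # Hebrew
    178: "cp1256",  # Arabic
    186: "cp1257",  # Baltic
    204: "cp1251",  # Cyrillic
    222: "cp874",   # Thai
    238: "cp1250",  # Central European
}

# A run of one or more consecutive \'hh escapes.
_HEX_RUN = re.compile(r"(?:\\'[0-9a-fA-F]{2})+")


def decode_hex_run(data: bytes, fcharset: Optional[int]) -> str:
    """Decode bytes with the font's code page, falling back to Latin-1."""
    codec = FCHARSET_TO_CODEC.get(fcharset, "cp1252")  # unknown/None -> ANSI
    try:
        return data.decode(codec)
    except (UnicodeDecodeError, LookupError):
        return data.decode("latin-1")


def decode_hex_escapes(fragment: str, fcharset: Optional[int] = None) -> str:
    """Replace every run of \\'hh escapes in an RTF fragment with decoded text."""
    def repl(m: "re.Match[str]") -> str:
        data = bytes.fromhex(m.group(0).replace("\\'", ""))
        return decode_hex_run(data, fcharset)

    return _HEX_RUN.sub(repl, fragment)
```

For example, `decode_hex_escapes(r"\'82\'a0", 128)` decodes the two-byte Shift-JIS sequence to `"あ"`, which decoding each escape on its own would not.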