How to Convert PDF to Markdown: A Complete Step-by-Step Guide
A practical guide to converting PDF files to Markdown format, covering different PDF types, common issues, cleanup techniques, and best practices for documentation, note-taking, and GitHub.


PDF files are everywhere — research papers, technical docs, ebooks, invoices. But when you need to edit, version-control, or repurpose that content, PDF is painful to work with. Markdown is lightweight, human-readable, and works seamlessly with tools like GitHub, Obsidian, and static site generators.
This guide walks you through converting PDF to Markdown effectively, covering the process, edge cases, and cleanup techniques.
Why Convert PDF to Markdown?
- Version control. Markdown is plain text — you can track changes in Git. PDFs are binary blobs that cannot be meaningfully diffed.
- Portability. A single .md file works in hundreds of editors and platforms. PDF locks content into a fixed visual layout.
- Editing efficiency. Copying text from a PDF produces broken formatting. Converting to Markdown gives you a clean starting point.
- AI workflows. LLMs work with text, not PDFs. Markdown preserves document structure in a format AI models can reason about.
Step-by-Step: Converting with pdf2md.net
pdf2md.net runs entirely in your browser — your files are never uploaded to a server.
- Open the tool at pdf2md.net
- Upload your PDF — drag and drop or use the file picker
- Configure settings — adjust page range, table detection, or OCR mode if needed
- Convert — click the convert button and wait for processing
- Review — check headings, tables, lists, and code blocks in the preview
- Download the .md file or copy to clipboard
Handling Different PDF Types
Text-Based PDFs (Born Digital)
Created from word processors or LaTeX. Text is directly extractable, so conversion quality is high.
Watch out for: Multi-column layouts that confuse reading order, repeating headers/footers, and hyphenated words at line breaks.
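Two of these problems, repeating headers/footers and hyphenated line breaks, can often be handled with a small post-processing pass. The sketch below assumes you already have the extracted text as one string per page; the function names and the 60% threshold are illustrative choices, not part of any particular converter.

```python
import re
from collections import Counter


def _key(line: str) -> str:
    # Mask digits so "Page 3" and "Page 4" count as the same line.
    return re.sub(r"\d+", "#", line.strip())


def strip_repeated_lines(pages: list[str], min_share: float = 0.6) -> list[str]:
    """Drop lines that recur on at least `min_share` of the pages."""
    counts = Counter()
    for page in pages:
        # Count each distinct non-blank line once per page.
        counts.update({_key(line) for line in page.splitlines() if line.strip()})
    threshold = max(2, min_share * len(pages))
    repeated = {key for key, n in counts.items() if n >= threshold}
    cleaned = []
    for page in pages:
        kept = [line for line in page.splitlines() if _key(line) not in repeated]
        cleaned.append("\n".join(kept))
    return cleaned


def join_hyphenated(text: str) -> str:
    """Rejoin a word that was split with a hyphen at the end of a line."""
    return re.sub(r"(?<=[a-z])-\n(?=[a-z])", "", text)
```

Note that join_hyphenated also joins genuine compounds that happen to break at the hyphen (for example "well-" followed by "known"), so spot-check the result.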
Scanned PDFs (Image-Based)
Each page is a photograph with no text layer. Quality depends entirely on OCR accuracy.
Tips: Use 300+ DPI scans, ensure documents aren't skewed, and expect to manually review OCR output. Common errors include confusing l with 1 and O with 0.
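One way to speed up the manual review is to list tokens that mix letters and digits, since that is where l/1 and O/0 confusions tend to show up. This is a rough heuristic sketch, not an OCR corrector; the function name is made up for illustration, and it will also flag legitimate tokens such as "A4".

```python
import re

# A token that contains at least one letter and at least one digit.
SUSPECT = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*\d)\w+\b")


def flag_ocr_suspects(text: str) -> list[str]:
    """Return mixed letter/digit tokens, in order of appearance, for manual review."""
    return SUSPECT.findall(text)
```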
Academic Papers
Section headings and body text convert well. Mathematical equations, footnotes, and multi-column layouts typically break. For math-heavy papers, consider specialized tools like Mathpix first.
Tables
Tables are the hardest element to convert. PDF has no concept of a "table" — it's just text at specific coordinates. Simple grid tables convert reasonably well; merged cells and complex headers almost always need manual repair.
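To make the "text at specific coordinates" point concrete, here is a minimal sketch of how a simple grid could be rebuilt from word positions. It assumes the words arrive as (x, y, text) tuples with the origin at the top left, one left-aligned word per cell, and the first row as the header; the input shape and both tolerance values are assumptions for illustration. Real converters use far more elaborate heuristics.

```python
def words_to_markdown_table(words, y_tol=3.0, x_tol=10.0):
    """Rebuild a simple grid table from (x, y, text) tuples."""
    # 1. Group words into rows: words with nearly the same y share a row.
    rows = []
    for x, y, text in sorted(words, key=lambda w: (w[1], w[0])):
        if rows and abs(rows[-1]["y"] - y) <= y_tol:
            rows[-1]["cells"].append((x, text))
        else:
            rows.append({"y": y, "cells": [(x, text)]})
    if not rows:
        return ""

    # 2. Derive column anchors: x positions further apart than x_tol.
    anchors = []
    for x in sorted(x for row in rows for x, _ in row["cells"]):
        if not anchors or x - anchors[-1] > x_tol:
            anchors.append(x)

    # 3. Put each word into the column with the nearest anchor.
    grid = []
    for row in rows:
        cells = [""] * len(anchors)
        for x, text in sorted(row["cells"]):
            i = min(range(len(anchors)), key=lambda k: abs(anchors[k] - x))
            cells[i] = (cells[i] + " " + text).strip()
        grid.append(cells)

    # 4. Emit pipe rows, with a separator row after the header.
    lines = ["| " + " | ".join(cells) + " |" for cells in grid]
    lines.insert(1, "| " + " | ".join("---" for _ in anchors) + " |")
    return "\n".join(lines)
```

Merged cells and multi-word cells break the one-word-per-cell assumption, which is exactly why those tables usually need manual repair.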
Common Issues and Fixes
Garbled Text
Character encoding mismatches produce artifacts like â€™ in place of an apostrophe. Try a different converter, or manually fix common substitutions.
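A common cause of this kind of artifact is UTF-8 text that was decoded as Windows-1252 somewhere along the way. If that is what happened, reversing the mistake can repair it. The sketch below handles only that one case and leaves anything else untouched; the function name is illustrative.

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8 text that was mistakenly decoded as Windows-1252."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of damage (or already clean): leave it unchanged.
        return text


# Simulate the damage, then repair it.
garbled = "don\u2019t".encode("utf-8").decode("cp1252")
repaired = fix_mojibake(garbled)
```

The round trip is all-or-nothing for the string you pass in, so applying it line by line tends to work better than applying it to a whole file. For messier cases, a dedicated library such as ftfy is likely a better fit.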
Line Break Problems
PDF stores text with explicit line positions, so every line may become separate in Markdown. Use regex find-and-replace to merge paragraph lines: find ([^\n])\n([^\n#\-\*\|>]) and replace with $1 $2.
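If you prefer to script this instead of using an editor, the same find-and-replace can be run in Python. The pattern is the one given above; the only change is that Python writes backreferences as \1 and \2 rather than $1 and $2. The function name is illustrative.

```python
import re

# A newline between two non-newline characters, where the next line does not
# start with a Markdown marker (#, -, *, |, >), is treated as a soft wrap.
MERGE = re.compile(r"([^\n])\n([^\n#\-\*\|>])")


def merge_wrapped_lines(text: str) -> str:
    """Join wrapped lines within a paragraph; keep blank lines and marked lines."""
    return MERGE.sub(r"\1 \2", text)
```

Lines that start with a digit are not excluded by this pattern, so numbered lists ("1. item") will be merged into the preceding line; extend the second character class if your document has them.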
Lost Formatting
Bold, italic, and heading detection depends on the converter reading font metadata. If headings come through as plain text, add # markers manually based on the original document structure.
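When font sizes are available, heading markers can sometimes be recovered automatically. The sketch below assumes you have (text, font_size) pairs for each line, for example collected from the span dictionaries that PyMuPDF's page.get_text("dict") returns; that input shape and the function name are assumptions for illustration. It treats the most common size as body text and maps larger sizes to heading levels.

```python
from collections import Counter


def mark_headings(lines: list[tuple[str, float]], max_levels: int = 3) -> str:
    """Prefix lines set in larger-than-body font sizes with # markers."""
    if not lines:
        return ""
    # The most frequent font size is assumed to be body text.
    body_size = Counter(size for _, size in lines).most_common(1)[0][0]
    # Larger sizes, biggest first, become heading levels 1..max_levels.
    heading_sizes = sorted({s for _, s in lines if s > body_size}, reverse=True)
    level = {size: i + 1 for i, size in enumerate(heading_sizes[:max_levels])}
    out = []
    for text, size in lines:
        prefix = "#" * level[size] + " " if size in level else ""
        out.append(prefix + text)
    return "\n".join(out)
```

This only works when headings are actually set larger than the body; documents that mark headings with bold or a different typeface at the same size still need manual markers.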
Images
Some converters extract images as separate files with ![alt](path) references. Others skip images entirely. For important images, extract them with tools like pdfimages and insert references manually.
Cleanup Checklist
- Structure: ensure one H1 title and a logical heading hierarchy, and remove repeated headers/footers
- Paragraphs: merge broken lines, verify intentional line breaks are preserved
- Tables: fix pipe alignment, add separator rows, ensure consistent column counts
- Lists: use consistent markers (- or *), check nested indentation
- Links: convert bare URLs to [text](url) format
- Preview: compare against the original PDF in a Markdown viewer
- Lint: run npx markdownlint-cli article.md to catch syntax issues
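Two of the checklist items, bare URLs and inconsistent column counts, are easy to check with a short script. The helpers below are a sketch with illustrative names; they use the URL itself as the link text and do not handle escaped pipes inside table cells.

```python
import re

# A URL not already preceded by "(", "<" or "[", i.e. not already linked.
BARE_URL = re.compile(r"(?<![(<\[])\bhttps?://[^\s<>()\[\]]+")


def link_bare_urls(markdown: str) -> str:
    """Wrap bare URLs as [url](url), keeping trailing punctuation outside."""
    def repl(match: re.Match) -> str:
        url = match.group(0)
        trimmed = url.rstrip(".,;:!?")
        return f"[{trimmed}]({trimmed}){url[len(trimmed):]}"
    return BARE_URL.sub(repl, markdown)


def inconsistent_table_rows(markdown: str) -> list[int]:
    """Return 1-based line numbers of table rows whose cell count differs
    from the first row of the same table."""
    bad, expected = [], None
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("|"):
            cells = stripped.strip("|").split("|")
            if expected is None:
                expected = len(cells)
            elif len(cells) != expected:
                bad.append(number)
        else:
            expected = None  # a non-table line ends the current table
    return bad
```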
Frequently Asked Questions
Is the conversion lossless? No. PDF stores visual layout; Markdown stores semantic structure. Some information is inevitably lost — particularly precise formatting, custom fonts, and complex layouts.
Can I convert password-protected PDFs? Unlock the PDF first with a tool like qpdf --password=yourpass --decrypt protected.pdf unlocked.pdf, then convert.
Can I automate batch conversion? Yes. Command-line tools like Marker (pip install marker-pdf) or PyMuPDF can process many files programmatically.
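As a rough sketch of what batch processing can look like, the driver below walks a folder and writes one .md file per PDF. The conversion itself is passed in as a function, because the exact call depends on the tool you choose; convert_folder and its parameters are illustrative names, not part of Marker or PyMuPDF.

```python
from pathlib import Path
from typing import Callable


def convert_folder(src: Path, dst: Path, convert: Callable[[Path], str]) -> list[Path]:
    """Run `convert` on every .pdf directly inside `src`; write .md files to `dst`."""
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(src.glob("*.pdf")):
        try:
            markdown = convert(pdf)
        except Exception as exc:
            # One unreadable file should not stop the whole batch.
            print(f"skipped {pdf.name}: {exc}")
            continue
        out = dst / pdf.with_suffix(".md").name
        out.write_text(markdown, encoding="utf-8")
        written.append(out)
    return written
```

The convert argument could be, for example, a small function that opens the file with PyMuPDF and joins each page's get_text() output. That yields plain text rather than structured Markdown, so the cleanup steps above still apply.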
Wrapping Up
Converting PDF to Markdown is rarely one-click. Quality depends on the input PDF, the converter's capabilities, and your willingness to clean up the output. For simple text-heavy PDFs, tools like pdf2md.net produce clean output with minimal editing. For complex documents, plan to spend time on post-conversion cleanup.
Treat converter output as a first draft, not a finished product. The time you invest in cleanup pays off in a clean, portable, version-controllable document.