How to Convert PDF to Markdown: A Complete Step-by-Step Guide

A practical guide to converting PDF files to Markdown format, covering different PDF types, common issues, cleanup techniques, and best practices for documentation, note-taking, and GitHub.

PDF2MD Team
April 3, 2026

PDF files are everywhere — research papers, technical docs, ebooks, invoices. But when you need to edit, version-control, or repurpose that content, PDF is painful to work with. Markdown is lightweight, human-readable, and works seamlessly with tools like GitHub, Obsidian, and static site generators.

This guide walks you through converting PDF to Markdown effectively, covering the process, edge cases, and cleanup techniques.

Why Convert PDF to Markdown?

  • Version control. Markdown is plain text — you can track changes in Git. PDFs are binary blobs that cannot be meaningfully diffed.
  • Portability. A single .md file works in hundreds of editors and platforms. PDF locks content into a fixed visual layout.
  • Editing efficiency. Copying text from a PDF produces broken formatting. Converting to Markdown gives you a clean starting point.
  • AI workflows. LLMs work with text, not PDFs. Markdown preserves document structure in a format AI models can reason about.

Step-by-Step: Converting with pdf2md.net

pdf2md.net runs entirely in your browser — your files are never uploaded to a server.

  1. Open the tool at pdf2md.net
  2. Upload your PDF — drag and drop or use the file picker
  3. Configure settings — adjust page range, table detection, or OCR mode if needed
  4. Convert — click the convert button and wait for processing
  5. Review — check headings, tables, lists, and code blocks in the preview
  6. Download the .md file or copy to clipboard

Handling Different PDF Types

Text-Based PDFs (Born Digital)

These PDFs are created from word processors or LaTeX. Their text is directly extractable, so conversion quality is high.

Watch out for: Multi-column layouts that confuse reading order, repeating headers/footers, and hyphenated words at line breaks.
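As an illustration of the hyphenation issue, here is a minimal Python sketch that rejoins words split by a hyphen at a line break. The function name `dehyphenate` is made up for this example, and the rule is only a heuristic: it joins whenever the next line starts with a lowercase letter, so a genuine compound such as "well-known" that happens to break at the hyphen would lose its hyphen.

```python
import re

# A word character, a hyphen, a line break, then a lowercase letter.
# The lowercase check avoids joining things like "Section 3-\nA".
_HYPHEN_BREAK = re.compile(r"(\w)-\n([a-z])")


def dehyphenate(text: str) -> str:
    """Rejoin words that were hyphenated across a line break (heuristic)."""
    return _HYPHEN_BREAK.sub(r"\1\2", text)


if __name__ == "__main__":
    print(dehyphenate("docu-\nmentation is use-\nful"))
```

Review the result by eye afterwards, since the script cannot tell a typesetting hyphen from a real one.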

Scanned PDFs (Image-Based)

Each page is a photograph with no text layer. Quality depends entirely on OCR accuracy.

Tips: Use 300+ DPI scans, ensure documents aren't skewed, and expect to manually review OCR output. Common errors include confusing l with 1 and O with 0.
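To make the manual review faster, one option is to flag tokens that mix letters and digits, since that is where l/1 and O/0 swaps tend to show up. The sketch below is an assumption about how you might do this, not a feature of any particular converter, and `suspicious_tokens` is a hypothetical helper name. It will also flag legitimate tokens such as "H2O" or "v2", so treat the output as a review list rather than a list of errors.

```python
import re

# A whole token that contains at least one letter AND at least one digit.
_MIXED = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*\d)\w+\b")


def suspicious_tokens(text: str) -> list:
    """Return tokens mixing letters and digits, in order of appearance."""
    return [m.group(0) for m in _MIXED.finditer(text)]


if __name__ == "__main__":
    for token in suspicious_tokens("The tota1 for 2O24 was 100 units"):
        print(token)
```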

Academic Papers

Section headings and body text convert well. Mathematical equations, footnotes, and multi-column layouts typically break. For math-heavy papers, consider specialized tools like Mathpix first.

Tables

Tables are the hardest element to convert. PDF has no concept of a "table" — it's just text at specific coordinates. Simple grid tables convert reasonably well; merged cells and complex headers almost always need manual repair.
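When a table needs manual repair, it can be easier to retype the cells as plain rows and regenerate the Markdown than to fix pipes by hand. The following is a small sketch under that assumption; `rows_to_markdown` is a hypothetical helper, the first row is treated as the header, and short rows are padded with empty cells so every row has the same column count.

```python
def rows_to_markdown(rows):
    """Render rows (lists of cell strings) as a Markdown pipe table."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    # Pad short rows so all rows have the same number of columns.
    padded = [list(row) + [""] * (width - len(row)) for row in rows]

    def fmt(cells):
        # Escape literal pipes so they do not split a cell.
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    lines = [fmt(padded[0]), "| " + " | ".join(["---"] * width) + " |"]
    lines.extend(fmt(row) for row in padded[1:])
    return "\n".join(lines)


if __name__ == "__main__":
    print(rows_to_markdown([["Name", "Qty"], ["Apple", "3"], ["Pear"]]))
```

Merged cells have no equivalent in this table syntax, so they still have to be flattened by hand before building the rows.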

Common Issues and Fixes

Garbled Text

Character encoding mismatches produce artifacts like â€™ where a curly apostrophe (’) should appear. Try a different converter, or manually fix common substitutions.
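One common cause of this kind of garbling is UTF-8 text that was decoded as Windows-1252 somewhere along the way. Assuming that is what happened, the damage can often be reversed by re-encoding and decoding again, as in the sketch below. `fix_mojibake` is a hypothetical helper name; it works on the whole string at once and returns the input unchanged if the round trip fails, so text that is only partly garbled may not be fixed.

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8 text that was mistakenly decoded as Windows-1252."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of garbling (or only partly garbled): leave as-is.
        return text


if __name__ == "__main__":
    print(fix_mojibake("donâ€™t"))
```

For mixed or more heavily damaged text, a dedicated library such as ftfy is likely a better fit than this one-liner.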

Line Break Problems

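PDF stores each line of text at an explicit position, so every visual line may come through as its own line in Markdown. Use regex find-and-replace to merge paragraph lines: find ([^\n])\n([^\n#\-\*\|>]) and replace with $1 $2. The second character class stops a line from being merged upward when it starts with #, -, *, | or > (headings, list items, table rows, blockquotes). Numbered list items are not covered, and a heading followed directly by text with no blank line will be joined to it, so check those by hand.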
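If you prefer a script to an editor's find-and-replace, the same pattern can be applied in Python. This is a minimal sketch using the pattern from the paragraph above; note that Python writes the replacement as `\1 \2` rather than `$1 $2`, and `merge_paragraph_lines` is a name chosen for this example.

```python
import re

# Same pattern as in the text: a line break between two non-newline
# characters, where the next line does not start with #, -, *, | or >.
_SOFT_BREAK = re.compile(r"([^\n])\n([^\n#\-\*\|>])")


def merge_paragraph_lines(markdown: str) -> str:
    """Join hard-wrapped lines; blank lines between paragraphs are kept."""
    return _SOFT_BREAK.sub(r"\1 \2", markdown)


if __name__ == "__main__":
    sample = "PDF stores text\nwith explicit lines.\n\n- item one\n- item two"
    print(merge_paragraph_lines(sample))
```

Because the pattern does not know about code blocks, run it before adding fenced code by hand, or skip those regions.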

Lost Formatting

Bold, italic, and heading detection depends on the converter reading font metadata. If headings come through as plain text, add # markers manually based on the original document structure.
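To show the kind of heuristic a converter might apply to font metadata, here is a toy sketch that ranks font sizes larger than the body size and maps them to heading levels. Everything here is an assumption for illustration: the input format (a list of `(text, font_size)` pairs), the `body_size` parameter, and the function name `mark_headings` are invented, and real converters may use quite different rules.

```python
def mark_headings(lines, body_size):
    """Prefix lines set in a larger font with # markers.

    lines: list of (text, font_size) pairs (hypothetical input format).
    The largest size becomes level 1, the next largest level 2, and so on,
    up to the six heading levels Markdown supports.
    """
    sizes = sorted({size for _, size in lines if size > body_size}, reverse=True)
    level = {size: i + 1 for i, size in enumerate(sizes[:6])}
    out = []
    for text, size in lines:
        if size in level:
            out.append("#" * level[size] + " " + text)
        else:
            out.append(text)
    return out


if __name__ == "__main__":
    page = [("Title", 24.0), ("Intro", 16.0), ("Body text.", 11.0)]
    print("\n".join(mark_headings(page, body_size=11.0)))
```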

Images

Some converters extract images as separate files with ![alt](path) references. Others skip images entirely. For important images, extract them with tools like pdfimages and insert references manually.
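Once the images are extracted into a folder, the references can be generated rather than typed. The sketch below assumes the images were written as PNG or JPEG files (pdfimages names its output with a numbered suffix, for example img-000.png when run with `-png` and the prefix `img`). The helper name `image_references`, the `link_prefix` parameter, and the "Figure N" alt text are placeholders for this example.

```python
from pathlib import Path

_IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def image_references(image_dir, link_prefix="images", alt_prefix="Figure"):
    """Build one Markdown image reference per image file, sorted by name."""
    names = sorted(
        p.name for p in Path(image_dir).iterdir()
        if p.suffix.lower() in _IMAGE_SUFFIXES
    )
    return [
        f"![{alt_prefix} {i}]({link_prefix}/{name})"
        for i, name in enumerate(names, start=1)
    ]


if __name__ == "__main__":
    # Lists references for image files in the current directory.
    print("\n".join(image_references(".")))
```

The generated alt text is generic, so replace it with a real description of each image when you paste the references into place.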

Cleanup Checklist

  1. Structure — ensure one H1 title, logical heading hierarchy, remove repeated headers/footers
  2. Paragraphs — merge broken lines, verify intentional line breaks are preserved
  3. Tables — fix pipe alignment, add separator rows, ensure consistent column counts
  4. Lists — use consistent markers (- or *), check nested indentation
  5. Links — convert bare URLs to [text](url) format
  6. Preview — compare against the original PDF in a Markdown viewer
  7. Lint — run npx markdownlint-cli article.md to catch syntax issues
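Some checklist items are mechanical enough to script. As one example, the list-marker step could be sketched as below, converting * and + bullets to a single marker while keeping indentation. `normalize_bullets` is a hypothetical helper; it does not understand fenced code blocks or `* * *` horizontal rules, so it would rewrite matching lines there too.

```python
import re

# Optional indentation, then a * or + bullet followed by whitespace.
# "**bold**" at the start of a line does not match: no space after the "*".
_BULLET = re.compile(r"^([ \t]*)[*+]([ \t]+)", re.MULTILINE)


def normalize_bullets(markdown: str, marker: str = "-") -> str:
    """Rewrite * and + list markers to one consistent marker."""
    return _BULLET.sub(lambda m: m.group(1) + marker + m.group(2), markdown)


if __name__ == "__main__":
    print(normalize_bullets("* one\n  + nested\n- three"))
```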

Frequently Asked Questions

Is the conversion lossless? No. PDF stores visual layout; Markdown stores semantic structure. Some information is inevitably lost — particularly precise formatting, custom fonts, and complex layouts.

Can I convert password-protected PDFs? Unlock the PDF first with a tool like qpdf --password=yourpass --decrypt protected.pdf unlocked.pdf, then convert.

Can I automate batch conversion? Yes. The Marker command-line tool (pip install marker-pdf) or a script built on the PyMuPDF Python library can process many files programmatically.
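A batch job is mostly a loop over files around whichever converter you choose. The sketch below keeps the converter abstract: `convert_folder` and its `convert` parameter are hypothetical names, and `convert` stands for any function you supply that takes a PDF path and returns Markdown text (for example, a thin wrapper around one of the libraries above).

```python
from pathlib import Path
from typing import Callable


def convert_folder(src: Path, dst: Path, convert: Callable[[Path], str]) -> list:
    """Convert every .pdf in src to a .md file of the same name in dst.

    convert is supplied by the caller: it receives a PDF path and must
    return the Markdown text for that file.
    """
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(src.glob("*.pdf")):
        target = dst / (pdf.stem + ".md")
        target.write_text(convert(pdf), encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    # Placeholder converter for demonstration; swap in a real one.
    def placeholder(pdf: Path) -> str:
        return f"# {pdf.stem}\n"

    for path in convert_folder(Path("pdfs"), Path("markdown"), placeholder):
        print(path)
```

Keeping the converter as a parameter makes it easy to try a different tool on the same folder and compare the outputs.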

Wrapping Up

Converting PDF to Markdown is rarely one-click. Quality depends on the input PDF, the converter's capabilities, and your willingness to clean up the output. For simple text-heavy PDFs, tools like pdf2md.net produce clean output with minimal editing. For complex documents, plan to spend time on post-conversion cleanup.

Treat converter output as a first draft, not a finished product. The time you invest in cleanup pays off in a clean, portable, version-controllable document.