Customizing MDEx Output with AST Transformation

TL;DR

MDEx’s default HTML output won’t match your site’s design system. Use MDEx.traverse_and_update/2 to transform the AST before rendering: add custom classes, skip elements, or replace nodes entirely.

The Problem

You’re rendering markdown in Phoenix. MDEx handles parsing and syntax highlighting beautifully, but the HTML it emits uses bare tags with no classes or IDs:

<h2>My Heading</h2>

Your site needs:

<h2 id="my-heading" class="blog-article__h2">My Heading</h2>

You could post-process the HTML string with regex. Don’t. There’s a better way.

The Solution

MDEx exposes the parsed AST before rendering. Transform it, then render.

# Parse markdown to AST
doc = MDEx.parse_document!(markdown, extension: [alerts: true])
# Transform the AST (transform/1 is whatever function you write)
doc = MDEx.traverse_and_update(doc, &transform/1)
# Render to HTML
{:ok, html} = MDEx.to_html(doc, render: [unsafe_: true])
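
If the transform step feels abstract, a toy version of the idea may help. The sketch below uses stand-in structs (Toy.Doc, Toy.Heading, Toy.Text) and a hypothetical Toy.Walk.update/2, none of which are part of MDEx, and it assumes a children-first visiting order that MDEx itself may not share. It only illustrates the shape of the callback: match the nodes you care about, return everything else unchanged.

```elixir
# Stand-in node structs for illustration only (not MDEx's own structs).
defmodule Toy.Doc, do: defstruct(nodes: [])
defmodule Toy.Heading, do: defstruct(level: 1, nodes: [])
defmodule Toy.Text, do: defstruct(literal: "")

defmodule Toy.Walk do
  # Nodes with children: update the children first, then the node itself.
  def update(%{nodes: nodes} = node, fun) do
    fun.(%{node | nodes: Enum.map(nodes, &update(&1, fun))})
  end

  # Leaf nodes: just apply the callback.
  def update(node, fun), do: fun.(node)
end

defmodule Toy.Demo do
  def run do
    doc = %Toy.Doc{
      nodes: [
        %Toy.Heading{level: 2, nodes: [%Toy.Text{literal: "hello"}]},
        %Toy.Text{literal: "world"}
      ]
    }

    # Upcase text nodes; every other node falls through the catch-all clause.
    Toy.Walk.update(doc, fn
      %Toy.Text{literal: text} = node -> %{node | literal: String.upcase(text)}
      node -> node
    end)
  end
end

updated = Toy.Demo.run()
IO.inspect(updated)
```

Both text nodes come back upcased, while the document and heading nodes are returned exactly as they went in.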

Transforming Headings

Each node in the AST is a struct. A heading is a %MDEx.Heading{} with a level and a list of child nodes.

Here’s how to add custom classes and IDs:

defp render_markdown(markdown) do
  doc =
    MDEx.parse_document!(markdown,
      extension: [alerts: true, strikethrough: true, table: true]
    )

  doc =
    MDEx.traverse_and_update(doc, fn
      %MDEx.Heading{level: level, nodes: nodes} ->
        # Flatten the heading's children to plain text, then derive the ID
        text = extract_text(nodes)
        id = MDEx.anchorize(text)
        class = heading_class(level)
        # Escape the text because it is interpolated into raw HTML
        escaped = text |> Phoenix.HTML.html_escape() |> Phoenix.HTML.safe_to_string()

        %MDEx.HtmlBlock{
          literal: ~s(<h#{level} id="#{id}" class="#{class}">#{escaped}</h#{level}>)
        }

      node ->
        node
    end)

  # unsafe_: true lets the HtmlBlock literals through as raw HTML
  {:ok, html} = MDEx.to_html(doc, render: [unsafe_: true])
  html
end

defp heading_class(2), do: "blog-article__h2"
defp heading_class(3), do: "blog-article__h3"
defp heading_class(_), do: "blog-article__h4"

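The key insight: replace the Heading node with an HtmlBlock containing your custom HTML. MDEx renders HtmlBlock nodes as raw HTML only when you pass render: [unsafe_: true]; without it, raw HTML is omitted from the output. That option also lets through any raw HTML written in the markdown itself, so use it only with content you trust. Note too that the heading text is flattened to plain text, so inline formatting such as emphasis or links inside a heading is dropped.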
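
To see what ends up in the literal field, here is a dependency-free sketch of the string building. HeadingSketch and its escape/1 are hypothetical stand-ins (the code above uses Phoenix.HTML.html_escape/1 and MDEx.anchorize/1 instead), and the ID is passed in by hand rather than computed.

```elixir
defmodule HeadingSketch do
  # Rough stand-in for Phoenix.HTML.html_escape/1 so the sketch runs without deps.
  def escape(text) do
    text
    |> String.replace("&", "&amp;")
    |> String.replace("<", "&lt;")
    |> String.replace(">", "&gt;")
    |> String.replace("\"", "&quot;")
  end

  # Builds the raw HTML that would go into the HtmlBlock literal.
  def heading_html(level, id, class, text) do
    escaped = escape(text)
    ~s(<h#{level} id="#{id}" class="#{class}">#{escaped}</h#{level}>)
  end
end

IO.puts(HeadingSketch.heading_html(2, "my-heading", "blog-article__h2", "My Heading"))
IO.puts(HeadingSketch.heading_html(3, "a-b", "blog-article__h3", "A <b> & B"))
```

The first call prints the target markup from the start of this post. The second shows why escaping matters: heading text is author content placed into raw HTML, so angle brackets and ampersands have to be encoded.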

Extracting Text from Nodes

Heading nodes contain nested children - text, code, emphasis, etc. Extract the plain text recursively:

defp extract_text(nodes) when is_list(nodes) do
  Enum.map_join(nodes, "", &extract_text/1)
end

defp extract_text(%MDEx.Text{literal: text}), do: text
defp extract_text(%MDEx.Code{literal: text}), do: "`#{text}`"
defp extract_text(%{nodes: nodes}), do: extract_text(nodes)
defp extract_text(_), do: ""
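
As a quick check of how those clauses combine, the sketch below runs the same recursion over stand-in structs. Sketch.Text, Sketch.Code, and Sketch.Emph are hypothetical placeholders that only mimic the literal and nodes fields; with MDEx you would match on its real structs as shown above.

```elixir
# Placeholder structs that only mimic the fields used here (not MDEx's own).
defmodule Sketch.Text, do: defstruct(literal: "")
defmodule Sketch.Code, do: defstruct(literal: "")
defmodule Sketch.Emph, do: defstruct(nodes: [])

defmodule Sketch.Extract do
  # A list of nodes: extract each one and join the pieces.
  def extract_text(nodes) when is_list(nodes) do
    Enum.map_join(nodes, "", &extract_text/1)
  end

  def extract_text(%Sketch.Text{literal: text}), do: text
  def extract_text(%Sketch.Code{literal: text}), do: "`#{text}`"
  # Any other node with children (emphasis here): recurse into them.
  def extract_text(%{nodes: nodes}), do: extract_text(nodes)
  # Anything else contributes no text.
  def extract_text(_), do: ""

  # Roughly the children of a heading written as: Using `mix` *fast*
  def demo do
    extract_text([
      %Sketch.Text{literal: "Using "},
      %Sketch.Code{literal: "mix"},
      %Sketch.Text{literal: " "},
      %Sketch.Emph{nodes: [%Sketch.Text{literal: "fast"}]}
    ])
  end
end

IO.puts(Sketch.Extract.demo())
```

The emphasis wrapper disappears and only its text survives, while the inline code keeps its backticks.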

Skipping Elements

What if your page already shows an H1 title in the header? Skip the markdown’s H1:

MDEx.traverse_and_update(doc, fn
  %MDEx.Heading{level: 1} ->
    %MDEx.HtmlBlock{literal: ""} # Empty string = skip

  node ->
    node
end)

Return an empty HtmlBlock to effectively remove the node from output. Like the heading replacement, this relies on rendering with unsafe_: true.

Why Not Post-Process HTML?

You could regex the output:

html
|> String.replace(~r/<h2>(.+?)<\/h2>/, "<h2 class=\"fancy\">\\1</h2>")

Problems:

  • Fragile - breaks on nested tags or attributes
  • Can’t access structured data (heading level, text content)
  • Gets ugly fast with multiple transformations

AST transformation is explicit and composable, and it works on structs you can pattern match rather than on strings.
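
The fragility is easy to reproduce with nothing but the standard library. In this sketch the two-heading input string is made up for illustration; the regex is the one shown above.

```elixir
# Two headings: one bare, one that already carries an attribute.
html = ~s(<h2>Plain</h2>\n<h2 id="intro">With id</h2>)

# Same pattern as above: it only matches a bare <h2> opening tag.
result = String.replace(html, ~r/<h2>(.+?)<\/h2>/, ~s(<h2 class="fancy">\\1</h2>))

IO.puts(result)
```

Only the first heading gains the class. The second is silently left alone because its opening tag has an id attribute, and nothing tells you the replacement was skipped.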

Common Node Types

MDEx exposes these structs (among others):

Struct           Purpose
MDEx.Heading     Headings with level and nodes
MDEx.Paragraph   Paragraph containers
MDEx.Text        Plain text with literal
MDEx.Code        Inline code with literal
MDEx.CodeBlock   Fenced code blocks
MDEx.HtmlBlock   Raw HTML passthrough
MDEx.Link        Links with url and title

Pattern match on what you need, pass through the rest.

Combining with Extensions

Some MDEx extensions, such as alerts: true, add their own node types. They work alongside your transformations:

doc =
  MDEx.parse_document!(markdown,
    extension: [
      alerts: true, # GitHub-style callouts
      strikethrough: true,
      table: true,
      autolink: true
    ],
    parse: [smart: true] # Smart quotes
  )

Your transformation function only handles nodes you explicitly match. Everything else passes through unchanged.

References