Customizing MDEx Output with AST Transformation

TL;DR

MDEx’s default HTML output rarely matches a site’s design system out of the box. Use MDEx.traverse_and_update/2 to transform the AST before rendering: add custom classes, skip elements, or replace nodes entirely.

The Problem

You’re rendering markdown in Phoenix. MDEx handles parsing and syntax highlighting beautifully, but the output HTML uses generic tags:

<h2>My Heading</h2>

Your site needs:

<h2 id="my-heading" class="blog-article__h2">My Heading</h2>

You could post-process the HTML string with regex. Don’t. There’s a better way.

The Solution

MDEx exposes the parsed AST before rendering. Transform it, then render.

# Parse markdown to AST
doc = MDEx.parse_document!(markdown, extension: [alerts: true])

# Transform the AST (transform/1 is a function you define yourself)
doc = MDEx.traverse_and_update(doc, &transform/1)

# Render to HTML; unsafe_: true lets raw HTML nodes through unmodified
{:ok, html} = MDEx.to_html(doc, render: [unsafe_: true])
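
One way to wire these three steps together is a small module that starts with a pass-through transform/1. This is only a sketch: the module name MarkdownPipeline and the function transform/1 are hypothetical, it assumes MDEx is already a dependency of your project, and it reuses the option names shown above.

```elixir
defmodule MarkdownPipeline do
  # Hypothetical wrapper module; only the MDEx calls come from the library.

  def to_html(markdown) do
    doc =
      markdown
      |> MDEx.parse_document!(extension: [alerts: true])
      |> MDEx.traverse_and_update(&transform/1)

    {:ok, html} = MDEx.to_html(doc, render: [unsafe_: true])
    html
  end

  # Catch-all clause: nodes you don't handle are returned untouched.
  # Add more specific clauses above this one as needed.
  defp transform(node), do: node
end
```

With only the catch-all clause in place, MarkdownPipeline.to_html("## My Heading") should give the same HTML as rendering without any transform.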

Transforming Headings

Each node in the AST is a struct. Headings are %MDEx.Heading{} with a level and child nodes.

Here’s how to add custom classes and IDs:

defp render_markdown(markdown) do
  doc = MDEx.parse_document!(markdown,
    extension: [alerts: true, strikethrough: true, table: true]
  )

  doc = MDEx.traverse_and_update(doc, fn
    %MDEx.Heading{level: level, nodes: nodes} ->
      text = extract_text(nodes)
      id = MDEx.anchorize(text)
      class = heading_class(level)
      escaped = text |> Phoenix.HTML.html_escape() |> Phoenix.HTML.safe_to_string()

      %MDEx.HtmlBlock{
        literal: ~s(<h#{level} id="#{id}" class="#{class}">#{escaped}</h#{level}>)
      }

    node -> node
  end)

  {:ok, html} = MDEx.to_html(doc, render: [unsafe_: true])
  html
end

defp heading_class(2), do: "blog-article__h2"
defp heading_class(3), do: "blog-article__h3"
# Every other level (1, 4, 5, 6) falls back to the h4 class
defp heading_class(_), do: "blog-article__h4"

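The key insight: replace the Heading node with an HtmlBlock containing your custom HTML. MDEx emits HtmlBlock nodes verbatim only when unsafe rendering is enabled (otherwise raw HTML is omitted from the output), which is why unsafe_: true is passed to to_html. Because the heading is rebuilt from extracted text, any inline formatting inside it, such as emphasis or links, is flattened to plain text.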

Extracting Text from Nodes

Heading nodes contain nested children - text, code, emphasis, etc. Extract the plain text recursively:

defp extract_text(nodes) when is_list(nodes) do
  Enum.map_join(nodes, "", &extract_text/1)
end

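# Text nodes carry their content in `literal`
defp extract_text(%MDEx.Text{literal: text}), do: text
# Inline code keeps its backticks, so they also appear in the rendered heading
defp extract_text(%MDEx.Code{literal: text}), do: "`#{text}`"
# Container nodes (emphasis, strong, links) recurse into their children
defp extract_text(%{nodes: nodes}), do: extract_text(nodes)
# Anything else contributes nothing
defp extract_text(_), do: ""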
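
To see what these clauses produce, here is a small sketch that parses a heading and flattens its children. The module name HeadingText is hypothetical, and the clauses are made public only so they can be called directly, for example from IEx. It assumes MDEx is available and that the parsed document exposes its top-level nodes under %MDEx.Document{nodes: ...}.

```elixir
defmodule HeadingText do
  # Same clauses as above, public so they can be called outside the module.
  def extract_text(nodes) when is_list(nodes) do
    Enum.map_join(nodes, "", &extract_text/1)
  end

  def extract_text(%MDEx.Text{literal: text}), do: text
  def extract_text(%MDEx.Code{literal: text}), do: "`#{text}`"
  def extract_text(%{nodes: nodes}), do: extract_text(nodes)
  def extract_text(_), do: ""
end

# A heading with plain text, inline code, and emphasis
%MDEx.Document{nodes: [%MDEx.Heading{nodes: nodes}]} =
  MDEx.parse_document!("## Install `mdex` *today*")

text = HeadingText.extract_text(nodes)
# => "Install `mdex` today"
```

The emphasis markers disappear because the emphasis node is only a container, while the backticks survive because the Code clause adds them back.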

Skipping Elements

What if your page already shows an H1 title in the header? Skip the markdown’s H1:

MDEx.traverse_and_update(doc, fn
  %MDEx.Heading{level: 1} ->
    %MDEx.HtmlBlock{literal: ""}  # Empty string = skip

  node -> node
end)

Return an empty HtmlBlock to effectively remove the node from output.
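
If dropping the title entirely is too aggressive, a variation on the same pattern is to demote it instead. The sketch below returns an updated copy of the heading struct, so its child nodes and inline formatting are kept. The choice of level 2 is only an illustration, and MDEx is assumed to be available.

```elixir
doc = MDEx.parse_document!("# Title\n\nBody text")

doc =
  MDEx.traverse_and_update(doc, fn
    # Struct update syntax keeps every other field, including child nodes
    %MDEx.Heading{level: 1} = heading -> %{heading | level: 2}
    node -> node
  end)

{:ok, html} = MDEx.to_html(doc)
```

Here html should contain <h2>Title</h2> rather than an <h1>. No unsafe_ option is needed because no raw HTML node is involved.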

Why Not Post-Process HTML?

You could regex the output:

html
|> String.replace(~r/<h2>(.+?)<\/h2>/, "<h2 class=\"fancy\">\\1</h2>")

Problems:

Fragile - breaks on nested tags or attributes
Can’t access structured data (heading level, text content)
Gets ugly fast with multiple transformations
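
The fragility is easy to reproduce with the standard library alone. In this sketch the helper name add_class and the sample HTML strings are made up for illustration:

```elixir
add_class = fn html ->
  String.replace(html, ~r/<h2>(.+?)<\/h2>/, "<h2 class=\"fancy\">\\1</h2>")
end

# The simple case works
plain = add_class.("<h2>Plain</h2>")
# => "<h2 class=\"fancy\">Plain</h2>"

# An attribute on the opening tag means the pattern no longer matches,
# so the input comes back unchanged
with_attr = add_class.(~s(<h2 id="intro">Intro</h2>))
# => "<h2 id=\"intro\">Intro</h2>"

# `.` does not match newlines by default, so a heading that spans
# two lines is silently skipped as well
multiline = add_class.("<h2>Line one\nline two</h2>")
# => "<h2>Line one\nline two</h2>"
```

Neither failure raises an error. The heading simply keeps its old markup, which makes this kind of bug easy to miss.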

AST transformation is explicit and composable, and it works on structs you can pattern match instead of on strings.

Common Node Types

MDEx exposes these structs (among others):

Struct	Purpose
`MDEx.Heading`	Headings with `level` and `nodes`
`MDEx.Paragraph`	Paragraph containers
`MDEx.Text`	Plain text with `literal`
`MDEx.Code`	Inline code with `literal`
`MDEx.CodeBlock`	Fenced code blocks
`MDEx.HtmlBlock`	Raw HTML passthrough
`MDEx.Link`	Links with `url` and `title`

Pattern match on what you need, pass through the rest.
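
As a sketch of that idea applied to another struct from the table, the function below rewrites the url field of matching MDEx.Link nodes and passes everything else through. The name upgrade_links and the http-to-https rule are illustrative assumptions, not something MDEx provides.

```elixir
# Hypothetical transform: upgrade plain http links to https
upgrade_links = fn
  %MDEx.Link{url: "http://" <> rest} = link -> %{link | url: "https://" <> rest}
  node -> node
end

doc =
  "[docs](http://example.com/docs)"
  |> MDEx.parse_document!()
  |> MDEx.traverse_and_update(upgrade_links)

{:ok, html} = MDEx.to_html(doc)
```

Because the clause returns a Link struct rather than raw HTML, the link text and any nested formatting are rendered by MDEx as usual.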

Combining with Extensions

Some MDEx extensions, such as alerts: true, add their own node types. They work alongside your transformations:

doc = MDEx.parse_document!(markdown,
  extension: [
    alerts: true,      # GitHub-style callouts
    strikethrough: true,
    table: true,
    autolink: true
  ],
  parse: [smart: true]  # Smart punctuation (curly quotes, dashes, ellipses)
)

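Your transformation function only handles the nodes you explicitly match. As long as it ends with a catch-all clause (node -> node), everything else passes through unchanged; without one, the first unmatched node raises a FunctionClauseError.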