HAST is the core data structure used throughout this package for syntax highlighting and markdown processing. Understanding HAST helps you work effectively with the docs infra.
HAST (Hypertext Abstract Syntax Tree) is a specification for representing HTML as an abstract syntax tree. In this package, HAST serves as an intermediate format that bridges the gap between raw source code and rendered React components.
Key Characteristics:
Using HAST as an intermediate format provides several critical benefits for documentation infrastructure:
Syntax highlighting is computed once during compilation, not on every page load or request. This means:
No parsing or highlighting libraries are needed in the client bundle or server runtime:
Note
While server-side highlighting (e.g., with React Server Components) can offload work from the client, it still requires processing on every request. Build-time HAST generation eliminates this overhead entirely by computing highlighting once and caching the results.
HAST can be converted to any framework's component format:
Unlike plain HTML strings, HAST can be converted to actual React components with custom handlers via hastToJsx():
This is not possible with plain HTML strings, which can only be rendered via dangerouslySetInnerHTML without component-level control.
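To make the idea of per-tag handlers concrete, here is a minimal, framework-agnostic sketch. It is not the package's hastToJsx implementation: `VNode`, `Handler`, and `convert` are hypothetical names invented for this illustration, and a plain object stands in for a React element.

```typescript
// Node shapes mirroring the interfaces described in this document.
type HastText = { type: 'text'; value: string };
type HastElement = {
  type: 'element';
  tagName: string;
  properties?: Record<string, unknown>;
  children?: HastNode[];
};
type HastNode = HastElement | HastText;
type HastRoot = { type: 'root'; children: HastNode[] };

// Hypothetical framework element; with React this would be a ReactNode.
type VNode = { tag: string; props: Record<string, unknown>; children: Array<VNode | string> };
type Handler = (props: Record<string, unknown>, children: Array<VNode | string>) => VNode;

// Convert a HAST tree, letting a handler take over any tag it is registered for.
function convert(node: HastRoot | HastNode, handlers: Record<string, Handler>): Array<VNode | string> {
  if (node.type === 'text') {
    return [node.value];
  }
  const children: Array<VNode | string> = [];
  for (const child of node.children ?? []) {
    children.push(...convert(child, handlers));
  }
  if (node.type === 'root') {
    return children;
  }
  const props = { ...(node.properties ?? {}) };
  const hasHandler = Object.prototype.hasOwnProperty.call(handlers, node.tagName);
  return [hasHandler ? handlers[node.tagName](props, children) : { tag: node.tagName, props, children }];
}

const linkTree: HastRoot = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'a',
      properties: { href: '/docs' },
      children: [{ type: 'text', value: 'Docs' }],
    },
  ],
};

// Replace every <a> with a hypothetical Link component that also sets `prefetch`.
const converted = convert(linkTree, {
  a: (props, children) => ({ tag: 'Link', props: { ...props, prefetch: true }, children }),
});
```

Because the handler receives structured properties and already-converted children, it can swap the element, add props, or wrap the output, none of which is available when the content is an opaque HTML string.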
HAST is part of the unified ecosystem, which provides a vast collection of plugins for transforming content:
This plugin architecture allows you to build sophisticated content processing pipelines without writing transform logic from scratch. Since HAST is already a structured tree, you can traverse and modify it directly without the overhead of parsing HTML strings or the fragility of regex replacements.
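As a rough illustration of direct tree modification, the sketch below follows the general shape of a unified plugin: a factory that takes options and returns a transformer that mutates the tree. The `addClass` plugin and its options are hypothetical, and the node types are simplified local definitions rather than the official `hast` typings.

```typescript
type HastText = { type: 'text'; value: string };
type HastElement = {
  type: 'element';
  tagName: string;
  properties?: Record<string, unknown>;
  children?: HastNode[];
};
type HastNode = HastElement | HastText;
type HastRoot = { type: 'root'; children: HastNode[] };

// Plugin factory in the unified style: options in, tree transformer out.
function addClass(options: { tagName: string; className: string }) {
  return function transformer(tree: HastRoot): void {
    const visit = (nodes: HastNode[]): void => {
      for (const node of nodes) {
        if (node.type !== 'element') continue;
        if (node.tagName === options.tagName) {
          const existing = node.properties?.className;
          const classes: unknown[] = Array.isArray(existing) ? existing.slice() : [];
          if (classes.indexOf(options.className) === -1) classes.push(options.className);
          node.properties = { ...(node.properties ?? {}), className: classes };
        }
        visit(node.children ?? []);
      }
    };
    visit(tree.children);
  };
}

const preTree: HastRoot = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'pre',
      properties: { className: ['code'] },
      children: [{ type: 'element', tagName: 'code', children: [{ type: 'text', value: 'x' }] }],
    },
  ],
};

// Mutates the tree in place: the <pre> element now carries both classes.
addClass({ tagName: 'pre', className: 'docs-pre' })(preTree);
```

No HTML parsing or string matching is involved; the transformer only reads and writes plain object fields.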
Webpack can cache HAST as JSON between builds:
The typical flow of data through the HAST pipeline:
Source Code
↓
Parse & Format (Build Time)
↓
HAST (Syntax-highlighted AST)
↓
Serialize to JSON
↓
Store in Bundle
↓
Load at Runtime
↓
Convert to React.ReactNode
↓
Render Components
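The serialize and load steps of this flow can be sketched with nothing more than `JSON.stringify` and `JSON.parse`. This is an illustration under simplifying assumptions: the highlighted tree is hard-coded instead of being produced by a highlighter, and `loadHast` and `textContent` are hypothetical helpers, not functions exported by this package.

```typescript
type HastText = { type: 'text'; value: string };
type HastElement = {
  type: 'element';
  tagName: string;
  properties?: Record<string, unknown>;
  children?: HastNode[];
};
type HastNode = HastElement | HastText;
type HastRoot = { type: 'root'; children: HastNode[] };

// Walk the tree and concatenate text nodes to recover the original source text.
function textContent(node: HastRoot | HastNode): string {
  if (node.type === 'text') return node.value;
  let text = '';
  for (const child of node.children ?? []) text += textContent(child);
  return text;
}

// Hypothetical loader: accept either a live tree or its JSON string form.
function loadHast(input: string | HastRoot): HastRoot {
  return typeof input === 'string' ? (JSON.parse(input) as HastRoot) : input;
}

// Build time: highlighting happens once (result hard-coded here), then the tree is serialized.
const highlighted: HastRoot = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'span',
      properties: { className: ['token', 'keyword'] },
      children: [{ type: 'text', value: 'const' }],
    },
    { type: 'text', value: ' x' },
  ],
};
const serialized = JSON.stringify(highlighted);

// Runtime: parse the stored JSON; no highlighter is involved at this point.
const loaded = loadHast(serialized);
console.log(textContent(loaded)); // logs: const x
```

Since a HAST tree contains only plain objects, arrays, and strings, the JSON round trip is lossless, which is what makes it cacheable between builds.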
A HAST tree consists of nodes with the following structure:
interface HastRoot {
  type: 'root';
  children: HastContent[];
}

interface HastElement {
  type: 'element';
  tagName: string;
  properties?: Record<string, any>;
  children?: HastContent[];
}

interface HastText {
  type: 'text';
  value: string;
}

type HastContent = HastElement | HastText;
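Because every node carries a literal `type` field, TypeScript can narrow a node to the right interface inside a `switch`. The following sketch repeats the interfaces above so that it is self-contained; `describeNode` is a hypothetical helper used only to show the narrowing.

```typescript
interface HastRoot {
  type: 'root';
  children: HastContent[];
}
interface HastElement {
  type: 'element';
  tagName: string;
  properties?: Record<string, unknown>;
  children?: HastContent[];
}
interface HastText {
  type: 'text';
  value: string;
}
type HastContent = HastElement | HastText;

// The `type` discriminant selects the interface in each case branch.
function describeNode(node: HastRoot | HastContent): string {
  switch (node.type) {
    case 'root':
      return `root with ${node.children.length} children`;
    case 'element':
      return `<${node.tagName}> with ${(node.children ?? []).length} children`;
    case 'text':
      return `text ${JSON.stringify(node.value)}`;
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a node kind is unhandled.
      const unreachable: never = node;
      return unreachable;
    }
  }
}

console.log(describeNode({ type: 'text', value: 'const' })); // logs: text "const"
console.log(describeNode({ type: 'element', tagName: 'span' })); // logs: <span> with 0 children
```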
Here's what a simple HAST structure looks like for highlighted code:
{
  "type": "root",
  "children": [
    {
      "type": "element",
      "tagName": "code",
      "properties": { "className": ["language-ts"] },
      "children": [
        {
          "type": "element",
          "tagName": "span",
          "properties": { "className": ["token", "keyword"] },
          "children": [{ "type": "text", "value": "const" }]
        },
        { "type": "text", "value": " " },
        {
          "type": "element",
          "tagName": "span",
          "properties": { "className": ["token", "variable"] },
          "children": [{ "type": "text", "value": "x" }]
        }
      ]
    }
  ]
}
This represents: const x with syntax highlighting classes applied.
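To see how that tree maps back to markup, the sketch below serializes it to an HTML string. It is a deliberately small illustration, not a general serializer: the hypothetical `toHtml` helper only emits the `className` property and ignores everything else.

```typescript
type HastText = { type: 'text'; value: string };
type HastElement = {
  type: 'element';
  tagName: string;
  properties?: Record<string, unknown>;
  children?: HastNode[];
};
type HastNode = HastElement | HastText;
type HastRoot = { type: 'root'; children: HastNode[] };

function escapeHtml(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Simplified serializer: only `className` is written as an attribute.
function toHtml(node: HastRoot | HastNode): string {
  if (node.type === 'text') return escapeHtml(node.value);
  let inner = '';
  for (const child of node.children ?? []) inner += toHtml(child);
  if (node.type === 'root') return inner;
  const classes = node.properties?.className;
  const classAttr =
    Array.isArray(classes) && classes.length > 0 ? ` class="${escapeHtml(classes.join(' '))}"` : '';
  return `<${node.tagName}${classAttr}>${inner}</${node.tagName}>`;
}

// The example tree from above.
const exampleTree: HastRoot = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'code',
      properties: { className: ['language-ts'] },
      children: [
        {
          type: 'element',
          tagName: 'span',
          properties: { className: ['token', 'keyword'] },
          children: [{ type: 'text', value: 'const' }],
        },
        { type: 'text', value: ' ' },
        {
          type: 'element',
          tagName: 'span',
          properties: { className: ['token', 'variable'] },
          children: [{ type: 'text', value: 'x' }],
        },
      ],
    },
  ],
};

const html = toHtml(exampleTree);
// html is:
// <code class="language-ts"><span class="token keyword">const</span> <span class="token variable">x</span></code>
```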
When working with HAST from untrusted sources (e.g., JSON from external APIs or user-generated content), sanitize the HAST tree before converting it to React components:
import { hastToJsx } from '@mui/internal-docs-infra/pipeline/hastUtils';
import { sanitize } from 'hast-util-sanitize';
// Untrusted HAST from external source
const untrustedHast = await fetch('/api/user-content').then((r) => r.json());
// Sanitize before rendering
const sanitizedHast = sanitize(untrustedHast);
const safeContent = hastToJsx(sanitizedHast);
The hast-util-sanitize package removes potentially dangerous elements and attributes, preventing XSS attacks.
Note
For HAST generated at build time by this package (from loadPrecomputedCodeHighlighter or loadPrecomputedTypesMeta), sanitization is not necessary since the content comes from your own source files.
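For intuition about what allowlist-based sanitization does to a tree, here is a toy sketch. It is not how hast-util-sanitize is implemented and is not a substitute for it: the allowlists, `isSafeHref`, and `sanitizeSketch` are all hypothetical and far less thorough than a real sanitizer.

```typescript
type HastText = { type: 'text'; value: string };
type HastElement = {
  type: 'element';
  tagName: string;
  properties?: Record<string, unknown>;
  children?: HastNode[];
};
type HastNode = HastElement | HastText;

// Hypothetical allowlists for this illustration only.
const ALLOWED_TAGS = ['pre', 'code', 'span', 'a'];
const ALLOWED_PROPS = ['className', 'href'];

// Accept relative paths, fragments, and http(s) URLs; this rejects e.g. "javascript:" URLs.
function isSafeHref(value: unknown): boolean {
  return typeof value === 'string' && /^(https?:\/\/|\/|#)/i.test(value);
}

// Returns a new tree containing only allowlisted elements and properties.
function sanitizeSketch(nodes: HastNode[]): HastNode[] {
  const result: HastNode[] = [];
  for (const node of nodes) {
    if (node.type === 'text') {
      result.push({ type: 'text', value: node.value });
      continue;
    }
    // Unknown tags are dropped together with their subtree.
    if (ALLOWED_TAGS.indexOf(node.tagName) === -1) continue;
    const source = node.properties ?? {};
    const properties: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      if (ALLOWED_PROPS.indexOf(key) === -1) continue;
      if (key === 'href' && !isSafeHref(source[key])) continue;
      properties[key] = source[key];
    }
    result.push({
      type: 'element',
      tagName: node.tagName,
      properties,
      children: sanitizeSketch(node.children ?? []),
    });
  }
  return result;
}

const untrustedNodes: HastNode[] = [
  { type: 'element', tagName: 'script', children: [{ type: 'text', value: 'alert(1)' }] },
  {
    type: 'element',
    tagName: 'a',
    properties: { href: 'javascript:alert(1)', onClick: 'alert(1)', className: ['link'] },
    children: [{ type: 'text', value: 'click' }],
  },
];

// The <script> element is removed; the link keeps only its className.
const cleaned = sanitizeSketch(untrustedNodes);
```

The key design point is that anything not explicitly allowed is dropped, rather than trying to enumerate dangerous constructs. In real code, rely on hast-util-sanitize and its schema instead of a hand-rolled filter.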
parseSource: Converts source code to HAST with syntax highlighting using Starry Night
hastToJsx: Converts HAST nodes to React elements
hastOrJsonToJsx: Handles both HAST and serialized JSON formats

HAST transformers follow a naming convention based on which AST they operate on:

Rehype plugins (operate on HAST/HTML AST):
transformHtmlCode: Processes <pre><code> blocks in HAST and precomputes syntax highlighting data
transformHtmlCodeInlineHighlighted: Applies inline syntax highlighting to HAST
transformHtml* functions work with HAST nodes

Remark plugins (operate on MDAST/Markdown AST):
transformMarkdownCode: Groups markdown code fences with variants and converts to semantic HTML/HAST
transformMarkdown* functions work with MDAST, often producing HAST as output

React hooks (consume HAST at runtime):
useTypes: Converts type metadata (HAST) to React nodes

HAST is used extensively throughout the docs-infra package:
The CodeHighlighter component works with HAST:
The loadPrecomputedTypesMeta loader:
Using HAST provides measurable performance improvements over both client-side and server-side highlighting:
For a documentation site with 100 code examples:
Client-side highlighting:
Server-side highlighting (per request):
Build-time HAST (this approach):
HAST is part of the unified ecosystem:
Build-time HAST is ideal for:
Note
Build-time HAST can still be enhanced and transformed at server or client render-time without reparsing. Since it's already a structured tree, you can traverse and modify it efficiently for dynamic customization while keeping the expensive syntax highlighting precomputed.
Server-side rendering (RSC) works well for:
Client-side rendering is necessary for:
Consider hybrid approaches:
Tip
The choice isn't binary - you can use different approaches for different types of content in the same application. For example, use build-time HAST for documentation pages while using server-side rendering for user-generated content sections.
You can still customize precomputed HAST at render-time on the server or client without reparsing HTML or using regex.
import { hastToJsx } from '@mui/internal-docs-infra/pipeline/hastUtils';

// CodeBlock is a placeholder for your own code-block component.
const components = {
  code: (props: any) => <CodeBlock {...props} />, // enhance code blocks
  a: (props: any) => <a {...props} rel={props.rel ?? 'noopener noreferrer'} />, // enforce safe links
};

export function RenderHast({ hast }: { hast: any }) {
  return <>{hastToJsx(hast, components)}</>;
}

import type { Root } from 'hast';
import { sanitize } from 'hast-util-sanitize';

// Adds rel="noopener noreferrer" to links that do not set rel (mutates the tree in place).
function addNoopener(hast: Root): Root {
  const stack: any[] = [hast];
  while (stack.length) {
    const node: any = stack.pop();
    if (node.type === 'element' && node.tagName === 'a') {
      node.properties = {
        ...(node.properties || {}),
        rel: node.properties?.rel ?? 'noopener noreferrer',
      };
    }
    if (node.children) stack.push(...node.children);
  }
  return hast;
}

// Server or client render-time; `hast` and `components` are defined as in the previous example
const sanitized = sanitize(hast) as Root; // if the source is untrusted
const enhanced = addNoopener(sanitized);
const jsx = hastToJsx(enhanced, components);
Do:
Don't: