A remark plugin that extracts metadata from MDX files and optionally updates parent directory index pages. This plugin automatically collects page titles, descriptions, keywords, and hierarchical section structures to create searchable, navigable documentation indexes.
Use this plugin to extract comprehensive metadata from your MDX documentation pages. It parses both ES module metadata exports (export const metadata = {...}) and document content (headings, descriptions) to build structured metadata that powers navigation, search, and content discovery.
The plugin automatically merges extracted metadata into your page's export const metadata object and, when extractToIndex is enabled, updates the parent directory's page.mdx with an auto-generated index.
Note: The plugin modifies the AST at build time to add or update the export const metadata object in your MDX files. User-defined metadata fields are never overwritten; the plugin only fills in missing values.
import { unified } from 'unified';
import remarkParse from 'remark-parse';
import transformMarkdownMetadata from '@mui/internal-docs-infra/pipeline/transformMarkdownMetadata';
const processor = unified()
  .use(remarkParse)
  .use(transformMarkdownMetadata, { extractToIndex: true });
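The fill-in-only behavior described in the note above can be pictured as a merge in which user-defined values always win. This is only a sketch: `mergeMetadata` and the `PageMetadata` shape are hypothetical names, not part of the plugin's API, and the plugin does the equivalent work on the MDX AST rather than on plain objects.

```typescript
// Hypothetical shape for illustration; the real metadata object may carry more fields.
interface PageMetadata {
  title?: string;
  description?: string;
  keywords?: string[];
}

// Illustrative merge: an extracted value is used only when the user left the field undefined.
function mergeMetadata(userDefined: PageMetadata, extracted: PageMetadata): PageMetadata {
  return {
    title: userDefined.title ?? extracted.title,
    description: userDefined.description ?? extracted.description,
    keywords: userDefined.keywords ?? extracted.keywords,
  };
}
```

Under these assumptions, a page that exports only `keywords` would pick up its title and description from the H1 and first paragraph, while an explicitly exported title would be kept as written.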
When using the plugin directly (without withDocsInfra), you'll need to provide the baseDir for path filtering to work correctly:
// next.config.js (ES module syntax)
import createMDX from '@next/mdx';
import transformMarkdownMetadata from '@mui/internal-docs-infra/pipeline/transformMarkdownMetadata';
import { fileURLToPath } from 'node:url';
import { dirname } from 'node:path';
// Directory containing this config file, used as the base for path filtering
const currentDirname = dirname(fileURLToPath(import.meta.url));
const withMDX = createMDX({
  options: {
    remarkPlugins: [
      [
        transformMarkdownMetadata,
        {
          extractToIndex: {
            include: ['app'],
            exclude: [],
            baseDir: currentDirname, // Required for path filtering with absolute paths
          },
        },
      ],
    ],
  },
});
export default withMDX({
  // your Next.js config
});
Note: The baseDir is needed because Next.js provides absolute file paths to remark plugins. The plugin strips this prefix to match against your include/exclude patterns. Index files (e.g., app/page.mdx, app/components/page.mdx) are automatically excluded.
The withDocsInfra Next.js plugin automatically includes this with the correct configuration:
// next.config.js
import createMDX from '@next/mdx';
import { withDocsInfra, getDocsInfraMdxOptions } from '@mui/internal-docs-infra/withDocsInfra';
// Create MDX with docs-infra configuration
const withMDX = createMDX({
options: getDocsInfraMdxOptions({
// Automatically includes extractToIndex with default filter
// { include: ['app', 'src/app'], exclude: [], baseDir: process.cwd() }
// Index files are automatically excluded
}),
});
const nextConfig = {
// Your custom configuration
};
export default withDocsInfra()(withMDX(nextConfig));
// Alternative: disable index generation (use in place of the createMDX call above)
const withMDX = createMDX({
options: getDocsInfraMdxOptions({
extractToIndex: false,
}),
});
export default withDocsInfra()(withMDX(nextConfig));
// Alternative: customize path filters (use in place of the createMDX call above)
const withMDX = createMDX({
options: getDocsInfraMdxOptions({
extractToIndex: {
include: ['app/docs', 'app/api'],
exclude: ['app/docs/internal'],
},
}),
});
export default withDocsInfra()(withMDX(nextConfig));
titleSuffix
Type: string
Default: undefined
A suffix to append to the title in the exported metadata object. This is useful for adding a site-wide title suffix such as " | My Site" to page metadata for SEO.
The suffix is applied only to the title in export const metadata and does not affect other outputs.
// Adds " | Base UI" to all page titles in the metadata export
.use(transformMarkdownMetadata, { titleSuffix: ' | Base UI' })
// Input: # Button Component
// Result: export const metadata = { title: "Button Component | Base UI", ... }
extractToIndex
Type: boolean | { include: string[], exclude: string[], baseDir?: string }
Default: false
Controls automatic extraction of page metadata to parent directory index files.
When enabled, the plugin extracts metadata (title, description, headings) from MDX files and maintains an index in the parent directory's page.mdx file.
Index files themselves (e.g., pattern/page.mdx where pattern is in the include list) are automatically excluded from extraction.
Options:
- false: Disabled (no index updates)
- true: Enabled with default filter: { include: ['app', 'src/app'], exclude: [], baseDir: process.cwd() }
- { include: string[], exclude: string[], baseDir?: string }: Enabled with custom path filters
Path matching uses prefix matching: a file matches if its path starts with any include pattern and doesn't start with any exclude pattern. Files matching pattern/page.mdx (where pattern is in the include list) are automatically skipped as they are index files themselves.
Important: Patterns should not include trailing slashes. The plugin automatically appends / during matching. Use 'app' not 'app/', and 'src/app' not 'src/app/'.
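The matching rules above (strip baseDir, prefix-match with an appended slash, skip index files) can be sketched as follows. This is not the plugin's implementation: `shouldIndex` and `PathFilter` are hypothetical names, and the route-group handling assumes the behavior described for the default filter.

```typescript
// Hypothetical filter shape mirroring the documented option fields.
interface PathFilter {
  include: string[];
  exclude: string[];
  baseDir?: string;
}

// Illustrative check: would this file be extracted into a parent index?
function shouldIndex(filePath: string, filter: PathFilter): boolean {
  let relative = filePath;
  // Strip the base directory from absolute paths before matching.
  if (filter.baseDir && relative.startsWith(filter.baseDir)) {
    relative = relative.slice(filter.baseDir.length);
  }
  // Drop leading slashes and route groups such as (shared).
  relative = relative
    .replace(/^\/+/, '')
    .split('/')
    .filter((segment) => !/^\(.+\)$/.test(segment))
    .join('/');

  // Appending "/" is why patterns must not carry a trailing slash:
  // 'app/docs' should match 'app/docs/x' but not 'app/docs-old/x'.
  const hasPrefix = (pattern: string) => relative.startsWith(`${pattern}/`);
  const isIndexFile = filter.include.some((pattern) => relative === `${pattern}/page.mdx`);

  return filter.include.some(hasPrefix) && !filter.exclude.some(hasPrefix) && !isIndexFile;
}
```

With `include: ['app/docs']`, a file under `app/docs-old/` is not matched, which illustrates why the slash is appended during matching rather than relying on a bare string prefix.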
Fields:
- include (string[]): Path prefixes that files must start with to be indexed (without trailing slashes)
- exclude (string[]): Path prefixes to exclude from indexing (without trailing slashes)
- baseDir (string, optional): Base directory to strip from absolute file paths before matching. When using getDocsInfraMdxOptions(), this defaults to process.cwd(). When calling the plugin directly, you should provide this for accurate path filtering.
- useVisibleDescription (boolean, optional): When true, uses the first visible paragraph after the H1 as the description in the extracted index, even when a <meta> tag provides the SEO description. This is useful when you want different descriptions for SEO (meta tag) vs. the index page (visible content). Default: false.
- indexWrapperComponent (string, optional): Name of a React component to wrap around the autogenerated index content. When provided, the generated markdown will wrap the page list and detail sections in this component (e.g., <PagesIndex>...</PagesIndex>). This is useful for applying consistent styling or behavior to index pages.
// Extract but don't update index
.use(transformMarkdownMetadata)
// Extract and update parent index with default filter
.use(transformMarkdownMetadata, { extractToIndex: true })
// Custom path filter (when using directly, provide baseDir for accurate matching)
.use(transformMarkdownMetadata, {
extractToIndex: {
include: ['app/docs', 'app/components'],
exclude: ['app/docs/internal'],
baseDir: '/path/to/your/project' // e.g., dirname(fileURLToPath(import.meta.url))
}
})
// Use visible paragraph for index, meta tag for SEO
.use(transformMarkdownMetadata, {
extractToIndex: {
include: ['app/components'],
useVisibleDescription: true
}
})
// Wrap index content in a custom component
.use(transformMarkdownMetadata, {
extractToIndex: {
include: ['app/components'],
indexWrapperComponent: 'PagesIndex'
}
})
Default Filter Rationale:
The default { include: ['app', 'src/app'], exclude: [] } is designed for Next.js App Router projects:
- app and src/app: Processes pages in both common Next.js directory structures
- Index files such as app/page.mdx and app/components/page.mdx are automatically skipped to prevent them from creating parent indexes
- Route groups such as (shared) are automatically removed when matching, so app/(shared)/page.mdx is treated as app/page.mdx
This ensures index pages are created at every level without unwanted parent indexes or interference with your site structure.
When indexWrapperComponent is configured, the plugin automatically injects SitemapSectionData as a data prop into the wrapper component when processing autogenerated index files. This enables wrapper components to receive structured data for dynamic rendering, search, or navigation features.
The injected data structure:
interface SitemapSectionData {
title: string; // From the H1 heading
prefix: string; // URL prefix derived from file path
pages: SitemapPage[]; // Array of page metadata
}
interface SitemapPage {
title?: string;
slug: string;
path: string;
description?: string;
keywords?: string[];
sections?: Record<string, SitemapSection>;
parts?: Record<string, SitemapPart>;
exports?: Record<string, SitemapExport>;
tags?: string[];
skipDetailSection?: boolean;
image?: {
url: string;
alt?: string;
};
}
Prefix Computation:
The prefix field is derived from the file path using the baseDir from extractToIndex:
- Strips baseDir if provided
- Removes the leading src and app directories
- Removes route groups (e.g., (public))
For example, /project/docs/app/(public)/components/page.mdx with baseDir: '/project/docs' becomes prefix /components/.
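The prefix derivation can be sketched as a small path transformation. `computePrefix` is a hypothetical helper written to reproduce the example above; the plugin's own logic may differ in edge cases, such as how it treats a path outside baseDir.

```typescript
// Illustrative only: derive a URL prefix from an index file path.
function computePrefix(filePath: string, baseDir?: string): string {
  let relative = filePath;
  // Step 1: strip baseDir if provided.
  if (baseDir && relative.startsWith(baseDir)) {
    relative = relative.slice(baseDir.length);
  }
  const segments = relative.split('/').filter(Boolean);
  // Drop the file name (e.g. page.mdx); only directories contribute to the prefix.
  segments.pop();
  // Step 2: remove the leading src and app directories.
  if (segments[0] === 'src') segments.shift();
  if (segments[0] === 'app') segments.shift();
  // Step 3: remove route groups such as (public).
  const kept = segments.filter((segment) => !/^\(.+\)$/.test(segment));
  return kept.length > 0 ? `/${kept.join('/')}/` : '/';
}

// Reproduces the documented example.
const examplePrefix = computePrefix(
  '/project/docs/app/(public)/components/page.mdx',
  '/project/docs',
);
```

In this sketch an index file directly under app resolves to the root prefix `/`, which is an assumption rather than documented behavior.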
Use Case:
// PagesIndex.tsx
import * as React from 'react';
import type { SitemapSectionData } from '@mui/internal-docs-infra/createSitemap/types';
// SearchableList stands in for your own component; it is not provided by the plugin.
export function PagesIndex({
  children,
  data,
}: {
  children: React.ReactNode;
  data?: SitemapSectionData;
}) {
  // Use data for navigation, search, or custom rendering
  return (
    <div className="pages-index">
      {data && <SearchableList pages={data.pages} />}
      {children}
    </div>
  );
}
The simplest usage—write natural markdown and let the plugin extract metadata automatically:
Input MDX:
# Button Component
A versatile button component with multiple variants and sizes.
## Installation
Install the package using your preferred package manager.
## Usage
Import and use the button in your React components.
Extracted Metadata:
{
"title": "Button Component",
"description": "A versatile button component with multiple variants and sizes.",
"sections": {
"installation": {
"title": "Installation",
"titleMarkdown": [{ "type": "text", "value": "Installation" }],
"children": {}
},
"usage": {
"title": "Usage",
"titleMarkdown": [{ "type": "text", "value": "Usage" }],
"children": {}
}
}
}
This is the recommended pattern—clean, readable markdown with automatic metadata extraction.
Write natural markdown and let the plugin extract metadata automatically:
# Button Component
A versatile button component with multiple variants and sizes.
## Installation
Install the package using your preferred package manager.
## Usage
Import and use the button in your React components.
export const metadata = {
keywords: ['button', 'interactive', 'form'],
};
Benefits:
When you need SEO-specific metadata or keywords, use export const metadata at the end of the file:
# CodeHighlighter
The CodeHighlighter component provides syntax highlighting.
<!-- Page content continues... -->
export const metadata = {
description: 'Override the first paragraph for SEO purposes',
keywords: ['syntax', 'highlighting', 'code', 'react', 'component'],
};
Why export metadata is preferred:
The plugin also supports <meta> or <Meta> tags anywhere in the document for migration scenarios, though inline tags lack type safety and clutter the markdown source.
Next.js automatically inherits metadata.title and metadata.description for Open Graph, so you only need to specify openGraph when adding images:
# CodeHighlighter
The CodeHighlighter component provides syntax highlighting.
<!-- Page content -->
export const metadata = {
keywords: ['syntax', 'highlighting', 'code', 'react'],
openGraph: {
images: [
{
url: '/og-code-highlighter.png',
width: 1200,
height: 630,
alt: 'Code Highlighter Preview',
},
],
},
};
Best Practice: Place metadata exports at the end of the file. The first H1 and paragraph are for human readers—they provide all the context needed when reading the markdown source. Metadata exports are for computers (search engines, social media, tooling) and should be unobtrusive.
Note: This uses MDX's ES module syntax (export const), not traditional YAML frontmatter.
Use extracted sections for automatic navigation:
// Example component using extracted metadata
import { metadata } from './page.mdx';
export function TableOfContents() {
return (
<nav>
{Object.entries(metadata.sections || {}).map(([slug, section]) => (
<a key={slug} href={`#${slug}`}>
{section.title}
</a>
))}
</nav>
);
}
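The component above renders only top-level sections. If nested headings should appear too, the section tree can be flattened first. The sketch below assumes the HeadingHierarchy shape documented for this plugin (without titleMarkdown, for brevity); `flattenSections` and `TocEntry` are hypothetical names.

```typescript
// Simplified copy of the documented section shape (titleMarkdown omitted).
type SectionTree = {
  [slug: string]: { title: string; children: SectionTree };
};

interface TocEntry {
  slug: string;
  title: string;
  depth: number; // 0 for H2-level sections, 1 for their children, and so on
}

// Depth-first walk that keeps document order.
function flattenSections(sections: SectionTree, depth = 0): TocEntry[] {
  return Object.entries(sections).flatMap(([slug, section]) => [
    { slug, title: section.title, depth },
    ...flattenSections(section.children, depth + 1),
  ]);
}
```

Each entry's depth can then drive indentation in the rendered navigation.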
- Use a page.mdx for each route
- Place export const metadata at the end of the file
When extractToIndex: true is enabled, the plugin automatically maintains index pages:
app/components/
├── page.mdx # Auto-generated index
├── button/
│ └── page.mdx # Button component docs
├── checkbox/
│ └── page.mdx # Checkbox component docs
└── input/
└── page.mdx # Input component docs
The parent page.mdx is automatically created/updated:
# Components
[//]: # 'This file is autogenerated, but the following list can be modified.'
- [Button](#button) - [Full Docs](./button/page.mdx)
- [Checkbox](#checkbox) [New] - [Full Docs](./checkbox/page.mdx)
[//]: # 'This file is autogenerated, DO NOT EDIT AFTER THIS LINE'
## Button
A versatile button component
- Keywords: button, click, action
- Sections:
- Installation
- Usage
- Basic Usage
- Advanced Usage
[Read more](./button/page.mdx)
## Checkbox
Toggle selection states
- Keywords: checkbox, selection, form
- Sections:
- Props
- Examples
[Read more](./checkbox/page.mdx)
## Input
Text input component.
- Keywords: input, form, text
- Sections:
- Variants
[Read more](./input/page.mdx)
[//]: # 'This file is autogenerated, but the following metadata can be modified.'
export const metadata = {
robots: {
index: false,
},
}
When indexWrapperComponent is configured, the autogenerated content is wrapped in the specified component:
# Components
[//]: # 'This file is autogenerated, but the following list can be modified.'
<PagesIndex>
- [Button](#button) - [Full Docs](./button/page.mdx)
- [Checkbox](#checkbox) [New] - [Full Docs](./checkbox/page.mdx)
[//]: # 'This file is autogenerated, DO NOT EDIT AFTER THIS LINE'
## Button
A versatile button component
[Read more](./button/page.mdx)
## Checkbox
Toggle selection states
[Read more](./checkbox/page.mdx)
</PagesIndex>
[//]: # 'This file is autogenerated, but the following metadata can be modified.'
export const metadata = {
robots: {
index: false,
},
}
This allows you to apply consistent styling or behavior (like custom navigation, search indexing, or layout) to the index content by defining a PagesIndex component in your MDX components.
- Add tags such as [New], [Hot], or [Beta] directly after component names
You can add status tags to index entries to highlight new, experimental, or noteworthy components. Tags appear directly after the component name (or link for external entries) for better visibility:
Regular entry format:
- [ComponentName](#slug) [Tag1] [Tag2] - [Full Docs](./path/page.mdx)
External/single-link entry format:
- [LinkTitle](./path) [Tag1] [Tag2]
Common tags:
- [New]: Recently added components (automatically added for new entries)
- [Hot]: Trending or popular components
- [Beta]: Experimental or unstable features
- [External]: External links or resources
Example:
# Handbook
[//]: # 'This file is autogenerated, but the following list can be modified.'
- [Forms](#forms) - [Full Docs](./forms/page.mdx)
- [TypeScript](#typescript) [New] - [Full Docs](./typescript/page.mdx)
- [llms.txt](/llms.txt) [External]
- [API Reference](/api) [External] [New]
[//]: # 'This file is autogenerated, DO NOT EDIT AFTER THIS LINE'
Tag behavior:
- New entries automatically receive the [New] tag
- Remove the [New] tag when the component is no longer new
- Tags use the format [TagName] with alphanumeric characters only
This feature helps users quickly identify new or notable components when browsing your documentation index.
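To make the entry format concrete, here is a minimal sketch of reading tags from an index list line. `parseEntryTags` is a hypothetical helper, not the plugin's parser; it assumes tags are alphanumeric and sit directly after the first link, as described above.

```typescript
// Illustrative parser: returns the tags that follow the first link of a list entry.
function parseEntryTags(line: string): string[] {
  // First markdown link, then a run of [Tag] tokens that are not themselves links.
  const match = /^- \[[^\]]+\]\([^)]*\)((?:\s*\[[A-Za-z0-9]+\](?!\())*)/.exec(line);
  if (!match) {
    return [];
  }
  return Array.from(match[1].matchAll(/\[([A-Za-z0-9]+)\]/g), (m) => m[1]);
}

const regularEntryTags = parseEntryTags('- [TypeScript](#typescript) [New] - [Full Docs](./typescript/page.mdx)');
const externalEntryTags = parseEntryTags('- [API Reference](/api) [External] [New]');
```

The `[Full Docs]` link is not picked up as a tag because it contains a space and is followed by a URL in parentheses.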
To have pages automatically sorted alphabetically in your index, replace the default editable marker with the alphabetical sorting marker:
# Components
[//]: # 'This file is autogenerated, but the following list can be modified. Automatically sorted alphabetically.'
- [Alpha](./alpha/page.mdx) - First component
- [Beta](./beta/page.mdx) - Second component
- [Zebra](./zebra/page.mdx) - Last component
[//]: # 'This file is autogenerated, DO NOT EDIT AFTER THIS LINE'
Sorting behavior:
- 'This file is autogenerated, but the following list can be modified.' preserves the order you define in the editable section
- 'This file is autogenerated, but the following list can be modified. Automatically sorted alphabetically.' sorts all pages alphabetically by title, ignoring the editable section order
- Sorting uses localeCompare() for natural alphabetical ordering
This is useful for index pages where alphabetical order makes more sense than manual ordering, such as component libraries or API references.
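The marker-driven ordering can be sketched as below. `orderEntries` and `IndexEntry` are hypothetical names; the only details taken from the text are the marker wording and the use of localeCompare() on titles.

```typescript
interface IndexEntry {
  title: string;
  path: string;
}

// Marker suffix that switches the index to alphabetical ordering.
const SORT_MARKER = 'Automatically sorted alphabetically.';

// Illustrative ordering: keep the manual order unless the marker comment asks for sorting.
function orderEntries(entries: IndexEntry[], markerComment: string): IndexEntry[] {
  if (!markerComment.includes(SORT_MARKER)) {
    return entries;
  }
  // Copy before sorting so the caller's array is left untouched.
  return [...entries].sort((a, b) => a.title.localeCompare(b.title));
}
```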
A key benefit of auto-generated index pages is improved navigation UX. When users remove segments from the URL path (a common power-user pattern), they land on a meaningful index page instead of a 404:
/components/checkbox/page.mdx → User removes "checkbox"
/components/page.mdx → Lands on components index (not 404)
This creates a natural hierarchy where every directory level has content. Index pages don't need to be linked from your home page or site navigation—they can even be marked with noindex for SEO if you prefer they don't appear in search results. They exist purely to provide a web-native browsing experience for users exploring your documentation structure.
Example metadata for an unlisted index:
# Components
<!-- Auto-generated content -->
export const metadata = {
robots: { index: false },
};
Let the plugin generate index pages automatically throughout your documentation:
app/docs/
├── page.mdx # Auto-generated
├── getting-started/
│ └── page.mdx
├── components/
│ ├── page.mdx # Auto-generated
│ ├── button/
│ │ └── page.mdx
│ └── input/
│ └── page.mdx
└── api/
├── page.mdx # Auto-generated
└── reference/
└── page.mdx
Works alongside transformMarkdownCode to enhance documentation:
// next.config.js (ES module syntax)
import createMDX from '@next/mdx';
import transformMarkdownMetadata from '@mui/internal-docs-infra/pipeline/transformMarkdownMetadata';
import transformMarkdownCode from '@mui/internal-docs-infra/pipeline/transformMarkdownCode';
const withMDX = createMDX({
  options: {
    remarkPlugins: [[transformMarkdownMetadata, { extractToIndex: true }], transformMarkdownCode],
  },
});
Typical plugin order for comprehensive docs processing:
const remarkPlugins = [
remarkGfm, // GitHub Flavored Markdown
[transformMarkdownMetadata, { extractToIndex: true }], // Extract metadata & build indexes
transformMarkdownCode, // Transform code blocks
transformMarkdownDemoLinks, // Handle demo links
transformMarkdownBlockquoteCallouts, // Style callouts
];
The plugin builds hierarchical section trees from your heading structure:
Input MDX:
# API Reference
Complete API documentation for the component.
## Props
Configure the component with these props.
### Required Props
Props that must be provided.
### Optional Props
Props with default values.
## Methods
Public methods available on the component.
Extracted Sections:
{
"props": {
"title": "Props",
"titleMarkdown": [{ "type": "text", "value": "Props" }],
"children": {
"required-props": {
"title": "Required Props",
"titleMarkdown": [{ "type": "text", "value": "Required Props" }],
"children": {}
},
"optional-props": {
"title": "Optional Props",
"titleMarkdown": [{ "type": "text", "value": "Optional Props" }],
"children": {}
}
}
},
"methods": {
"title": "Methods",
"titleMarkdown": [{ "type": "text", "value": "Methods" }],
"children": {}
}
}
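One way to picture how a flat heading list turns into the nested structure above is a stack of open ancestors. This is a sketch under assumptions, not the plugin's code: `buildSectionTree` and `slugify` are hypothetical, titleMarkdown is omitted, and the slug rule is a simplification chosen to reproduce the slugs shown in this document.

```typescript
interface Heading {
  depth: number; // 1 for #, 2 for ##, ...
  text: string;
}

type SectionTree = {
  [slug: string]: { title: string; children: SectionTree };
};

// Simplified slug rule (assumption): lowercase, drop punctuation, hyphenate spaces.
function slugify(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9\s-]/g, '').trim().replace(/\s+/g, '-');
}

function buildSectionTree(headings: Heading[]): SectionTree {
  const root: SectionTree = {};
  // Each stack frame is a heading that can still receive children.
  const stack: { depth: number; children: SectionTree }[] = [{ depth: 1, children: root }];
  for (const heading of headings) {
    if (heading.depth < 2) continue; // the H1 is the page title, not a section
    // Close every open heading that is at the same level or deeper.
    while (stack.length > 1 && stack[stack.length - 1].depth >= heading.depth) {
      stack.pop();
    }
    const node = { title: heading.text, children: {} as SectionTree };
    stack[stack.length - 1].children[slugify(heading.text)] = node;
    stack.push({ depth: heading.depth, children: node.children });
  }
  return root;
}
```

Fed the headings from the API Reference example, this produces props (with required-props and optional-props as children) followed by methods.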
The plugin preserves inline code, bold, and italic formatting in section titles:
Input MDX:
# Utilities
## `parseSource()`
Parse source code into AST nodes.
## **Performance** Optimization
Tips for improving performance.
## _Advanced_ Topics
Deep dive into advanced features.
Extracted Sections:
{
"parsesource": {
"title": "parseSource()",
"titleMarkdown": [{ "type": "inlineCode", "value": "parseSource()" }],
"children": {}
},
"performance-optimization": {
"title": "Performance Optimization",
"titleMarkdown": [
{ "type": "strong", "children": [{ "type": "text", "value": "Performance" }] },
{ "type": "text", "value": " Optimization" }
],
"children": {}
},
"advanced-topics": {
"title": "Advanced Topics",
"titleMarkdown": [
{ "type": "emphasis", "children": [{ "type": "text", "value": "Advanced" }] },
{ "type": "text", "value": " Topics" }
],
"children": {}
}
}
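The plain title can be derived from the titleMarkdown nodes by concatenating their text content. The sketch below covers only the node types shown in the example above; `toPlainText` and `InlineNode` are hypothetical names, and real mdast phrasing content includes more node types (links, for instance).

```typescript
// Minimal subset of mdast phrasing nodes, enough for the example above.
type InlineNode =
  | { type: 'text' | 'inlineCode'; value: string }
  | { type: 'strong' | 'emphasis'; children: InlineNode[] };

// Recursively concatenate text, discarding formatting wrappers.
function toPlainText(nodes: InlineNode[]): string {
  return nodes
    .map((node) => ('value' in node ? node.value : toPlainText(node.children)))
    .join('');
}

const plainTitle = toPlainText([
  { type: 'strong', children: [{ type: 'text', value: 'Performance' }] },
  { type: 'text', value: ' Optimization' },
]);
```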
Example with all available fields:
Input MDX:
# Custom Title
Custom description text.
Page content here.
export const metadata = {
keywords: ['react', 'components', 'ui'],
};
Extracted Metadata:
{
"title": "Custom Title",
"description": "Custom description text.",
"keywords": ["react", "components", "ui"],
"sections": {
/* ... */
}
}
The plugin extracts and generates metadata in the following structure:
interface ExtractedMetadata {
title?: string;
description?: string;
descriptionMarkdown?: PhrasingContent[]; // Markdown AST nodes preserving formatting
keywords?: string[];
sections?: HeadingHierarchy;
embeddings?: number[];
image?: {
url: string;
alt?: string;
};
}
type HeadingHierarchy = {
[slug: string]: {
title: string; // Plain text for display
titleMarkdown: PhrasingContent[]; // Markdown AST nodes preserving formatting
children: HeadingHierarchy;
};
};
Similar to section titles, the plugin preserves both plain text and formatted markdown for descriptions:
- Plain text (description): Used for meta tags, search indexing, and SEO
- Markdown AST (descriptionMarkdown): Preserves original formatting like inline code, bold, italics, and links
This allows descriptions like "Use transformMarkdownMetadata to extract metadata" to render with proper formatting while still having clean text available for search engines and social media previews.
Title sources, in priority order:
1. export const metadata = { title: 'Custom' }
2. # Page Title (the first H1)
Description sources, in priority order:
1. <meta name="description" content="..." /> or <Meta name="description" content="..." /> (anywhere in document)
2. export const metadata = { description: '...' }
3. The first paragraph after the H1
4. undefined if none found
Note: While inline meta tags have the highest priority when present, using export const metadata at the end of the file is preferred for better readability.
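The description priority can be sketched as a chain of fallbacks. The names here are hypothetical, and placing the first paragraph after the exported description is an assumption inferred from the examples in this document (an exported description is described as overriding the first paragraph). The second function illustrates the useVisibleDescription option for the index.

```typescript
// Hypothetical container for the places a description can come from.
interface DescriptionSources {
  metaTag?: string; // <meta name="description" content="..." />
  exported?: string; // export const metadata = { description: '...' }
  firstParagraph?: string; // first visible paragraph after the H1
}

// Description for the page metadata: meta tag, then export, then paragraph (assumed order).
function resolveDescription(sources: DescriptionSources): string | undefined {
  return sources.metaTag ?? sources.exported ?? sources.firstParagraph;
}

// Description for the parent index: prefers the visible paragraph when the option is on.
function resolveIndexDescription(
  sources: DescriptionSources,
  useVisibleDescription = false,
): string | undefined {
  if (useVisibleDescription && sources.firstParagraph !== undefined) {
    return sources.firstParagraph;
  }
  return resolveDescription(sources);
}
```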
Keyword sources, in priority order:
1. <meta name="keywords" content="keyword1, keyword2, keyword3" /> (anywhere in document)
2. export const metadata = { keywords: ['...'] }
3. undefined if none found
The plugin supports both <meta> and <Meta> tags anywhere in the document:
Supported meta tags:
- <meta name="description" content="..." />: Page description for SEO
- <meta name="keywords" content="keyword1, keyword2, keyword3" />: Comma-separated keywords
Features:
- Both <meta> and <Meta> work
Example:
# Component Name
## Section One
<meta name="description" content="Custom SEO description" />
Content here...
## Section Two
<meta name="keywords" content="react, component, ui, accessibility" />
More content...
Note: While this feature exists for flexibility and migration scenarios, export const metadata at the end of the file is preferred for cleaner, more maintainable documentation.
The plugin maintains formatting in section titles through dual storage:
- Plain text (title): Used for display, slugs, and search
- Markdown AST (titleMarkdown): Preserves original formatting for rendering
This allows rendering with backticks, bold, and italics while still having clean text for URLs and indexing.
The plugin handles errors gracefully: fields that cannot be extracted are left undefined.
The plugin stores both plain text (title) and AST nodes (titleMarkdown) for section titles.
The plugin's incremental update strategy is particularly valuable in Next.js:
Related:
- createSitemap: Uses extracted metadata to build searchable sitemaps
- syncPageIndex: Updates parent directory indexes
- docs-infra validate: CLI command that validates index files
- transformMarkdownCode: Transform code blocks in documentation
- withDocsInfra: Next.js plugin with all docs features