In this article I dive into remark and the unified collective.
By the end of the article you will know enough to develop your own unified plugins.
Unified is described as:
an interface for parsing, inspecting, transforming, and serializing content through syntax trees
Basically, unified provides a specification for parsing and transforming content. Processors are built around this specification to process Markdown, HTML, etc. The idea is that unified itself doesn't do much except chain the processors. Each processor has an ecosystem around it and we count four main ecosystems:
Now if we would like to use unified and its processors to transform Markdown to HTML, our code would look like this:
var unified = require('unified')
var stream = require('unified-stream')
var markdown = require('remark-parse')
var remark2rehype = require('remark-rehype')
var html = require('rehype-stringify')
var processor = unified()
  .use(markdown)
  .use(remark2rehype)
  .use(html)
process.stdin.pipe(stream(processor)).pipe(process.stdout)
If we visit the main page of the unifiedjs site we are welcomed with the following message:
Content as structured data. We compile content to syntax trees and syntax trees to content.
In our case the syntax tree is a tree representation of the source code. Syntax trees are also called abstract syntax trees, abbreviated as AST.
There is a great website astexplorer.net that shows us the syntax tree of a given source code for different languages. Here is an example of a syntax tree:
This syntax tree represents the following markdown code:
# Hello
This is some simple text with some *emphasis* and `code`.
Each element of the tree (root, heading, text, etc.) is a node.
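As a rough sketch, the tree for the markdown above can be written by hand as a plain JavaScript object. This is a simplified view: a real parser also attaches a position field to every node, which is omitted here, and the variable name tree is just an illustration.

```javascript
// Hand-written sketch of the mdast tree for the markdown above.
// Real parser output also carries a `position` field on every node (omitted).
const tree = {
  type: 'root',
  children: [
    {
      type: 'heading',
      depth: 1,
      children: [{ type: 'text', value: 'Hello' }]
    },
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'This is some simple text with some ' },
        { type: 'emphasis', children: [{ type: 'text', value: 'emphasis' }] },
        { type: 'text', value: ' and ' },
        { type: 'inlineCode', value: 'code' },
        { type: 'text', value: '.' }
      ]
    }
  ]
}

// Parent nodes hold a `children` array, literal nodes hold a `value`.
console.log(tree.children.map((node) => node.type))
```

Every node has a type; nodes such as root, heading and paragraph contain other nodes, while text and inlineCode carry their content directly in value.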
Unified uses unist as a specification for syntax trees. Other specifications implement unist: mdast represents Markdown in a syntax tree, hast represents HTML and xast represents XML.
A lot of utilities have been created around unist. One of the most popular utilities is unist-util-visit, which facilitates visiting nodes in the syntax tree. But there are also utilities for the specifications implementing unist. For instance, mdast-util-toc can generate a table of contents from the Markdown AST.
We can use unist-util-visit to perform a tree traversal: visit a node and all its children. For instance the following code logs every node type encountered:
var visit = require('unist-util-visit')
// …
visit(tree, function(node) {
  console.log(node.type)
})
The output would look like this:
root
heading
text
paragraph
text
...
We can also decide which nodes to visit:
visit(tree, ['text', 'inlineCode'], function(node) {
  console.log(node.type)
})
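To get a feel for what such a traversal does, here is a minimal dependency-free sketch. The simpleVisit function is a simplified stand-in written for this article, not the implementation of unist-util-visit: it walks depth-first, parent before children, ignores the visitor's return value, and accepts an optional list of node types.

```javascript
// Simplified stand-in for unist-util-visit (NOT the real implementation):
// depth-first, parent before children, with an optional type filter.
function simpleVisit(node, test, visitor) {
  // Allow simpleVisit(tree, visitor) as well as simpleVisit(tree, test, visitor)
  if (typeof test === 'function') {
    visitor = test
    test = null
  }
  const types = test == null ? null : [].concat(test)
  if (types === null || types.includes(node.type)) {
    visitor(node)
  }
  for (const child of node.children || []) {
    simpleVisit(child, types, visitor)
  }
}

// A small hand-written tree to walk through
const tree = {
  type: 'root',
  children: [
    { type: 'heading', depth: 1, children: [{ type: 'text', value: 'Hello' }] },
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'Some ' },
        { type: 'inlineCode', value: 'code' }
      ]
    }
  ]
}

// Visit every node
const allTypes = []
simpleVisit(tree, (node) => {
  allTypes.push(node.type)
})
console.log(allTypes.join(' '))
// root heading text paragraph text inlineCode

// Visit only the nodes whose type is in the list
const values = []
simpleVisit(tree, ['text', 'inlineCode'], (node) => {
  values.push(node.value)
})
console.log(values)
```

The real utility offers more than this sketch, so it remains the better choice in an actual plugin.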
remark is a Markdown processor that is part of the unified collective. The power of remark resides in the fact that once the Markdown has been parsed, it is transformed to a syntax tree. We can then use plugins to do whatever we want to that syntax tree: check the code style, add or remove content, modify content, etc.
This is the process of a unified processor:
In the case of Remark, the parser is remark-parse, the compiler is remark-stringify and the transformers are remark plugins. If we go back to our previous unified example code we can now have a better understanding of what is happening:
var unified = require('unified')
var stream = require('unified-stream')
var markdown = require('remark-parse')
var remark2rehype = require('remark-rehype')
var html = require('rehype-stringify')
var processor = unified()
  // The markdown code is given to the processor, then
  // we use remark-parse to get a syntax tree
  .use(markdown)
  // We use a transformer to transform the syntax tree
  // from a Markdown tree to an HTML tree
  .use(remark2rehype)
  // We use a compiler to compile the syntax tree
  // to HTML
  .use(html)
process.stdin.pipe(
  stream(processor) // We give the input to the processor
).pipe(process.stdout)
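The three phases (parse, transform, compile) can be mimicked with a toy example that has no dependencies. Everything below is hypothetical and far simpler than remark-parse, remark-rehype or rehype-stringify: the parser only knows "# " headings and paragraphs, the nodes are not valid mdast, and the compiler does not escape HTML.

```javascript
// Toy model of the three phases; every function here is hypothetical.

// Phase 1, parse: text -> syntax tree (only "# " headings and paragraphs)
function parse(text) {
  const children = text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) =>
      line.startsWith('# ')
        ? { type: 'heading', depth: 1, value: line.slice(2) }
        : { type: 'paragraph', value: line }
    )
  return { type: 'root', children }
}

// Phase 2, transform: tree -> tree (here: upper-case every heading)
function shout(tree) {
  for (const node of tree.children) {
    if (node.type === 'heading') node.value = node.value.toUpperCase()
  }
  return tree
}

// Phase 3, compile: syntax tree -> text (HTML in this sketch, no escaping)
function stringify(tree) {
  return tree.children
    .map((node) =>
      node.type === 'heading' ? `<h1>${node.value}</h1>` : `<p>${node.value}</p>`
    )
    .join('\n')
}

const html = stringify(shout(parse('# Hello\n\nSome text')))
console.log(html)
// <h1>HELLO</h1>
// <p>Some text</p>
```

The point of the sketch is only the shape of the pipeline: text goes in, a tree is built, the tree is modified, and text comes out.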
Unified plugins can change the processor, whether it is its parser, its compiler or its data configuration. Plugins materialize as attachers, which are functions that can receive options and optionally return a transformer.
function attacher([options])
A transformer is a function able to modify the AST. It takes the root node as a parameter and can return a new syntax tree; when it does, the next transformer is given this new tree.
function transformer(node, file[, next])
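The hand-off between transformers can be sketched with a toy runner. This is an assumption-laden simplification, not unified's code: runTransformers, markVisited and wrapInSection are made-up names, the section node type is invented for the demo, and only the synchronous case is handled (the file and next parameters are ignored).

```javascript
// Toy runner (hypothetical, synchronous only): each transformer receives the
// current tree; if it returns a new tree, that one is handed to the next.
function runTransformers(tree, transformers) {
  let current = tree
  for (const transformer of transformers) {
    const result = transformer(current)
    if (result !== undefined) current = result
  }
  return current
}

// Mutates in place and returns nothing: the same tree object moves on
function markVisited(tree) {
  tree.visited = true
}

// Returns a brand-new root that wraps the previous children
function wrapInSection(tree) {
  return {
    type: 'root',
    children: [{ type: 'section', children: tree.children }]
  }
}

const input = { type: 'root', children: [{ type: 'text', value: 'hi' }] }
const output = runTransformers(input, [markVisited, wrapInSection])

console.log(input.visited) // true
console.log(output === input) // false
console.log(output.children[0].type) // section
```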
A minimal plugin that simply logs the node encountered would be defined like this:
// visit is a utility to help us walk through the AST
const visit = require('unist-util-visit')
// We create our plugin by exporting the attacher function
module.exports = logNodes
// The attacher function called logNodes
function logNodes(options) {
  return transformer
  // The transformer takes the tree as an input
  // and optionally returns a modified tree
  function transformer(tree) {
    // Here we just want to log the nodes
    // no need to modify the tree
    visit(tree, (node) => console.log(node.type))
  }
}
This example is taken verbatim from the unifiedjs handbook which is still a work in progress as of writing this article.
A more advanced plugin that prefixes "BREAKING" to all h1 nodes is defined like this:
const visit = require('unist-util-visit')
module.exports = prefixHeading
function prefixHeading(options) {
  // The attacher must return the transformer,
  // otherwise the plugin does nothing
  return transformer
  function transformer(tree) {
    // We visit only 'heading' nodes
    visit(tree, 'heading', node => {
      // We visit only 'h1' headings
      if (node.depth !== 1) {
        return
      }
      visit(node, 'text', textNode => {
        // We modify the AST to add our prefix
        textNode.value = 'BREAKING ' + textNode.value
      })
    })
  }
}
Here we modify the AST directly. This might go against the usual best practices you've been taught but as the tree grows in size, making a copy of it doesn't make sense and induces performance issues.
To apply our plugin, we chain the use() method:
var unified = require('unified')
var stream = require('unified-stream')
var markdown = require('remark-parse')
var remark2rehype = require('remark-rehype')
var html = require('rehype-stringify')
// We import our plugin
var prefixBreaking = require('./prefixBreaking')
var processor = unified()
  .use(markdown)
  // We apply it when the parser has done its job
  .use(prefixBreaking)
  .use(remark2rehype)
  .use(html)
process.stdin.pipe(
  stream(processor)
).pipe(process.stdout)
Had we programmed our plugin to take options (to change the word prefixed for instance), we would have the opportunity to pass them when applying the plugin:
var processor = unified()
  .use(markdown)
  // We pass the options when applying the plugin
  .use(prefixBreaking, { prefix: "Incredible! " })
  .use(remark2rehype)
  .use(html)
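Here is a minimal sketch of what such an option-aware plugin could look like. It is only one possible design: the prefix option name comes from the call above, the default value is an assumption, and visitType is a simplified stand-in for unist-util-visit so that the example runs without dependencies.

```javascript
// Simplified stand-in for unist-util-visit so the sketch has no dependencies
function visitType(node, type, visitor) {
  if (node.type === type) visitor(node)
  for (const child of node.children || []) visitType(child, type, visitor)
}

// Attacher: receives the options and returns the transformer.
// Falling back to 'BREAKING ' when no prefix is given is an assumption.
function prefixHeading(options = {}) {
  const prefix = options.prefix === undefined ? 'BREAKING ' : options.prefix

  return function transformer(tree) {
    visitType(tree, 'heading', (heading) => {
      // Only h1 headings are prefixed
      if (heading.depth !== 1) return
      visitType(heading, 'text', (textNode) => {
        textNode.value = prefix + textNode.value
      })
    })
  }
}

// Apply the transformer by hand on a small hand-written tree
const tree = {
  type: 'root',
  children: [
    {
      type: 'heading',
      depth: 1,
      children: [{ type: 'text', value: 'New virus reaches Europe' }]
    },
    { type: 'heading', depth: 2, children: [{ type: 'text', value: 'Origin' }] }
  ]
}

prefixHeading({ prefix: 'Incredible! ' })(tree)
console.log(tree.children[0].children[0].value)
// Incredible! New virus reaches Europe
```

Because the options are captured by the attacher, the transformer itself keeps the same signature and the rest of the pipeline does not change.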
We learned about unified, remark and its plugins.
In this section we use all our knowledge to create a remark plugin.
We will see how to structure our code and create our tests.
The plugin itself will be the one shown earlier to prefix BREAKING to h1 headings.
The code for this plugin is available on my GitHub repository.
We create a minimal project called starter-remark-plugin:
git init starter-remark-plugin
cd starter-remark-plugin
npm init
We answer the npm questions as best we can; we will have to come back to the package.json file anyway.
The entry point will be index.js, which we create with the following code:
const visit = require('unist-util-visit')
module.exports = (options) => tree => {
  visit(tree, 'heading', node => {
    if (node.depth !== 1) {
      return
    }
    visit(node, 'text', textNode => {
      textNode.value = 'BREAKING ' + textNode.value
    })
  })
}
That's it for the plugin creation. Let's move on to the tests.
We will use Jest to perform our tests.
We also need remark for our tests.
We install those with:
npm install --save-dev remark jest
We modify our package.json to add our test script:
{
  "name": "starter-remark-plugin",
  "version": "1.0.0",
  "description": "A minimal example of a remark plugin",
  "main": "index.js",
  "scripts": {
    "test": "jest"
  },
  "author": "Braincoke",
  "license": "MIT",
  "devDependencies": {
    "jest": "^25.1.0",
    "remark": "^11.0.2"
  }
}
We create a directory where our tests will live:
mkdir tests
In this directory we create our first test file named markdown.test.js.
// Import remark to parse markdown
const remark = require('remark')
// Import our plugin to add prefix to h1
const plugin = require('..')
test('adds BREAKING to h1 headings', () => {
  const inputString = [
    '# New virus reaches Europe',
    '',
    '## Origin',
    '',
    'There is no known origin as of today',
  ].join('\n')
  // We have to add a newline at the end
  const expectedString = [
    '# BREAKING New virus reaches Europe',
    '',
    '## Origin',
    '',
    'There is no known origin as of today',
    ''
  ].join('\n')
  // Create our processor with our plugin
  const processor = remark()
    .use(plugin)
  const resultString = processor.processSync(inputString).toString()
  expect(resultString).toEqual(expectedString)
})
We run our tests with the test script previously created:
npm run test
> starter-remark-plugin@1.0.0 test /home/sammy/Projects/starter-remark-plugin
> jest
PASS tests/markdown.test.js
✓ adds BREAKING to h1 headings (9ms)
Test Suites: 1 passed, 1 total
Tests: 1 passed, 1 total
Snapshots: 0 total
Time: 0.796s, estimated 1s
Ran all test suites.
We added the test data directly in our JS file.
We can also use test fixtures to describe inputs and their expected outputs.
We create a folder ./tests/fixtures and two files, before.md and after.md. Here is the input, stored in before.md:
# Unknown new virus reachs Europe
## Origins unknown
This new virus of unknown origin just reached Europe.
Here is the expected output, stored in after.md:
# BREAKING Unknown new virus reachs Europe
## Origins unknown
This new virus of unknown origin just reached Europe.
We also add a new test to our test suite:
// import fs and path
const fs = require('fs')
const path = require('path')
test('adds BREAKING to h1 in fixtures', () => {
  const before = fs.readFileSync(path.resolve(__dirname, 'fixtures/before.md'), 'utf8')
  const after = fs.readFileSync(path.resolve(__dirname, 'fixtures/after.md'), 'utf8')
  const result = remark().use(plugin).processSync(before)
  expect(result.contents).toEqual(after)
})
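If the number of fixtures grows, reading each pair by hand becomes repetitive. One possible approach, sketched below, is a small helper that reads the before.md and after.md files of a given directory. The readFixture name and the one-directory-per-fixture layout are assumptions made for this sketch; the demo writes its files to a temporary directory so it can run anywhere.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Hypothetical helper: reads the before.md / after.md pair of one fixture directory
function readFixture(dir) {
  return {
    before: fs.readFileSync(path.join(dir, 'before.md'), 'utf8'),
    after: fs.readFileSync(path.join(dir, 'after.md'), 'utf8')
  }
}

// Demo on a throwaway directory so the sketch runs anywhere
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'fixture-'))
fs.writeFileSync(path.join(dir, 'before.md'), '# Title\n')
fs.writeFileSync(path.join(dir, 'after.md'), '# BREAKING Title\n')

const fixture = readFixture(dir)
console.log(fixture.after === '# BREAKING ' + fixture.before.slice(2)) // true
```

In a Jest test, the helper would be called with path.resolve(__dirname, 'fixtures') and the two strings used exactly like before and after in the test above.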
Now when we run npm run test, this new test will also be executed.
UnifiedJS and its processors remark, rehype and retext are great tools whenever you need to parse a document and modify it. Creating plugins is easy given the proper documentation. Major tools rely on unified, such as Node.js and Gatsby, as well as newer tools like Gridsome. So next time you need to parse and modify a document, think of the unified collective!