An Introduction to Unified and Remark

2020 March 15

In this article I dive into Remark and the Unified collective. We will learn:

  • What the Unified collective is
  • How Remark integrates in that collective
  • How unified works with syntax trees

At the end of the article you will know enough to develop your own unified plugins.

The Unified collective

Unified is described as:

an interface for parsing, inspecting, transforming, and serializing content through syntax trees

Basically, unified provides a specification for parsing and transforming content. Processors are built on top of this specification to handle Markdown, HTML, etc. Unified itself does little more than chain these processors together. Each processor has its own ecosystem, and there are four main ones:

  • remark: a Markdown processor
  • rehype: an HTML processor
  • retext: a Natural language processor
  • redot: a Graphviz processor

If we want to use unified and its processors to transform Markdown to HTML, our code looks like this:

var unified = require('unified')
var stream = require('unified-stream')
var markdown = require('remark-parse')
var remark2rehype = require('remark-rehype')
var html = require('rehype-stringify')

var processor = unified()
  .use(markdown)
  .use(remark2rehype)
  .use(html)

process.stdin.pipe(stream(processor)).pipe(process.stdout)

Syntax trees

If we visit the main page of the unifiedjs site we are welcomed with the following message:

Content as structured data We compile content to syntax trees and syntax trees to content.

In our case the syntax tree is a tree representation of the source code. Syntax trees are also called abstract syntax trees, abbreviated as AST.

The website astexplorer.net shows the syntax tree of a given piece of source code for many languages. Here is an example of a syntax tree:

root
  heading
    text
  paragraph
    text
    emphasis
      text
    text
    inlineCode
    text

This syntax tree represents the following markdown code:

# Hello

This is some simple text with some *emphasis* and `code`.

Each element of the tree (root, heading, text, etc.) is a node.
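
To make this concrete, here is roughly what that tree looks like as a plain JavaScript object. This is a hand-written sketch that assumes the usual mdast node fields (`type`, `children`, `value`, `depth`). A real parser also attaches `position` information to each node, which is left out here.

```javascript
// Hand-written approximation of the tree for the markdown above.
// Real parsers also add a `position` field to each node (omitted here).
const tree = {
  type: 'root',
  children: [
    {
      type: 'heading',
      depth: 1, // a single '#' means a level-1 heading
      children: [{ type: 'text', value: 'Hello' }]
    },
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'This is some simple text with some ' },
        { type: 'emphasis', children: [{ type: 'text', value: 'emphasis' }] },
        { type: 'text', value: ' and ' },
        { type: 'inlineCode', value: 'code' },
        { type: 'text', value: '.' }
      ]
    }
  ]
}

// Nodes with `children` are parents; nodes with `value` are leaves.
// Logs the two top-level node types: heading and paragraph
console.log(tree.children.map((node) => node.type))
```

Everything a plugin does later comes down to reading or changing fields on objects like these.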

Unist

Unified uses unist as its specification for syntax trees. Other specifications extend unist: mdast represents Markdown in a syntax tree, hast represents HTML, and xast represents XML.

A lot of utilities have been created around unist. One of the most popular utilities is unist-util-visit, which facilitates visiting nodes in the syntax tree. There are also utilities for the specifications that extend unist. For instance, mdast-util-toc can generate a table of contents from the Markdown AST.

Visiting a tree

We can use unist-util-visit to perform a tree traversal: visit a node and all its children. For instance the following code logs every node type encountered:

var visit = require('unist-util-visit')

// …

visit(tree, function(node) {
  console.log(node.type)
})

The output would look like this:

root
heading
text
paragraph
text
...

We can also decide which nodes to visit:

visit(tree, ['text', 'inlineCode'], function(node) {
  // Both node types carry their content in `value`
  console.log(node.value)
})
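
To see what such a traversal involves, here is a simplified stand-in for the visitor, written from scratch. It is only a sketch: `simpleVisit` is a hypothetical helper, not the unist-util-visit implementation, and it ignores features of the real utility such as skipping subtrees or stopping early.

```javascript
// Simplified stand-in for unist-util-visit (illustration only).
// `test` is a node type, an array of node types, or null to match every node.
function simpleVisit(node, test, visitor) {
  const types = test == null ? null : [].concat(test)
  if (types === null || types.includes(node.type)) {
    visitor(node)
  }
  // Depth-first: descend into the children, if the node has any
  for (const child of node.children || []) {
    simpleVisit(child, test, visitor)
  }
}

// A small hand-built tree to walk
const tree = {
  type: 'root',
  children: [
    { type: 'heading', depth: 1, children: [{ type: 'text', value: 'Hello' }] },
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'Some ' },
        { type: 'inlineCode', value: 'code' }
      ]
    }
  ]
}

// Visit every node and record its type
const seen = []
simpleVisit(tree, null, (node) => seen.push(node.type))
console.log(seen.join(' > '))
// root > heading > text > paragraph > text > inlineCode

// Visit only the nodes that hold text content
const values = []
simpleVisit(tree, ['text', 'inlineCode'], (node) => values.push(node.value))
console.log(values)
```

A parent is visited before its children, which is why `root` comes first and each `text` follows the node that contains it.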

Remark

remark is a Markdown processor that is part of the unified collective. Its power comes from the fact that the Markdown is parsed into a syntax tree. We can then use plugins to do whatever we want to that syntax tree: check the code style, add or remove content, modify content, etc.

This is the process of a unified processor:

Input → Parser (parse) → Syntax Tree → Transformers (run) → Syntax Tree → Compiler (stringify) → Output

In the case of Remark, the parser is remark-parse, the compiler is remark-stringify and the transformers are remark plugins. If we go back to our previous unified example code we can now have a better understanding of what is happening:

var unified = require('unified')
var stream = require('unified-stream')
var markdown = require('remark-parse')
var remark2rehype = require('remark-rehype')
var html = require('rehype-stringify')

var processor = unified()
  // The markdown source is given to the processor, then
  // remark-parse turns it into a markdown syntax tree
  .use(markdown)
  // We use a transformer to transform the syntax tree
  // from a markdown tree to an HTML tree
  .use(remark2rehype)
  // We use a compiler to compile the syntax tree
  // to HTML
  .use(html)

process.stdin.pipe(
  stream(processor) // We give the input to the processor
).pipe(process.stdout)
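
The three phases can also be sketched without any library. The code below is a toy model, not how remark works internally: `parse`, `run`, `stringify`, and `shout` are hypothetical functions, and the parser only understands `#` headings and plain paragraphs.

```javascript
// Parse phase: turn text into a syntax tree (toy grammar only).
function parse(source) {
  const children = source
    .split(/\n{2,}/)
    .filter(Boolean)
    .map((block) => {
      const match = /^(#{1,6}) (.*)$/.exec(block)
      return match
        ? {
            type: 'heading',
            depth: match[1].length,
            children: [{ type: 'text', value: match[2] }]
          }
        : { type: 'paragraph', children: [{ type: 'text', value: block }] }
    })
  return { type: 'root', children }
}

// Run phase: apply each transformer in order. A transformer may
// return a new tree; if it returns nothing, the same tree is kept.
function run(tree, transformers) {
  return transformers.reduce(
    (current, transformer) => transformer(current) || current,
    tree
  )
}

// Stringify phase: turn the syntax tree back into text.
function stringify(tree) {
  const blocks = tree.children.map((node) => {
    const text = node.children.map((child) => child.value).join('')
    return node.type === 'heading' ? '#'.repeat(node.depth) + ' ' + text : text
  })
  return blocks.join('\n\n') + '\n'
}

// Example transformer: upper-case every text node in place.
function shout(tree) {
  for (const node of tree.children) {
    for (const child of node.children) {
      child.value = child.value.toUpperCase()
    }
  }
}

const output = stringify(run(parse('# Hello\n\nSome text.'), [shout]))
console.log(output)
```

Swapping `stringify` for a function that emits HTML is, in spirit, what replacing remark-stringify with remark-rehype and rehype-stringify does in the real pipeline.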

Plugins

Unified plugins configure the processor: they can change its parser, its compiler, or its data. A plugin takes the form of an attacher, a function that can receive options and optionally returns a transformer.

function attacher([options])

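A transformer is a function that inspects or modifies the AST. It takes the root node of the tree as a parameter and can either modify it in place or return a new syntax tree. If it returns a new tree, that tree is handed to the next transformer; otherwise the next transformer receives the same tree.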

function transformer(node, file[, next])

Minimal plugin example

A minimal plugin that simply logs the node encountered would be defined like this:

// visit is a utility to help us walk through the AST
const visit = require('unist-util-visit')

// We create our plugin by exporting the attacher function
module.exports = logNodes


// The attacher function called logNodes
function logNodes(options) {
  return transformer

  // The transformer takes the tree as an input
  // and optionally returns a modified tree
  function transformer(tree) {
    // Here we just want to log the nodes
    // no need to modify the tree
    visit(tree, (node) => console.log(node.type))
  }
}

Modifying the syntax tree with the plugin

This example is taken verbatim from the unifiedjs handbook which is still a work in progress as of writing this article.

A more advanced plugin that prefixes "BREAKING" to all h1 nodes is defined like this:

prefixBreaking.js
const visit = require('unist-util-visit')

module.exports = prefixHeading

function prefixHeading(options) {
  // Return the transformer so that unified can run it
  return transformer

  function transformer(tree) {
    // We visit only 'heading' nodes
    visit(tree, 'heading', node => {
      // We visit only 'h1' headings
      if (node.depth !== 1) {
        return
      }

      visit(node, 'text', textNode => {
        // We modify the AST to add our prefix
        textNode.value = 'BREAKING ' + textNode.value
      })
    })
  }
}

Here we modify the AST directly. This might go against the usual best practices you've been taught but as the tree grows in size, making a copy of it doesn't make sense and induces performance issues.

Using the plugin in the processor

To apply our plugin, we chain the use() method:

var unified = require('unified')
var stream = require('unified-stream')
var markdown = require('remark-parse')
var remark2rehype = require('remark-rehype')
var html = require('rehype-stringify')
// We import our plugin
var prefixBreaking = require('./prefixBreaking')

var processor = unified()
  .use(markdown)
  // We apply it when the parser has done its job
  .use(prefixBreaking)
  .use(remark2rehype)
  .use(html)

process.stdin.pipe(
  stream(processor)
).pipe(process.stdout)

Passing options

Had we programmed our plugin to take options (to change the word prefixed for instance), we would have the opportunity to pass them when applying the plugin:

var processor = unified()
  .use(markdown)
  // We pass the options when applying the plugin
  .use(prefixBreaking, { prefix: "Incredible! "})
  .use(remark2rehype)
  .use(html)
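
For that call to have any effect, the plugin has to read the option. One possible way to write it is sketched below. The `prefix` option name and its default are assumptions made for this illustration, and `walk` is a small hypothetical helper standing in for unist-util-visit so that the snippet runs on its own.

```javascript
// Small local walker standing in for unist-util-visit (illustration only)
function walk(node, type, visitor) {
  if (node.type === type) {
    visitor(node)
  }
  for (const child of node.children || []) {
    walk(child, type, visitor)
  }
}

// Attacher: receives the options and returns the transformer
function prefixHeading(options) {
  // Fall back to the original behavior when no prefix is given
  const prefix = (options && options.prefix) || 'BREAKING '

  return function transformer(tree) {
    walk(tree, 'heading', (node) => {
      // Only level-1 headings get the prefix
      if (node.depth !== 1) {
        return
      }
      walk(node, 'text', (textNode) => {
        textNode.value = prefix + textNode.value
      })
    })
  }
}

// Try it on a hand-built tree
const tree = {
  type: 'root',
  children: [
    { type: 'heading', depth: 1, children: [{ type: 'text', value: 'News' }] },
    { type: 'heading', depth: 2, children: [{ type: 'text', value: 'Details' }] }
  ]
}

prefixHeading({ prefix: 'Incredible! ' })(tree)
console.log(tree.children[0].children[0].value) // Incredible! News
console.log(tree.children[1].children[0].value) // Details
```

Because the options are captured in a closure by the attacher, each `.use()` call gets its own independently configured transformer.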

Creating a remark plugin

We learned about unified, remark and its plugins. In this section we use all our knowledge to create a remark plugin. We will see how to structure our code and create our tests. The plugin itself will be the one shown earlier to prefix BREAKING to h1 headings.

The code for this plugin is available on my GitHub repository.

Project creation

We create a minimal project called starter-remark-plugin:

git init starter-remark-plugin
cd starter-remark-plugin
npm init

We answer the npm questions as best we can; we will have to come back to the package.json file anyway. The entry point will be index.js, which we create with the following code:

index.js
const visit = require('unist-util-visit')

module.exports = (options) => tree => {
  visit(tree, 'heading', node => {
    if (node.depth !== 1) {
      return
    }

    visit(node, 'text', textNode => {
      textNode.value = 'BREAKING ' + textNode.value
    })
  })
}

That's it for the plugin creation. Let's move on to the tests.

Project tests

Setup

We will use Jest to perform our tests. We also need remark for our tests. We install those with:

npm install --save-dev remark jest

We modify our package.json to add our test script:

{
  "name": "starter-remark-plugin",
  "version": "1.0.0",
  "description": "A minimal example of a remark plugin",
  "main": "index.js",
  "scripts": {
    "test": "jest"
  },
  "author": "Braincoke",
  "license": "MIT",
  "devDependencies": {
    "jest": "^25.1.0",
    "remark": "^11.0.2"
  }
}

Test files

We create a directory where our tests will live:

mkdir tests

In this directory we create our first test file named markdown.test.js.

// Import remark to parse markdown
const remark = require('remark')
// Import our plugin to add prefix to h1
const plugin = require('..')

test('adds BREAKING to h1 headings', () => {
  const inputString = [
    '# New virus reaches Europe',
    '',
    '## Origin',
    '',
    'There is no known origin as of today',
  ].join('\n')

  // We have to add a newline at the end
  const expectedString = [
    '# BREAKING New virus reaches Europe',
    '',
    '## Origin',
    '',
    'There is no known origin as of today',
    ''
  ].join('\n')

  // Create our processor with our plugin
  const processor = remark()
    .use(plugin)

  const resultString = processor.processSync(inputString).toString()
  expect(resultString).toEqual(expectedString)
})

We run our tests with the test script previously created:

npm run test

> starter-remark-plugin@1.0.0 test /home/sammy/Projects/starter-remark-plugin
> jest

 PASS  tests/markdown.test.js
  ✓ adds BREAKING to h1 headings (9ms)

Test Suites: 1 passed, 1 total
Tests:       1 passed, 1 total
Snapshots:   0 total
Time:        0.796s, estimated 1s
Ran all test suites.

Using fixtures

We added the test data directly in our JS file. We can also use test fixtures to describe inputs and their expected outputs. We create a folder ./tests/fixtures and two files before.md and after.md.

before.md
# Unknown new virus reaches Europe

## Origins unknown

This new virus of unknown origin just reached Europe.

Here is the expected output.

after.md
# BREAKING Unknown new virus reaches Europe

## Origins unknown

This new virus of unknown origin just reached Europe.

We also add a new test to our test suite:

// import fs and path
const fs = require('fs')
const path = require('path')

test('adds BREAKING to h1 in fixtures', () => {
  const before = fs.readFileSync(path.resolve(__dirname, 'fixtures/before.md'), 'utf8')
  const after = fs.readFileSync(path.resolve(__dirname, 'fixtures/after.md'), 'utf8')

  const result = remark().use(plugin).processSync(before)
  expect(result.contents).toEqual(after)
})

Now when we run npm run test this new test will also be executed.

Conclusion

UnifiedJS and its processors remark, rehype and retext are great tools whenever you need to parse a document and modify it. Creating plugins is easy given the proper documentation. Major projects rely on unified, such as Node.js and Gatsby, as well as newer tools like Gridsome. So next time you need to parse and modify a document, think of the Unified Collective!