overleaf/services/web/frontend/js/features/source-editor/lezer-latex
Eric Mc Sween f0c67b45c7 Merge pull request #19137 from overleaf/em-mj-parbox
Add special parsing for \parbox

GitOrigin-RevId: a69a519fa75af95712e19426265224507400f105
2024-07-03 08:04:48 +00:00

Lezer-LaTeX, a LaTeX Parser

Lezer-LaTeX is a LaTeX parser implemented with lezer, the parser system used by CodeMirror 6.

The parser is defined in a "grammar" file, together with a "tokens" file containing custom tokenizer logic. Both are compiled by @lezer/generator into a parser module and a "terms" module. The parser module is then loaded by CodeMirror 6 in the web frontend codebase.

Important files

  • Source files:

    • ./latex.grammar: The grammar file, containing the specification for the parser
    • ./tokens.mjs: The custom tokenizer logic, required by some rules in the grammar
  • Generated files:

    • ./latex.mjs: The generated parser
    • ./latex.terms.mjs: The generated terms file
    • (these files are ignored by git, eslint, and prettier)
  • Scripts:

    • web/scripts/lezer-latex/generate.js: A script which runs the generator on the grammar, producing the generated parser/terms files
    • web/scripts/lezer-latex/run.mjs: A script that runs the parser against a supplied file, and prints the tree to the terminal
  • Webpack plugins:

    • web/webpack-plugins/lezer-grammar-compiler.js: A webpack plugin that calls the generator as part of the webpack build. In dev, it will automatically re-build the parser when the grammar file changes.

NPM tasks

  • lezer-latex:generate: Generate the parser files from the grammar

    • (Calls lezer-latex/generate.js)
    • This should be run whenever the grammar changes
  • lezer-latex:run: Run the parser against a file

    • (Calls lezer-latex/run.mjs)

Generating the parser

From the monorepo root:

bin/npm -w services/web run 'lezer-latex:generate'
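
For orientation, the generation step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the contents of generate.js: `outputPaths` and `generate` are hypothetical names, and `build` is assumed to behave like `buildParserFile` from @lezer/generator, which takes the grammar text and returns the parser and terms modules as strings.

```javascript
// Hypothetical helper: derive the two generated file paths from the grammar path.
function outputPaths(grammarPath) {
  const base = grammarPath.replace(/\.grammar$/, '')
  return { parser: `${base}.mjs`, terms: `${base}.terms.mjs` }
}

// Sketch of a generate step (assumption: not the real script).
// `build` is expected to work like buildParserFile from @lezer/generator:
// (grammarText, options) => { parser: string, terms: string }
async function generate(grammarPath, build) {
  const { readFile, writeFile } = await import('node:fs/promises')
  const grammarText = await readFile(grammarPath, 'utf8')
  // fileName is used in error messages; moduleStyle 'es' emits ES modules.
  const { parser, terms } = build(grammarText, {
    fileName: grammarPath,
    moduleStyle: 'es',
  })
  const out = outputPaths(grammarPath)
  await writeFile(out.parser, parser)
  await writeFile(out.terms, terms)
  return out
}
```

With the real generator, this would be called as `generate('./latex.grammar', buildParserFile)` after importing `buildParserFile` from @lezer/generator.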

Tests

Unit tests for the parser live in web/test/unit/src/LezerLatex. There are three kinds of test, in three subdirectories:

  • corpus/: A set of tests using lezer's test framework, consisting of example text and the expected parse tree
  • examples/: A set of realistic LaTeX documents. These tests pass if the files parse with no errors
  • regressions/: Like examples/, these are expected to parse without error, but they are not realistic documents.

These tests run as part of test_frontend. You can run these tests alone by invoking:

make test_unit MOCHA_GREP='lezer-latex'
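
The "parses with no errors" check used by the examples/ and regressions/ tests can be pictured along these lines. This is a sketch, not the actual test code: `findErrors` is a hypothetical name, and `tree` is assumed to be a Lezer `Tree` (or anything exposing the same `iterate({ enter })` method), whose error nodes have `type.isError` set.

```javascript
// Collect the positions of all error nodes in a Lezer syntax tree.
// Assumption: `tree` exposes Tree.iterate({ enter }) as in @lezer/common.
function findErrors(tree) {
  const errors = []
  tree.iterate({
    enter(node) {
      // Lezer marks nodes produced by error recovery with type.isError.
      if (node.type.isError) {
        errors.push({ from: node.from, to: node.to })
      }
    },
  })
  return errors
}
```

A document would then count as passing when `findErrors(parser.parse(text))` returns an empty array.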

Trying the parser

While developing the parser, you can run it against a file by calling the lezer-latex:run task. There are some example files in the test suite, at web/test/unit/src/LezerLatex/examples/.

For example:

bin/npm -w services/web run 'lezer-latex:run' web/test/unit/src/LezerLatex/examples/amsmath.tex

If you omit the file path, the default file (examples/demo.tex) will be parsed.
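
To give a sense of what printing the tree involves, here is a small sketch of a tree formatter. It is an illustration only and may differ from what run.mjs actually does: `formatTree` is a hypothetical name, and `tree` is assumed to be a Lezer `Tree` exposing `iterate({ enter, leave })`.

```javascript
// Format a Lezer syntax tree as indented lines, one node per line.
// Assumption: `tree` exposes Tree.iterate({ enter, leave }) as in @lezer/common.
function formatTree(tree, source) {
  const lines = []
  let depth = 0
  tree.iterate({
    enter(node) {
      // Show at most the first 20 characters of the text the node covers.
      const snippet = JSON.stringify(source.slice(node.from, node.to).slice(0, 20))
      lines.push(`${'  '.repeat(depth)}${node.name} [${node.from}-${node.to}] ${snippet}`)
      depth++
    },
    leave() {
      depth--
    },
  })
  return lines
}
```

Assuming the generated module exports `parser` (the default export name used by @lezer/generator), the output for a source string `text` could be printed with `console.log(formatTree(parser.parse(text), text).join('\n'))`.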

Integration into web

The web frontend imports the parser from latex.mjs in frontend/js/features/source-editor/languages/latex/index.ts. The parser is then plugged into the CM6 language system.
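
As a rough picture of what plugging a Lezer parser into CM6 looks like, consider the sketch below. It is not the code in index.ts: `createLatexSupport` is a hypothetical name, the language data shown is an assumed example, and the real integration is likely to configure considerably more. It relies only on `LRLanguage.define` and `LanguageSupport` from @codemirror/language.

```javascript
// Wrap a generated Lezer parser in a CodeMirror 6 language extension.
// Assumptions: @codemirror/language is installed, and `parser` is the
// LRParser exported by the generated latex.mjs module.
async function createLatexSupport(parser) {
  const { LRLanguage, LanguageSupport } = await import('@codemirror/language')
  const language = LRLanguage.define({
    name: 'latex',
    parser,
    // Illustrative language data: LaTeX line comments start with '%'.
    languageData: { commentTokens: { line: '%' } },
  })
  return new LanguageSupport(language)
}
```

The returned `LanguageSupport` value is an editor extension, so it can be included in the `extensions` array when creating an editor state.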

The web build

In web/Dockerfile, we have a RUN command that calls lezer-latex:generate as part of the build. This ensures the parser is built before the CI tests run. The build cannot happen during the tests themselves, because we can't write to disk during that stage of CI.