2026-02-15 02:10:29 +01:00


NJSParser (JS)

A 99.9% AI-generated (including unit tests & README) JS port of novitae/njsparser for extracting and parsing Next.js hydration data from HTML content.

  • Parses flight data (from the self.__next_f.push scripts)
  • Parses next data from __NEXT_DATA__ script
  • Parses build manifests
  • Searches for build id
  • Searches flight data by element type, detects Next.js pages, and generates API paths from build manifests

This is the JavaScript/TypeScript port of the Python njsparser library, designed to run on Bun.

Installation

bun add njsparser

Or install from source:

git clone <repository-url>
cd njsparser/js
bun install

DOMParser dependency

This library doesn't bundle an HTML parser. You must pass a DOMParser implementation (a constructor) when creating the parser:

  • In the Bun tests we use deno-dom's WASM build.
  • In a browser you can pass the native DOMParser.

Usage

Basic Setup (Bun)

import { DOMParser } from 'deno-dom/deno-dom-wasm.ts';
import { NJSParser } from 'njsparser';

const parser = NJSParser({ DOMParser });

Basic Setup (Browser)

import { NJSParser } from 'njsparser';

const parser = NJSParser({ DOMParser: window.DOMParser });

Parsing Flight Data

Flight data is embedded in the self.__next_f.push() scripts of Next.js App Router pages. Pages Router sites expose __NEXT_DATA__ instead.

import { DOMParser } from 'deno-dom/deno-dom-wasm.ts';
import { NJSParser } from 'njsparser';

const parser = NJSParser({ DOMParser });

const response = await fetch('https://nextjs.org');
const html = await response.text();

// Parsed flight data, or null if the page has none
const flightData = parser.parser.getFlightData(html);

if (flightData) {
  // BeautifulFD accepts HTML or already-parsed flight data; reuse the parse
  const fd = new parser.BeautifulFD(flightData);

  // Find the first Data element whose content has a "user" key
  for (const data of fd.find_iter([parser.types.Data])) {
    if (data.content && typeof data.content === 'object' && 'user' in data.content) {
      console.log(data.content.user);
      break;
    }
  }
}
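For context, the payload behind getFlightData is streamed as a series of self.__next_f.push([...]) calls, and their string chunks are concatenated. The sketch below shows that first step in simplified form. It assumes each call sits in its own <script> tag and that its argument is a JSON array in which [1, "..."] entries carry payload text. collectFlightChunks is a hypothetical helper, not njsparser's implementation, which goes on to parse the concatenated rows into elements.

```typescript
// Hypothetical helper: a simplified sketch, not njsparser's implementation.
// Assumes each chunk is emitted as <script>self.__next_f.push([...])</script>
// and that the pushed array is valid JSON.
function collectFlightChunks(html: string): string {
  const pushRe = /self\.__next_f\.push\((\[[\s\S]*?\])\)\s*<\/script>/g;
  let payload = "";
  let m: RegExpExecArray | null;
  while ((m = pushRe.exec(html)) !== null) {
    let entry: unknown;
    try {
      entry = JSON.parse(m[1]);
    } catch {
      continue; // skip pushes whose argument is not plain JSON
    }
    // [0] is a bootstrap marker; [1, "<chunk>"] carries flight payload text
    if (Array.isArray(entry) && entry[0] === 1 && typeof entry[1] === "string") {
      payload += entry[1];
    }
  }
  return payload;
}

const html =
  '<script>self.__next_f.push([0])</script>' +
  '<script>self.__next_f.push([1,"0:[\\"$\\",\\"div\\",null,{}]\\n"])</script>' +
  '<script>self.__next_f.push([1,"1:I[123,[],\\"\\"]\\n"])</script>';

// Prints two rows: 0:["$","div",null,{}] and 1:I[123,[],""]
console.log(collectFlightChunks(html));
```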

Parsing __NEXT_DATA__

const parser = NJSParser({ DOMParser });

const html = await fetch('https://example.com').then(r => r.text());
const nextData = parser.parser.getNextData(html);

if (nextData) {
  console.log('Build ID:', nextData.buildId);
  console.log('Page Props:', nextData.props?.pageProps);
}
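__NEXT_DATA__ is a JSON document in a <script id="__NEXT_DATA__" type="application/json"> tag, so extracting it amounts to locating that tag and calling JSON.parse. Below is a hypothetical regex-based sketch of that idea. The library presumably works through the DOMParser you pass in instead, and extractNextData and the NextData shape are illustrative assumptions.

```typescript
// Hypothetical sketch: regex instead of a DOM, for illustration only.
interface NextData {
  buildId: string;
  page: string;
  props: { pageProps?: Record<string, unknown> };
  [key: string]: unknown;
}

const NEXT_DATA_RE =
  /<script[^>]*\bid=["']__NEXT_DATA__["'][^>]*>([\s\S]*?)<\/script>/;

function extractNextData(html: string): NextData | null {
  const match = NEXT_DATA_RE.exec(html);
  if (!match) return null;
  try {
    return JSON.parse(match[1]) as NextData;
  } catch {
    return null; // malformed JSON is treated as "no data"
  }
}

const sample =
  '<script id="__NEXT_DATA__" type="application/json">' +
  '{"props":{"pageProps":{"title":"Hi"}},"page":"/","buildId":"abc123"}' +
  "</script>";

const data = extractNextData(sample);
console.log(data?.buildId, data?.props.pageProps); // page props live under props.pageProps
```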

Finding Build ID

const parser = NJSParser({ DOMParser });

const html = await fetch('https://example.com').then(r => r.text());
const buildId = parser.findBuildId(html);

console.log('Build ID:', buildId);

Parsing Build Manifests

const parser = NJSParser({ DOMParser });

// Replace BUILD_ID with the site's build ID, e.g. from parser.findBuildId(html)
const manifestScript = await fetch('https://example.com/_next/static/BUILD_ID/_buildManifest.js')
  .then(r => r.text());

const manifest = parser.parser.parseBuildManifest(manifestScript);
console.log('Manifest:', manifest);
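_buildManifest.js is JavaScript, not JSON, so it has to be evaluated. The sketch below shows one way to do that, assuming the script assigns self.__BUILD_MANIFEST as Next.js build manifests usually do. It runs the script with new Function against a plain object standing in for self. evalBuildManifest is a hypothetical name, and new Function stands in for whatever eval call the library makes. It executes the fetched code with full privileges and is not a sandbox.

```typescript
// Hypothetical helper: evaluate _buildManifest.js against a fake "self".
// Assumes the script assigns self.__BUILD_MANIFEST (the usual Next.js shape).
type BuildManifest = Record<string, unknown> & { sortedPages?: string[] };

function evalBuildManifest(script: string): BuildManifest | null {
  const fakeSelf: { __BUILD_MANIFEST?: BuildManifest } = {};
  // new Function("self", body) builds function (self) { body }
  new Function("self", script)(fakeSelf);
  return fakeSelf.__BUILD_MANIFEST ?? null;
}

const script =
  'self.__BUILD_MANIFEST=(function(a){return {"/":[a],' +
  'sortedPages:["/","/_app","/_error"]}}("static/chunks/pages/index.js"));' +
  "self.__BUILD_MANIFEST_CB&&self.__BUILD_MANIFEST_CB()";

console.log(evalBuildManifest(script)?.sortedPages);
```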

API Reference

Factory Function

NJSParser({ DOMParser })

Creates a new parser instance.

Parameters:

  • DOMParser (required): a DOMParser class/constructor to parse HTML

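Returns: Parser instance exposing the low-level parser methods (parser), the high-level tools such as findBuildId, the BeautifulFD class, and the element types (types)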
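The factory pattern can be illustrated with a small, self-contained sketch. The point is that the HTML parser is injected once and captured in a closure, so the same code works with deno-dom, jsdom or the browser's DOMParser. makeParser, DOMParserLike and FakeDOMParser are hypothetical names and are not part of njsparser.

```typescript
// Minimal structural types for what this sketch needs from a DOMParser.
interface DocLike {
  querySelector(selector: string): { textContent: string | null } | null;
}
interface DOMParserLike {
  parseFromString(source: string, mimeType: "text/html"): DocLike;
}
type DOMParserCtor = new () => DOMParserLike;

// Hypothetical factory: the injected constructor is captured in a closure,
// so nothing below depends on a global DOMParser.
function makeParser({ DOMParser }: { DOMParser: DOMParserCtor }) {
  const parse = (html: string): DocLike => new DOMParser().parseFromString(html, "text/html");
  return {
    hasNextData(html: string): boolean {
      return parse(html).querySelector("script#__NEXT_DATA__") !== null;
    },
  };
}

// Toy stand-in for a real DOMParser (demonstration only): it does no real
// HTML parsing and only answers the one selector used above.
class FakeDOMParser implements DOMParserLike {
  parseFromString(source: string): DocLike {
    return {
      querySelector: (selector: string) =>
        selector === "script#__NEXT_DATA__" && source.indexOf('id="__NEXT_DATA__"') !== -1
          ? { textContent: "{}" }
          : null,
    };
  }
}

const demo = makeParser({ DOMParser: FakeDOMParser });
console.log(demo.hasNextData('<script id="__NEXT_DATA__">{}</script>')); // → true
console.log(demo.hasNextData("<p>plain page</p>")); // → false
```

With a real implementation you would pass, for example, window.DOMParser instead of FakeDOMParser.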

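Parser Methods

In the usage examples, these methods are called through the instance's parser namespace, e.g. parser.parser.getNextData(html).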

parser.hasFlightData(html)

Check if HTML contains flight data.

Parameters:

  • html (string): HTML content

Returns: boolean

parser.getFlightData(html)

Extract and parse flight data from HTML.

Parameters:

  • html (string): HTML content

Returns: Object | null - Parsed flight data or null

parser.hasNextData(html)

Check if HTML contains __NEXT_DATA__ script.

Parameters:

  • html (string): HTML content

Returns: boolean

parser.getNextData(html)

Extract and parse __NEXT_DATA__ script.

Parameters:

  • html (string): HTML content

Returns: Object | null - Parsed JSON data or null

parser.parseBuildManifest(script)

Parse build manifest script.

Parameters:

  • script (string): Build manifest script content

Returns: Object - Parsed manifest

parser.getBuildManifestPath(buildId, basePath)

Generate build manifest path.

Parameters:

  • buildId (string): Build ID
  • basePath (string, optional): Base path

Returns: string - Build manifest path
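Build manifests conventionally live at <basePath>/_next/static/<buildId>/_buildManifest.js in Next.js deployments. The sketch below composes that path under this layout. buildManifestPath is a hypothetical helper, and the library's own getBuildManifestPath may handle slashes or domains differently.

```typescript
// Hypothetical sketch assuming <basePath>/_next/static/<buildId>/_buildManifest.js
function buildManifestPath(buildId: string, basePath = ""): string {
  const base = basePath.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/_next/static/${buildId}/_buildManifest.js`;
}

console.log(buildManifestPath("abc123")); // → /_next/static/abc123/_buildManifest.js
console.log(buildManifestPath("abc123", "/docs/")); // → /docs/_next/static/abc123/_buildManifest.js
```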

parser.getNextStaticUrls(html)

Find all NextJS static URLs in HTML.

Parameters:

  • html (string): HTML content

Returns: Array<string> | null - Array of URLs or null
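Here, "static URLs" means asset URLs containing /_next/static/. The sketch below collects them. findNextStaticUrls is hypothetical, and it scans src/href attributes with a regex rather than a DOM. That simplification misses URLs referenced only from inline scripts.

```typescript
// Hypothetical sketch: collect src/href values that contain "/_next/static/".
function findNextStaticUrls(html: string): string[] | null {
  const attrRe = /\b(?:src|href)=["']([^"']*\/_next\/static\/[^"']*)["']/g;
  const urls = new Set<string>(); // de-duplicates while keeping first-seen order
  let m: RegExpExecArray | null;
  while ((m = attrRe.exec(html)) !== null) urls.add(m[1]);
  return urls.size > 0 ? Array.from(urls) : null;
}

const page =
  '<script src="/_next/static/chunks/main-abc.js"></script>' +
  '<link rel="stylesheet" href="https://cdn.example.com/app/_next/static/css/1.css">' +
  '<img src="/logo.png">';

console.log(findNextStaticUrls(page));
```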

parser.getBasePath(htmlOrUrls, removeDomain)

Extract base path from URLs or HTML.

Parameters:

  • htmlOrUrls (string | Array): HTML content or array of URLs
  • removeDomain (boolean, optional): Remove domain from absolute URLs

Returns: string | null - Base path or null
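Next.js serves its assets under <basePath>/_next/, so the base path can be recovered from any static URL by taking the part before /_next/. The hypothetical sketch below works on a single URL. Its removeDomain flag drops the scheme and host of absolute URLs.

```typescript
// Hypothetical sketch: base path = everything before "/_next/" in a static URL.
function basePathFromUrl(url: string, removeDomain: boolean): string | null {
  const idx = url.indexOf("/_next/");
  if (idx === -1) return null; // not a Next.js static URL
  let prefix = url.slice(0, idx);
  if (removeDomain) {
    // strip "scheme://host" from absolute URLs
    prefix = prefix.replace(/^[a-z][a-z0-9+.-]*:\/\/[^/]+/i, "");
  }
  return prefix;
}

console.log(basePathFromUrl("/docs/_next/static/chunks/main.js", true)); // → /docs
console.log(basePathFromUrl("https://cdn.example.com/app/_next/static/x.js", true)); // → /app
```

An empty string means the site has no base path.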

High-Level Tools

hasNextJS(html)

Check if page has any NextJS data.

Parameters:

  • html (string): HTML content

Returns: boolean

findBuildId(html)

Find build ID from page (searches static URLs, next data, and flight data).

Parameters:

  • html (string): HTML content

Returns: string | null - Build ID or null
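One place a build ID shows up is in static URLs. The build manifest, and for statically generated sites _ssgManifest.js, sits under /_next/static/<buildId>/. The sketch below covers that single strategy, using a hypothetical helper name. As described above, the real findBuildId also consults next data and flight data.

```typescript
// Hypothetical sketch: take the path segment before _buildManifest.js / _ssgManifest.js.
function buildIdFromStaticUrls(urls: string[]): string | null {
  const re = /\/_next\/static\/([^/]+)\/_(?:buildManifest|ssgManifest)\.js/;
  for (const url of urls) {
    const m = re.exec(url);
    if (m) return m[1];
  }
  return null;
}

const urls = ["/_next/static/chunks/main.js", "/_next/static/Xy9_abc/_buildManifest.js"];
console.log(buildIdFromStaticUrls(urls)); // → Xy9_abc
```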

findInFlightData(flightData, classFilters, callback, recursive)

Find first matching element in flight data.

Parameters:

  • flightData (Object): Flight data dictionary
  • classFilters (Array, optional): Array of Element classes to filter by
  • callback (Function, optional): Callback function for filtering
  • recursive (boolean, optional): Search recursively (default: true)

Returns: Element | null

findallInFlightData(flightData, classFilters, callback, recursive)

Find all matching elements in flight data.

Parameters:

  • flightData (Object): Flight data dictionary
  • classFilters (Array, optional): Array of Element classes to filter by
  • callback (Function, optional): Callback function for filtering
  • recursive (boolean, optional): Search recursively (default: true)

Returns: Array<Element>

finditerInFlightData(flightData, classFilters, callback, recursive)

Iterator for finding elements in flight data.

Parameters:

  • flightData (Object): Flight data dictionary
  • classFilters (Array, optional): Array of Element classes to filter by
  • callback (Function, optional): Callback function for filtering
  • recursive (boolean, optional): Search recursively (default: true)

Returns: Generator<Element>
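The three search helpers share one idea. They walk the elements, optionally descend into nested ones, and keep those that pass both the class filter and the callback. The self-contained sketch below uses stand-in classes. FlightNode, FlightText, FlightContainer and findIter are hypothetical, and njsparser's Element hierarchy and traversal rules may differ.

```typescript
// Stand-ins for njsparser's Element classes (hypothetical, for illustration).
class FlightNode {
  value: unknown;
  constructor(value: unknown) {
    this.value = value;
  }
}
class FlightText extends FlightNode {}
class FlightContainer extends FlightNode {
  children: FlightNode[];
  constructor(children: FlightNode[]) {
    super(null);
    this.children = children;
  }
}

type NodeClass = new (...args: any[]) => FlightNode;

// Depth-first walk yielding nodes that match the class filter and the callback.
function* findIter(
  items: FlightNode[],
  classFilters: NodeClass[] = [],
  callback?: (node: FlightNode) => boolean,
  recursive = true,
): Generator<FlightNode> {
  for (const item of items) {
    const classOk = classFilters.length === 0 || classFilters.some((cls) => item instanceof cls);
    if (classOk && (callback === undefined || callback(item))) yield item;
    if (recursive && item instanceof FlightContainer) {
      yield* findIter(item.children, classFilters, callback, recursive);
    }
  }
}

const tree: FlightNode[] = [
  new FlightText("hello"),
  new FlightContainer([new FlightText("nested"), new FlightNode(42)]),
];

// find_all-style: collect every text node, including nested ones
const texts = Array.from(findIter(tree, [FlightText])).map((n) => n.value);
// find-style: take only the first node the callback accepts
const first = findIter(tree, [], (n) => n.value === 42).next().value;

console.log(texts, first);
```

In this sketch, find is the first item of the iterator and find_all collects it into an array, so only one traversal needs to exist.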

BeautifulFD Class

Simplified interface for working with flight data.

new BeautifulFD(htmlOrFlightData)

Create BeautifulFD instance.

Parameters:

  • htmlOrFlightData (string | Object): HTML content or parsed flight data

find(classFilters, callback, recursive)

Find first matching element.

Parameters:

  • classFilters (Array, optional): Array of Element classes
  • callback (Function, optional): Callback function for filtering
  • recursive (boolean, optional): Search recursively (default: true)

Returns: Element | null

find_all(classFilters, callback, recursive)

Find all matching elements.

Parameters:

  • classFilters (Array, optional): Array of Element classes
  • callback (Function, optional): Callback function for filtering
  • recursive (boolean, optional): Search recursively (default: true)

Returns: Array<Element>

find_iter(classFilters, callback, recursive)

Iterator for finding elements.

Parameters:

  • classFilters (Array, optional): Array of Element classes
  • callback (Function, optional): Callback function for filtering
  • recursive (boolean, optional): Search recursively (default: true)

Returns: Generator<Element>

as_list()

Get flight data as array.

Returns: Array<Element>

static from_list(list, viaEnumerate)

Create BeautifulFD from array.

Parameters:

  • list (Array): Array of Elements
  • viaEnumerate (boolean, optional): Use array indices as element indices

Returns: BeautifulFD
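Flight data is keyed by element index, so from_list presumably rebuilds that mapping from an array. The sketch below shows the two keying modes under that assumption. The IndexedElement shape and the fromList helper are hypothetical.

```typescript
// Hypothetical element shape: each element carries the index it was parsed with.
interface IndexedElement {
  index: number;
  value: unknown;
}

// viaEnumerate = true ignores el.index and keys by array position instead.
function fromList(list: IndexedElement[], viaEnumerate = false): Map<number, IndexedElement> {
  const byIndex = new Map<number, IndexedElement>();
  list.forEach((el, position) => {
    byIndex.set(viaEnumerate ? position : el.index, el);
  });
  return byIndex;
}

const elements: IndexedElement[] = [
  { index: 5, value: "a" },
  { index: 9, value: "b" },
];
console.log(Array.from(fromList(elements).keys()).join(",")); // → 5,9
console.log(Array.from(fromList(elements, true).keys()).join(",")); // → 0,1
```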

Element Types

All flight data elements extend the base Element class:

  • Element - Base class
  • HintPreload - Preload hints (class "HL")
  • Module - Module imports (class "I")
  • Text - Text content (class "T")
  • Data - Structured data
  • EmptyData - Null/empty data
  • SpecialData - Special markers (strings starting with "$")
  • HTMLElement - HTML elements
  • DataContainer - Container of multiple elements
  • DataParent - Parent element with children
  • URLQuery - URL query parameters
  • RSCPayload - React Server Components payload
  • Error - Error elements (class "E")

You can access types via parser.types:

const parser = NJSParser({ DOMParser });
const fd = new parser.BeautifulFD(html);

const textElements = fd.find_all([parser.types.Text]);
const modules = fd.find_all([parser.types.Module]);
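In the raw flight payload, each element is a row of the form <hex id>:<tag><content>, and the tag (HL, I, T, E, or none) is what the classes above are keyed on. The hypothetical sketch below parses single-line rows and names their type. It treats untagged rows as Data and ignores finer distinctions such as SpecialData, HTMLElement or RSCPayload. Text rows carry a hexadecimal byte length after T, and their content can contain newlines, so a real parser cannot simply split the payload on line breaks.

```typescript
// Hypothetical tag-to-type mapping, following the list above.
const TAG_TYPES: Record<string, string> = {
  HL: "HintPreload",
  I: "Module",
  T: "Text",
  E: "Error",
};

interface FlightRow {
  index: number; // the row id, parsed from hex
  type: string;
  body: string; // everything after the tag
}

function classifyRow(line: string): FlightRow | null {
  const m = /^([0-9a-f]+):([A-Z]*)([\s\S]*)$/.exec(line);
  if (!m) return null;
  const [, hexId, tag, body] = m;
  const type = tag === "" ? "Data" : TAG_TYPES[tag] ?? `Unknown(${tag})`;
  return { index: parseInt(hexId, 16), type, body };
}

const rows = [
  '0:["$","div",null,{}]',
  '1:I[123,[],""]',
  '2:HL["/_next/static/css/a.css","style"]',
  "a:T5,hello",
  'b:E{"digest":"x"}',
];
for (const row of rows) {
  const parsed = classifyRow(row);
  console.log(parsed?.index, parsed?.type);
}
```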

API Utilities

api.getApiPath(buildId, basePath, path)

Generate API path for a page.

Parameters:

  • buildId (string): Build ID
  • basePath (string, optional): Base path
  • path (string, optional): Page path

Returns: string | null - API path or null for excluded paths

api.getIndexApiPath(buildId, basePath)

Generate index API path.

Parameters:

  • buildId (string): Build ID
  • basePath (string, optional): Base path

Returns: string
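Pages Router sites fetch page data as JSON from /_next/data/<buildId>/<page>.json, prefixed by the base path, with the root page served as index.json. The sketch below implements both helpers under that assumption. The function names are hypothetical, i18n locale segments are ignored, and the exclusion list is a guess rather than the one getApiPath uses.

```typescript
// Illustrative exclusion list (a guess, not njsparser's actual list).
const EXCLUDED_PAGES = ["/_app", "/_error"];

function dataApiPath(buildId: string, basePath = "", path = "/"): string | null {
  if (EXCLUDED_PAGES.includes(path)) return null;
  const base = basePath.replace(/\/+$/, "");
  // The root page is served as index.json
  const page = path === "/" ? "/index" : path.replace(/\/+$/, "");
  return `${base}/_next/data/${buildId}${page}.json`;
}

function indexDataApiPath(buildId: string, basePath = ""): string {
  return `${basePath.replace(/\/+$/, "")}/_next/data/${buildId}/index.json`;
}

console.log(dataApiPath("abc123", "/docs", "/blog")); // → /docs/_next/data/abc123/blog.json
console.log(dataApiPath("abc123", "", "/_app")); // → null
console.log(indexDataApiPath("abc123")); // → /_next/data/abc123/index.json
```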

api.isApiExposedFromResponse(statusCode, contentType, text)

Check if API is exposed from response.

Parameters:

  • statusCode (number): HTTP status code
  • contentType (string): Content-Type header
  • text (string): Response text

Returns: boolean
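One plausible heuristic for deciding whether the data API answered is written below as a hypothetical standalone function. It requires a 2xx status, a JSON content type, and a body that parses to a JSON object. The actual rules in isApiExposedFromResponse may differ.

```typescript
// Hypothetical heuristic, not njsparser's exact rule.
function looksLikeExposedApi(statusCode: number, contentType: string, text: string): boolean {
  if (statusCode < 200 || statusCode >= 300) return false;
  if (!contentType.toLowerCase().includes("application/json")) return false;
  try {
    const body: unknown = JSON.parse(text);
    // Data routes answer with an object such as {"pageProps": {...}}
    return typeof body === "object" && body !== null && !Array.isArray(body);
  } catch {
    return false;
  }
}

console.log(looksLikeExposedApi(200, "application/json; charset=utf-8", '{"pageProps":{}}')); // → true
console.log(looksLikeExposedApi(404, "text/html", "<html></html>")); // → false
```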

api.listApiPaths(sortedPages, buildId, basePath, isApiExposed)

List API paths from build manifest.

Parameters:

  • sortedPages (Array): Sorted pages from build manifest
  • buildId (string): Build ID
  • basePath (string): Base path
  • isApiExposed (boolean, optional): Is API exposed

Returns: Array<string>
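Combining the build manifest's sortedPages with the data route layout gives a list of candidate API paths. The hypothetical sketch below assumes internal pages and /api routes have no data route. It also skips dynamic pages such as /blog/[slug], whose parameters are unknown without further crawling.

```typescript
// Hypothetical sketch: sortedPages → /_next/data/<buildId>/<page>.json candidates.
function listDataPaths(sortedPages: string[], buildId: string, basePath = ""): string[] {
  const base = basePath.replace(/\/+$/, "");
  const paths: string[] = [];
  for (const page of sortedPages) {
    const isInternal = page === "/_app" || page === "/_error";
    const isApiRoute = page === "/api" || page.startsWith("/api/");
    const isDynamic = page.includes("["); // e.g. /blog/[slug]
    if (isInternal || isApiRoute || isDynamic) continue;
    paths.push(`${base}/_next/data/${buildId}${page === "/" ? "/index" : page}.json`);
  }
  return paths;
}

const pages = ["/", "/_app", "/_error", "/about", "/blog/[slug]", "/api/hello"];
console.log(listDataPaths(pages, "abc123"));
```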

Differences from Python Version

  1. DOMParser Parameter: The JavaScript version requires a DOMParser implementation (a constructor) to be passed in, which keeps it usable in different environments (Bun, the browser, Node.js with jsdom, etc.)

  2. No CLI: This port focuses on the library functionality only. No CLI is included.

  3. Factory Pattern: Uses a factory function to inject dependencies rather than global imports.

  4. Async/Await: The parsing functions work on HTML strings you supply. Fetching is left to you, typically with fetch and async/await as in the examples above.

  5. Native JSON: Uses native JSON.parse and JSON.stringify instead of orjson.

  6. Native eval: Uses JavaScript eval for build manifest parsing instead of pythonmonkey.

Testing

Run tests with Bun:

bun test

License

MIT

Credits

This is a JavaScript port of the Python njsparser library by novitae.