NJSParser (JS)
A 99.9% AI-generated (including unit tests & README) JS port of novitae/njsparser for extracting and parsing Next.js hydration data from HTML content.
- Parses flight data (from the self.__next_f.push scripts)
- Parses next data from the __NEXT_DATA__ script
- Parses build manifests
- Searches for the build ID
- Many other things...
This is the JavaScript/TypeScript port of the Python njsparser library, designed to run on Bun.
Installation
bun add njsparser
Or install from source:
git clone <repository-url>
cd njsparser/js
bun install
DOMParser dependency
This library doesn’t bundle an HTML parser. You must pass a DOMParser implementation when creating the parser:
- In Bun tests, we use deno-dom’s WASM build.
- In a browser, you can pass the native DOMParser.
Usage
Basic Setup (Bun)
import { DOMParser } from 'deno-dom/deno-dom-wasm.ts';
import { NJSParser } from 'njsparser';
const parser = NJSParser({ DOMParser });
Basic Setup (Browser)
import { NJSParser } from 'njsparser';
const parser = NJSParser({ DOMParser: window.DOMParser });
Parsing Flight Data
Flight data is embedded in self.__next_f.push() calls inside the script tags of Next.js pages rendered with the App Router.
import { DOMParser } from 'deno-dom/deno-dom-wasm.ts';
import { NJSParser } from 'njsparser';
const parser = NJSParser({ DOMParser });
const response = await fetch('https://nextjs.org');
const html = await response.text();
const flightData = parser.parser.getFlightData(html);
// Use BeautifulFD for easier searching
const fd = new parser.BeautifulFD(html);
// Find specific element types
for (const data of fd.find_iter([parser.types.Data])) {
  if (data.content && typeof data.content === 'object' && 'user' in data.content) {
    console.log(data.content.user);
    break;
  }
}
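To illustrate what the flight data parser works from, here is a minimal standalone sketch (not the library's implementation) that collects the string chunks from self.__next_f.push([1, "..."]) calls and splits the concatenated payload into id:value rows. It assumes one push call per script tag and ignores length-prefixed text rows that can span several lines; extractFlightRows is a hypothetical helper name.

```javascript
// Hypothetical helper: a naive sketch of reading Next.js flight data.
// Assumes one self.__next_f.push(...) call per <script> tag and does not
// handle length-prefixed "T" rows that may contain newlines.
function extractFlightRows(html) {
  const marker = 'self.__next_f.push(';
  const scriptRe = /<script[^>]*>([\s\S]*?)<\/script>/g;
  let payload = '';
  for (const match of html.matchAll(scriptRe)) {
    const body = match[1];
    const start = body.indexOf(marker);
    const end = body.lastIndexOf(')');
    if (start === -1 || end <= start) continue;
    let args;
    try {
      // The pushed argument is a JSON-compatible array such as [1, "0:..."]
      args = JSON.parse(body.slice(start + marker.length, end));
    } catch {
      continue;
    }
    // Entries tagged 1 carry string chunks of the payload
    if (Array.isArray(args) && args[0] === 1 && typeof args[1] === 'string') {
      payload += args[1];
    }
  }
  // Each row is "<id>:<value>"; the id is everything before the first colon
  return payload
    .split('\n')
    .filter((line) => line.indexOf(':') > 0)
    .map((line) => {
      const colon = line.indexOf(':');
      return { id: line.slice(0, colon), value: line.slice(colon + 1) };
    });
}
```

The bootstrap call (self.__next_f=self.__next_f||[]).push([0]) does not contain the marker string, so it is skipped naturally.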
Parsing __NEXT_DATA__
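import { DOMParser } from 'deno-dom/deno-dom-wasm.ts';
import { NJSParser } from 'njsparser';
const parser = NJSParser({ DOMParser });
const html = await fetch('https://example.com').then(r => r.text());
const nextData = parser.parser.getNextData(html);
if (nextData) {
  console.log('Build ID:', nextData.buildId);
  // Next.js nests page props under props.pageProps
  console.log('Page Props:', nextData.props?.pageProps);
}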
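The __NEXT_DATA__ script is plain JSON inside a script tag with id="__NEXT_DATA__". As a rough illustration only, a regex-based sketch (the library itself uses the injected DOMParser); readNextData is a hypothetical name, and the regex assumes the id attribute is written in double quotes.

```javascript
// Hypothetical helper: regex-based sketch of extracting __NEXT_DATA__.
// Assumes <script id="__NEXT_DATA__" ...> with a double-quoted id;
// a real DOM parser is more robust than this pattern.
function readNextData(html) {
  const re = /<script[^>]*\bid="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/;
  const match = html.match(re);
  if (!match) return null;
  try {
    return JSON.parse(match[1]);
  } catch {
    // Malformed JSON is treated the same as a missing script
    return null;
  }
}
```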
Finding Build ID
import { DOMParser } from 'deno-dom/deno-dom-wasm.ts';
import { NJSParser } from 'njsparser';
const parser = NJSParser({ DOMParser });
const html = await fetch('https://example.com').then(r => r.text());
const buildId = parser.findBuildId(html);
console.log('Build ID:', buildId);
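findBuildId searches several sources. One of them, static URLs, relies on Next.js serving manifests under /_next/static/<buildId>/. A sketch of just that step, with buildIdFromStaticUrls as a hypothetical helper that looks for _buildManifest.js or _ssgManifest.js URLs:

```javascript
// Hypothetical helper: read the build ID from /_next/static/<buildId>/_buildManifest.js
// or _ssgManifest.js URLs. Covers only one of the sources findBuildId checks.
function buildIdFromStaticUrls(html) {
  const re = /\/_next\/static\/([^/"'\s]+)\/_(?:buildManifest|ssgManifest)\.js/;
  const match = html.match(re);
  return match ? match[1] : null;
}
```

Chunk URLs such as /_next/static/chunks/main.js do not match, because the segment after the build ID must be a manifest file.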
Parsing Build Manifests
import { DOMParser } from 'deno-dom/deno-dom-wasm.ts';
import { NJSParser } from 'njsparser';
const parser = NJSParser({ DOMParser });
const manifestScript = await fetch('https://example.com/_next/static/BUILD_ID/_buildManifest.js')
  .then(r => r.text());
const manifest = parser.parser.parseBuildManifest(manifestScript);
console.log('Manifest:', manifest);
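A build manifest is a small script that assigns self.__BUILD_MANIFEST. Since this port parses it with eval, here is a comparable sketch using new Function and a stub self object, so the assignment lands on the stub instead of a global. evaluateBuildManifest is a hypothetical name, and like any eval-based approach it executes the fetched script's code.

```javascript
// Hypothetical helper: run a _buildManifest.js script against a stub `self`
// and return whatever it assigned to self.__BUILD_MANIFEST.
// This executes the script, so it should only see content you accept running.
function evaluateBuildManifest(script) {
  const self = {};
  // The parameter named `self` shadows any global `self` inside the script
  new Function('self', script)(self);
  return self.__BUILD_MANIFEST ?? null;
}
```

The trailing self.__BUILD_MANIFEST_CB&&self.__BUILD_MANIFEST_CB() that manifests usually end with short-circuits harmlessly, because the stub has no callback.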
API Reference
Factory Function
NJSParser({ DOMParser })
Creates a new parser instance.
Parameters:
DOMParser (required): a DOMParser class/constructor used to parse HTML
Returns: Parser instance with all methods and classes
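Parser Methods
In the usage examples above, these methods are called through the instance's parser namespace (for example, parser.parser.getNextData(html)).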
parser.hasFlightData(html)
Check if HTML contains flight data.
Parameters:
html (string): HTML content
Returns: boolean
parser.getFlightData(html)
Extract and parse flight data from HTML.
Parameters:
html (string): HTML content
Returns: Object | null - Parsed flight data or null
parser.hasNextData(html)
Check if HTML contains __NEXT_DATA__ script.
Parameters:
html (string): HTML content
Returns: boolean
parser.getNextData(html)
Extract and parse __NEXT_DATA__ script.
Parameters:
html (string): HTML content
Returns: Object | null - Parsed JSON data or null
parser.parseBuildManifest(script)
Parse build manifest script.
Parameters:
script (string): Build manifest script content
Returns: Object - Parsed manifest
parser.getBuildManifestPath(buildId, basePath)
Generate build manifest path.
Parameters:
buildId (string): Build ID
basePath (string, optional): Base path
Returns: string - Build manifest path
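Next.js serves the build manifest at <basePath>/_next/static/<buildId>/_buildManifest.js. Assuming getBuildManifestPath follows that convention (its exact handling of slashes is not documented here), a sketch with buildManifestPath as a hypothetical name:

```javascript
// Hypothetical helper: build the conventional build manifest path.
// Trailing slashes on basePath are trimmed so the segments join cleanly.
function buildManifestPath(buildId, basePath = '') {
  const base = basePath.replace(/\/+$/, '');
  return `${base}/_next/static/${buildId}/_buildManifest.js`;
}
```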
parser.getNextStaticUrls(html)
Find all NextJS static URLs in HTML.
Parameters:
html (string): HTML content
Returns: Array<string> | null - Array of URLs or null
parser.getBasePath(htmlOrUrls, removeDomain)
Extract base path from URLs or HTML.
Parameters:
htmlOrUrls (string | Array): HTML content or array of URLs
removeDomain (boolean, optional): Remove domain from absolute URLs
Returns: string | null - Base path or null
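A base path is whatever precedes /_next/ in static asset URLs, for example /docs for an app served under https://example.com/docs. A sketch under that assumption, using basePathFromUrls as a hypothetical helper that works on a URL array only and treats removeDomain as stripping the scheme and host:

```javascript
// Hypothetical helper: derive a base path from Next.js static asset URLs.
// Returns the text before "/_next/" in the first URL that contains it.
function basePathFromUrls(urls, removeDomain = false) {
  for (const url of urls) {
    const idx = url.indexOf('/_next/');
    if (idx === -1) continue;
    let prefix = url.slice(0, idx);
    if (removeDomain) {
      // Drop "scheme://host" from absolute URLs, keeping only the path part
      prefix = prefix.replace(/^[a-z][a-z0-9+.-]*:\/\/[^/]*/i, '');
    }
    return prefix;
  }
  return null;
}
```

An empty string means the app is served from the root; null means no Next.js static URL was found.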
High-Level Tools
hasNextJS(html)
Check if the page has any Next.js data.
Parameters:
html (string): HTML content
Returns: boolean
findBuildId(html)
Find the build ID in a page (searches static URLs, __NEXT_DATA__, and flight data).
Parameters:
html (string): HTML content
Returns: string | null - Build ID or null
findInFlightData(flightData, classFilters, callback, recursive)
Find first matching element in flight data.
Parameters:
flightData (Object): Flight data dictionary
classFilters (Array, optional): Array of Element classes to filter by
callback (Function, optional): Callback function for filtering
recursive (boolean, optional): Search recursively (default: true)
Returns: Element | null
findallInFlightData(flightData, classFilters, callback, recursive)
Find all matching elements in flight data.
Parameters:
flightData (Object): Flight data dictionary
classFilters (Array, optional): Array of Element classes to filter by
callback (Function, optional): Callback function for filtering
recursive (boolean, optional): Search recursively (default: true)
Returns: Array<Element>
finditerInFlightData(flightData, classFilters, callback, recursive)
Iterator for finding elements in flight data.
Parameters:
flightData (Object): Flight data dictionary
classFilters (Array, optional): Array of Element classes to filter by
callback (Function, optional): Callback function for filtering
recursive (boolean, optional): Search recursively (default: true)
Returns: Generator<Element>
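The three search functions share one shape: a generator yields every element that passes the class filters and callback, findall collects it, and find takes the first item. A standalone sketch of that relationship over plain nested objects and arrays, not the library's code; finditer, findall, and find here are hypothetical stand-ins, and descending into every object value is an assumption.

```javascript
// Hypothetical sketch: one generator drives find, findall, and finditer.
// Walks arrays and object values; descending into every object is an assumption.
function* finditer(node, classFilters = null, callback = null, recursive = true) {
  let children = [];
  if (Array.isArray(node)) children = node;
  else if (node !== null && typeof node === 'object') children = Object.values(node);

  for (const child of children) {
    // Only objects are candidates; the callback runs after the class check passes
    if (
      child !== null && typeof child === 'object' &&
      (classFilters === null || classFilters.some((cls) => child instanceof cls)) &&
      (callback === null || callback(child))
    ) {
      yield child;
    }
    if (recursive) yield* finditer(child, classFilters, callback, recursive);
  }
}

function findall(node, classFilters, callback, recursive) {
  return [...finditer(node, classFilters, callback, recursive)];
}

function find(node, classFilters, callback, recursive) {
  const { value, done } = finditer(node, classFilters, callback, recursive).next();
  return done ? null : value;
}
```

Because find pulls only the first value from the generator, it stops walking as soon as a match is found.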
BeautifulFD Class
Simplified interface for working with flight data.
new BeautifulFD(htmlOrFlightData)
Create BeautifulFD instance.
Parameters:
htmlOrFlightData (string | Object): HTML content or parsed flight data
find(classFilters, callback, recursive)
Find first matching element.
Parameters:
classFilters (Array, optional): Array of Element classes
callback (Function, optional): Callback function for filtering
recursive (boolean, optional): Search recursively (default: true)
Returns: Element | null
find_all(classFilters, callback, recursive)
Find all matching elements.
Parameters:
classFilters (Array, optional): Array of Element classes
callback (Function, optional): Callback function for filtering
recursive (boolean, optional): Search recursively (default: true)
Returns: Array<Element>
find_iter(classFilters, callback, recursive)
Iterator for finding elements.
Parameters:
classFilters (Array, optional): Array of Element classes
callback (Function, optional): Callback function for filtering
recursive (boolean, optional): Search recursively (default: true)
Returns: Generator<Element>
as_list()
Get flight data as array.
Returns: Array<Element>
static from_list(list, viaEnumerate)
Create BeautifulFD from array.
Parameters:
list (Array): Array of Elements
viaEnumerate (boolean, optional): Use array indices as element indices
Returns: BeautifulFD
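The viaEnumerate flag chooses how elements are keyed. A sketch of the two strategies, assuming (hypothetically) that each element otherwise carries its own index property; keyElements is an illustrative name, not part of the library:

```javascript
// Hypothetical sketch of the two keying strategies described for from_list.
// Without viaEnumerate, each element's own `index` property (an assumption) is the key.
function keyElements(list, viaEnumerate = false) {
  const byIndex = new Map();
  list.forEach((element, position) => {
    byIndex.set(viaEnumerate ? position : element.index, element);
  });
  return byIndex;
}
```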
Element Types
All flight data elements extend the base Element class:
- Element - Base class
- HintPreload - Preload hints (class "HL")
- Module - Module imports (class "I")
- Text - Text content (class "T")
- Data - Structured data
- EmptyData - Null/empty data
- SpecialData - Special markers (strings starting with "$")
- HTMLElement - HTML elements
- DataContainer - Container of multiple elements
- DataParent - Parent element with children
- URLQuery - URL query parameters
- RSCPayload - React Server Components payload
- Error - Error elements (class "E")
You can access types via parser.types:
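import { DOMParser } from 'deno-dom/deno-dom-wasm.ts';
import { NJSParser } from 'njsparser';
const parser = NJSParser({ DOMParser });
const html = await fetch('https://nextjs.org').then(r => r.text());
const fd = new parser.BeautifulFD(html);
const textElements = fd.find_all([parser.types.Text]);
const modules = fd.find_all([parser.types.Module]);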
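The class markers in the list above ("HL", "I", "T", "E") appear at the start of a flight row's value, right after the id and colon; other rows hold JSON data. A rough classifier sketch that maps a row value to one of those type names. classifyRowValue is hypothetical, it ignores finer types such as SpecialData or RSCPayload, and mapping a literal null to EmptyData is an assumption.

```javascript
// Hypothetical sketch: guess an element type name from a flight row value.
// Only the class markers named in the element type list are handled.
const CLASS_MARKERS = [
  ['HL', 'HintPreload'],
  ['I', 'Module'],
  ['T', 'Text'],
  ['E', 'Error'],
];

function classifyRowValue(value) {
  for (const [marker, typeName] of CLASS_MARKERS) {
    if (value.startsWith(marker)) return typeName;
  }
  // Assumption: a JSON null row corresponds to EmptyData
  return value === 'null' ? 'EmptyData' : 'Data';
}
```

Marker checks are safe against plain JSON values, since JSON never starts with an uppercase letter.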
API Utilities
api.getApiPath(buildId, basePath, path)
Generate API path for a page.
Parameters:
buildId (string): Build ID
basePath (string, optional): Base path
path (string, optional): Page path
Returns: string | null - API path or null for excluded paths
api.getIndexApiPath(buildId, basePath)
Generate index API path.
Parameters:
buildId (string): Build ID
basePath (string, optional): Base path
Returns: string
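For Pages Router sites, Next.js serves page data at /_next/data/<buildId>/<page>.json, with the root page at index.json. A sketch of that convention; apiPath and indexApiPath are hypothetical names, and treating internal pages (/_app, /_error, ...) and /api routes as the excluded paths is an assumption.

```javascript
// Hypothetical helpers following the /_next/data/<buildId>/<page>.json convention.
// Which paths the library actually excludes is an assumption here.
function apiPath(buildId, basePath = '', path = '/') {
  if (path.startsWith('/_') || path === '/api' || path.startsWith('/api/')) return null;
  const base = basePath.replace(/\/+$/, '');
  // The root page maps to index.json; trailing slashes are dropped elsewhere
  const page = path === '/' ? '/index' : path.replace(/\/+$/, '');
  return `${base}/_next/data/${buildId}${page}.json`;
}

function indexApiPath(buildId, basePath = '') {
  return apiPath(buildId, basePath, '/');
}
```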
api.isApiExposedFromResponse(statusCode, contentType, text)
Check if API is exposed from response.
Parameters:
statusCode (number): HTTP status code
contentType (string): Content-Type header
text (string): Response text
Returns: boolean
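The exact rules are not described here, so the following is only a plausible heuristic sketch, not the library's logic: treat the data endpoint as exposed when it answers 200 with a JSON content type and a body that parses as JSON. looksLikeExposedApi is a hypothetical name.

```javascript
// Hypothetical heuristic; not necessarily the library's rule set.
function looksLikeExposedApi(statusCode, contentType, text) {
  if (statusCode !== 200) return false;
  if (!/\bapplication\/json\b/i.test(contentType ?? '')) return false;
  try {
    JSON.parse(text);
    return true;
  } catch {
    // A JSON content type with an unparsable body is treated as not exposed
    return false;
  }
}
```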
api.listApiPaths(sortedPages, buildId, basePath, isApiExposed)
List API paths from build manifest.
Parameters:
sortedPages (Array): Sorted pages from build manifest
buildId (string): Build ID
basePath (string): Base path
isApiExposed (boolean, optional): Whether the API is exposed
Returns: Array<string>
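To show how a manifest's sortedPages could map to data URLs, here is a sketch that reuses the /_next/data/<buildId>/<page>.json convention. listDataPaths is hypothetical; skipping internal pages, API routes, and dynamic routes such as /blog/[slug] (which need concrete values to request) is an assumption, and the isApiExposed parameter is left out because its effect is not described here.

```javascript
// Hypothetical sketch: map build-manifest pages to /_next/data URLs.
// Skips internal pages (/_app, /_error, ...), API routes, and dynamic routes.
function listDataPaths(sortedPages, buildId, basePath = '') {
  const base = basePath.replace(/\/+$/, '');
  const paths = [];
  for (const page of sortedPages) {
    const isInternal = page.startsWith('/_');
    const isApiRoute = page === '/api' || page.startsWith('/api/');
    const isDynamic = page.includes('[');
    if (isInternal || isApiRoute || isDynamic) continue;
    const name = page === '/' ? '/index' : page;
    paths.push(`${base}/_next/data/${buildId}${name}.json`);
  }
  return paths;
}
```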
Differences from Python Version
- DOMParser Parameter: The JavaScript version requires a DOMParser implementation (a class/constructor) to be passed in, which makes it usable in different environments (Bun, browser, Node.js with jsdom, etc.).
- No CLI: This port focuses on the library functionality only; no CLI is included.
- Factory Pattern: Uses a factory function to inject dependencies rather than global imports.
- Async/Await: The JavaScript version is designed to work with async/await for fetching HTML.
- Native JSON: Uses native JSON.parse and JSON.stringify instead of orjson.
- Native eval: Uses JavaScript eval for build manifest parsing instead of pythonmonkey.
Testing
Run tests with Bun:
bun test
License
MIT
Credits
This is a JavaScript port of the Python njsparser library by novitae.