KaKi87 75399f9340 ✏️ Typo 1 week ago

deepl-scraper

Scrape translations from the DeepL translator without subscribing to the paid authenticated API, using Puppeteer, a Node.js library for controlling headless Chrome.

Getting started

Prerequisites

  • Node.js
  • npm
  • Yarn

Install

From npm

yarn add deepl-scraper

or

npm i deepl-scraper --save

Additional steps for Puppeteer on Linux

sudo apt install libnss3-dev libxss1
sudo sysctl -w kernel.unprivileged_userns_clone=1

Usage

const { translate, getSupportedLanguages, quit } = require('deepl-scraper');

translate(sentence, source, target).then(console.log).catch(console.error);

Parameters

  • sentence string - Word or sentence to translate
  • source string (optional) - Language of the original word or sentence
    Default: auto
  • target string - Language to translate the word or sentence into
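
As a minimal sketch of how the three parameters fit together, the wrapper below takes them as named options and applies the `auto` default for `source`. The name `translateText` and the injected `translate` argument are illustrative assumptions, not part of the module; in real use you would pass the `translate` function exported by `deepl-scraper`.

```javascript
// Hypothetical wrapper (not part of deepl-scraper): named options instead of
// positional arguments, with `source` defaulting to 'auto' as documented above.
async function translateText(translate, { sentence, target, source = 'auto' }) {
    if (typeof sentence !== 'string' || sentence.length === 0)
        throw new TypeError('sentence must be a non-empty string');
    if (typeof target !== 'string' || target.length === 0)
        throw new TypeError('target must be a non-empty string');
    // Same positional order as translate(sentence, source, target).
    return translate(sentence, source, target);
}

// Usage, assuming deepl-scraper is installed:
// const { translate, quit } = require('deepl-scraper');
// translateText(translate, { sentence: 'hello', target: 'fr-FR' })
//     .then(console.log)
//     .catch(console.error)
//     .finally(quit);
```

Injecting `translate` keeps the sketch independent of the browser, so it can be exercised with a stub.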

Supported languages

The module doesn’t hard-code a language list, so it always supports DeepL’s current languages without requiring a module update.

Languages are identified by IETF language tags.

Get arrays of supported source and target languages using the following code:

getSupportedLanguages().then(console.log);
{
    "sourceLanguages": [
        "en",
        "de",
        "fr",
        "es",
        "pt",
        "pt",
        "it",
        "nl",
        "pl",
        "ru",
        "ja",
        "zh"
    ],
    "targetLanguages": [
        "en-US",
        "en-GB",
        "de-DE",
        "fr-FR",
        "es-ES",
        "pt-BR",
        "it-IT",
        "nl-NL",
        "pl-PL",
        "ru-RU",
        "ja-JA",
        "zh-ZH"
    ]
}
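
One possible use of this list is to check a language pair before calling `translate`. The sketch below relies only on the object shape shown above; the helper name `checkLanguages` is hypothetical, and treating `auto` as always acceptable for the source is an assumption based on the documented default.

```javascript
// Hypothetical helper (not part of deepl-scraper): checks a source/target pair
// against the object returned by getSupportedLanguages().
function checkLanguages(languages, source, target) {
    // Assumption: 'auto' is always acceptable as a source, being the documented default.
    const sourceOk = source === 'auto' || languages.sourceLanguages.includes(source);
    const targetOk = languages.targetLanguages.includes(target);
    return { sourceOk, targetOk };
}

// Usage, assuming deepl-scraper is installed:
// const { getSupportedLanguages } = require('deepl-scraper');
// getSupportedLanguages().then(languages =>
//     console.log(checkLanguages(languages, 'en', 'fr-FR')));
```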

Quitting

Since this module uses a headless browser, the browser keeps running for as long as your main script is running, or until you close it using the following code:

quit()
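
A common way to make sure the browser is always closed is to call `quit` in a `finally` block. The sketch below translates several sentences and then quits, whether or not a translation failed. The helper name `translateAll` and the `api` argument (an object exposing `translate` and `quit`, such as the module itself) are illustrative assumptions.

```javascript
// Hypothetical helper (not part of deepl-scraper): translate several sentences
// one after another, then close the headless browser in every case.
async function translateAll(api, sentences, source, target) {
    const results = [];
    try {
        for (const sentence of sentences) {
            results.push(await api.translate(sentence, source, target));
        }
        return results;
    } finally {
        // Assumption: quit() may or may not return a Promise; await handles both.
        await api.quit();
    }
}

// Usage, assuming deepl-scraper is installed:
// translateAll(require('deepl-scraper'), ['hello', 'goodbye'], 'en', 'fr-FR')
//     .then(console.log)
//     .catch(console.error);
```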

Error handling

  • INVALID_SOURCE_LANGUAGE
  • INVALID_TARGET_LANGUAGE
  • UNSUPPORTED_SOURCE_LANGUAGE
  • UNSUPPORTED_TARGET_LANGUAGE
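
The sketch below shows one way to recognise these codes in a `catch` handler. This README does not specify how the codes are delivered, so the helper assumes the rejection value is either the code itself (a string) or an object carrying it in `code` or `message`. The name `parseTranslateError` is hypothetical.

```javascript
const KNOWN_CODES = [
    'INVALID_SOURCE_LANGUAGE',
    'INVALID_TARGET_LANGUAGE',
    'UNSUPPORTED_SOURCE_LANGUAGE',
    'UNSUPPORTED_TARGET_LANGUAGE'
];

// Hypothetical helper (not part of deepl-scraper).
// Assumption: `err` is the code string, or an object with the code in
// `code` or `message`. Returns null for anything else.
function parseTranslateError(err) {
    const candidates = typeof err === 'string'
        ? [err]
        : [err && err.code, err && err.message];
    const code = candidates.find(value => KNOWN_CODES.includes(value));
    if (!code) return null;
    // e.g. 'UNSUPPORTED_TARGET_LANGUAGE' -> ['UNSUPPORTED', 'TARGET', 'LANGUAGE']
    const [kind, parameter] = code.split('_');
    return { code, kind: kind.toLowerCase(), parameter: parameter.toLowerCase() };
}

// Usage, assuming deepl-scraper is installed:
// translate('hello', 'en', 'xx').catch(err => {
//     const info = parseTranslateError(err);
//     if (info) console.error(`Check the ${info.parameter} language (${info.code})`);
//     else console.error(err);
// });
```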

Examples

Translate from defined language

translate('hello', 'en', 'fr-FR').then(console.log);
{
    "source": {
        "lang": "en",
        "sentence": "hello"
    },
    "target": {
        "lang": "fr",
        "sentences": [
            "Bonjour,",
            "Bonjour",
            "bonjour",
            "hello",
            "h"
        ],
        "translation": "hello"
    }
}

Translate from auto-detected language

(a single word’s language is usually more difficult to detect)

// Either
translate('hello', 'auto', 'fr-FR').then(console.log);
// Or
translate('hello', undefined, 'fr-FR').then(console.log);
{
    "source": {
        "lang": "en",
        "confident": false,
        "sentence": "hello"
    },
    "target": {
        "lang": "fr",
        "sentences": [
            "Bonjour,",
            "Bonjour",
            "bonjour",
            "hello",
            "h"
        ],
        "translation": "hello"
    }
}

(a full sentence’s language is usually detected more reliably)

translate('hey, what\'s up ?', 'auto', 'fr-FR').then(console.log);
{
    "source": {
        "lang": "en",
        "confident": false,
        "sentence": "hey, what's up ?"
    },
    "target": {
        "lang": "fr",
        "sentences": [
            "hey, quoi de neuf ?",
            "Hé, qu'est-ce qu'il y a ?",
            "h",
            "Hé, quoi de neuf ?",
            "Hé, qu'est-ce qu'il y a ?"
        ],
        "translation": "Hé, qu'est-ce qu'il y a ?"
    }
}
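
To illustrate how a result object can be consumed, the sketch below condenses it into the main translation, the distinct alternatives and the detected source language. It uses only the fields shown in the example outputs above. The helper name `summarize` and the output field names are hypothetical.

```javascript
// Hypothetical helper (not part of deepl-scraper): condenses a translate() result.
function summarize(result) {
    const { source, target } = result;
    // Distinct alternatives in their original order, without the main translation.
    const alternatives = [...new Set(target.sentences)]
        .filter(sentence => sentence !== target.translation);
    return {
        translation: target.translation,
        alternatives,
        sourceLanguage: source.lang,
        // `confident` only appears in the auto-detection examples; a missing
        // flag is treated here as "not flagged as uncertain".
        lowConfidence: source.confident === false
    };
}

// Usage, assuming deepl-scraper is installed:
// translate('hey, what\'s up ?', 'auto', 'fr-FR').then(result =>
//     console.log(summarize(result)));
```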

Planned features

  • Get word-by-word definitions, quotes and synonyms
  • Bypass maximum text length

FAQ

Why use Puppeteer instead of HTTP requests?
With plain HTTP requests, a rate limit error always shows up, whatever you do.

Changelog

  • 1.0.0 (2019-05-06) • Initial release
  • 1.0.x (2019-05-11)
    • 1.0.1
      • Added examples title in README.md
      • Fixed Promise never resolving when alternative sentences are not available
    • 1.0.2
      • Retry on browser error
    • 1.0.3
      • Fixed retry
      • Fixed changelog version typo
    • 1.0.4
      • Close page after translation
  • 1.0.5 (2019-06-27)
    • Added getSupportedLanguages method
    • Improved translation complete detection
    • Fixed being unable to launch a new browser after a previous one was closed
    • Fixed incompatibility with multi-requests translation (#1)
  • 1.0.6 (2019-07-05)
    • Fixed auto/non-auto parameter
    • Fixed translation complete detection (again)
  • 1.0.7 (2019-07-05)
    • Fixed translation complete detection (again !)
  • 1.0.9 (2020-07-10)
    • Fixed supported languages detection & selection
    • Improved translation completion detection & result parsing
    • Replaced the deprecated request dependency with axios
    • Replaced the JSDOM dependency with RegExp processing
    • Replaced the quit parameter of the translate method with a new quit method
    • Refactored using async/await
  • 1.0.10 (2020-08-21) • Actually removed the jsdom NPM dependency
  • 1.0.11 (2020-08-24)
    • Fixed & improved supported languages parsing
    • Implemented UNSUPPORTED_SOURCE_LANGUAGE & UNSUPPORTED_TARGET_LANGUAGE errors