deepl-scraper

Scrape translations from the DeepL translator without applying for the paid, authenticated API, using Puppeteer, a Node.js library that drives a headless Chrome browser.

Getting started

Prerequisites

  • Node.js
  • npm or Yarn (either one can install the package)

Install

From npm

yarn add deepl-scraper

or

npm i deepl-scraper --save

Usage

const { translate, getSupportedLanguages, quit } = require('deepl-scraper');

translate(sentence, source, target).then(console.log).catch(console.error);

Parameters

  • sentence string - Word/sentence to translate
  • source string (optional) - Language of the original word/sentence
    Default: auto
  • target string - Language to translate the word/sentence into
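
The same call can also be written with async/await. The sketch below is only an illustration: translateOnce is a hypothetical helper, not part of the module. It takes the module's exports as its first argument so that it can be tried with a stub, and it passes null as the source when none is given, as the auto-detection examples in this README do.

```javascript
// Hypothetical helper (not part of deepl-scraper): checks the arguments
// before handing them to translate(), so obvious mistakes fail early.
// `api` is expected to be the object returned by require('deepl-scraper').
async function translateOnce(api, sentence, target, source = null) {
    if (typeof sentence !== 'string' || sentence.length === 0) {
        throw new TypeError('sentence must be a non-empty string');
    }
    if (typeof target !== 'string' || target.length === 0) {
        throw new TypeError('target must be a non-empty string');
    }
    // Argument order follows this README: sentence, source, target.
    return api.translate(sentence, source, target);
}

// Usage with the real module (requires deepl-scraper to be installed):
// translateOnce(require('deepl-scraper'), 'hello', 'fr')
//     .then(console.log)
//     .catch(console.error);
```

Putting target before source in the helper keeps the optional argument last, which is a design choice of this sketch and not how translate itself is called.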

Supported languages

The module doesn’t store a language list of its own: it reads the list from DeepL, so it keeps supporting DeepL’s languages without needing an update.

Language codes are 2 letters long (plus the special value auto).

They can be found in the dl-lang attributes of DeepL’s HTML:

<div class="lmt__language_select__menu" dl-test="translator-source-lang-list" style="left: 119px;">
    <button dl-lang="auto" tabindex="100" dl-test="translator-lang-option-auto">Any language (detect)</button>
    <button dl-lang="EN" tabindex="100" dl-test="translator-lang-option-en-EN">English</button>
    <button dl-lang="DE" tabindex="100" dl-test="translator-lang-option-de-DE">German</button>
    <button dl-lang="FR" tabindex="100" dl-test="translator-lang-option-fr-FR">French</button>
    <button dl-lang="ES" tabindex="100" dl-test="translator-lang-option-es-ES">Spanish</button>
    <button dl-lang="PT" tabindex="100" dl-test="translator-lang-option-pt-PT">Portuguese</button>
    <button dl-lang="IT" tabindex="100" dl-test="translator-lang-option-it-IT">Italian</button>
    <button dl-lang="NL" tabindex="100" dl-test="translator-lang-option-nl-NL">Dutch</button>
    <button dl-lang="PL" tabindex="100" dl-test="translator-lang-option-pl-PL">Polish</button>
    <button dl-lang="RU" tabindex="100" dl-test="translator-lang-option-ru-RU">Russian</button>
    <button dl-lang="JA" tabindex="100" dl-test="translator-lang-option-ja-JA">Japanese</button>
    <button dl-lang="ZH" tabindex="100" dl-test="translator-lang-option-zh-ZH">Chinese</button>
</div>

The HTML uses uppercase codes; however, the module is case-insensitive.

Get an array of supported languages using the following code:

getSupportedLanguages().then(console.log);
[
    "auto",
    "en",
    "de",
    "fr",
    "es",
    "pt",
    "it",
    "nl",
    "pl",
    "ru",
    "ja",
    "zh"
]
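
A language code can be checked against that array before calling translate. This is a minimal sketch: isSupportedLanguage is a hypothetical helper, not part of the module, and it assumes the array has the shape shown above.

```javascript
// Hypothetical helper: case-insensitive membership test against the array
// returned by getSupportedLanguages().
function isSupportedLanguage(languages, code) {
    if (typeof code !== 'string') return false;
    const wanted = code.toLowerCase();
    return languages.some(lang => String(lang).toLowerCase() === wanted);
}

// Usage with the real module (requires deepl-scraper to be installed):
// const { getSupportedLanguages } = require('deepl-scraper');
// getSupportedLanguages().then(langs => console.log(isSupportedLanguage(langs, 'FR')));
```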

Quitting

Since this module uses a headless browser, it won’t quit as long as your main script is running, or until you quit it using the following code:

quit()
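
One way to make sure the browser is always closed is to call quit in a finally block. The sketch below is an illustration under assumptions: translateAllThenQuit is a hypothetical helper, not part of the module, and it does not rely on quit returning a Promise (await also works on plain values).

```javascript
// Hypothetical helper: translates the sentences one after another, then
// always calls quit(), even if one of the translations rejects.
// `api` is expected to be the object returned by require('deepl-scraper').
async function translateAllThenQuit(api, sentences, source, target) {
    const results = [];
    try {
        for (const sentence of sentences) {
            results.push(await api.translate(sentence, source, target));
        }
        return results;
    } finally {
        // Runs on success and on failure alike.
        await api.quit();
    }
}

// Usage with the real module (requires deepl-scraper to be installed):
// translateAllThenQuit(require('deepl-scraper'), ['hello', 'goodbye'], 'en', 'fr')
//     .then(console.log)
//     .catch(console.error);
```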

Error handling

  • INVALID_SOURCE_LANGUAGE
  • TARGET_LANGUAGE_REQUIRED
  • INVALID_TARGET_LANGUAGE
  • UNSUPPORTED_SOURCE_LANGUAGE
  • UNSUPPORTED_TARGET_LANGUAGE
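
These codes can be turned into readable messages in a catch handler. The sketch below rests on assumptions this README does not confirm: that the rejection value is either one of the codes as a string or an Error whose message contains it, and that the hint texts (guessed from the code names) match the module's behaviour. explainError and ERROR_HINTS are hypothetical names.

```javascript
// Codes listed in this README; the hint texts are guesses from the names.
const ERROR_HINTS = {
    INVALID_SOURCE_LANGUAGE: 'the source language value is not valid',
    TARGET_LANGUAGE_REQUIRED: 'no target language was given',
    INVALID_TARGET_LANGUAGE: 'the target language value is not valid',
    UNSUPPORTED_SOURCE_LANGUAGE: 'DeepL does not list this source language',
    UNSUPPORTED_TARGET_LANGUAGE: 'DeepL does not list this target language'
};

// Assumption: `err` is a code string or an Error whose message contains one.
function explainError(err) {
    const text = err instanceof Error ? err.message : String(err);
    const code = Object.keys(ERROR_HINTS).find(key => text.includes(key));
    return code ? `${code}: ${ERROR_HINTS[code]}` : `Unexpected error: ${text}`;
}

// Usage with the real module (requires deepl-scraper to be installed):
// translate('hello', 'en', 'xx').catch(err => console.error(explainError(err)));
```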

Examples

Translate from defined language

translate('hello', 'en', 'fr').then(console.log);
{
    "source": {
        "lang": "en",
        "sentence": "hello"
    },
    "target": {
        "lang": "fr",
        "sentences": [
            "Bonjour,",
            "Bonjour",
            "bonjour",
            "hello",
            "h"
        ],
        "translation": "hello"
    }
}

Translate from auto-detected language

(a single word’s language is usually more difficult to detect)

translate('hello', null, 'fr').then(console.log);
{
    "source": {
        "lang": "en",
        "confident": false,
        "sentence": "hello"
    },
    "target": {
        "lang": "fr",
        "sentences": [
            "Bonjour,",
            "Bonjour",
            "bonjour",
            "hello",
            "h"
        ],
        "translation": "hello"
    }
}

(a full sentence’s language is usually easier to detect)

translate('hey, what\'s up ?', null, 'fr').then(console.log);
{
    "source": {
        "lang": "en",
        "confident": false,
        "sentence": "hey, what's up ?"
    },
    "target": {
        "lang": "fr",
        "sentences": [
            "hey, quoi de neuf ?",
            "Hé, qu'est-ce qu'il y a ?",
            "h",
            "Hé, quoi de neuf ?",
            "Hé, qu'est-ce qu'il y a ?"
        ],
        "translation": "Hé, qu'est-ce qu'il y a ?"
    }
}
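
As the outputs above show, target.sentences can contain duplicates as well as the main translation itself. A minimal sketch for cleaning that list up, assuming the result shape shown in these examples (otherAlternatives is a hypothetical helper, not part of the module):

```javascript
// Hypothetical helper: returns the entries of target.sentences without
// duplicates and without the main translation, keeping their order.
function otherAlternatives(result) {
    const { sentences = [], translation } = result.target;
    return [...new Set(sentences)].filter(s => s !== translation);
}

// Usage with the real module (requires deepl-scraper to be installed):
// translate('hey, what\'s up ?', null, 'fr').then(r => console.log(otherAlternatives(r)));
```

Applied to the last example above, this keeps "hey, quoi de neuf ?", "h" and "Hé, quoi de neuf ?".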

Planned features

  • Get word-by-word definitions, quotes and synonyms
  • Bypass maximum text length

FAQ

Why use Puppeteer instead of HTTP requests?
Because plain HTTP requests keep running into a rate limit error, whatever you do.

Changelog

  • 1.0.0 (2019-05-06)
    • Initial release
  • 1.0.x (2019-05-11)
    • 1.0.1
      • Added examples title in README.md
      • Fixed Promise never resolving while alt sentences not available
    • 1.0.2
      • Retry when browser error
    • 1.0.3
      • Fixed retry
      • Fixed changelog version typo
    • 1.0.4
      • Close page after translation
  • 1.0.5 (2019-06-27)
    • Added getSupportedLanguages method
    • Improved translation complete detection
    • Fixed cannot launch new browser when previously closed
    • Fixed incompatibility with multi-requests translation (#1)
  • 1.0.6 (2019-07-05)
    • Fixed auto/non-auto parameter
    • Fixed translation complete detection (again)
  • 1.0.7 (2019-07-05)
    • Fixed translation complete detection (again !)
  • 1.0.9 (2020-07-10)
    • Fixed supported languages detection & selection
    • Improved translation completion detection & result parsing
    • Replaced deprecated request dependency with axios
    • Replaced JSDOM dependency with RegExp processing
    • Replaced the translate method’s quit parameter with a new quit method
    • Refactored using async/await