The library is not working well any more #11

Closed
opened 2021-04-30 19:23:18 +02:00 by youngvo · 9 comments

Hi kaki,

It looks like Instagram recently changed something on their main site. The web scraping mechanism in this library isn't working well any more. Please check and apply a patch as soon as possible.

Best,
-Young


Hello,

All unit tests are passing, please provide more details about the issue you're experiencing.


Hi kaki,

When I called getProfilePostsById, I got an error at Insta._getQueryHashs:

TypeError: Cannot read property '1' of null

    this.queryHashs = {
      story: mainScriptBody.match(/50,[a-zA-Z]="([^"]+)",/)[1],  // <- this line
      post,
      comment,
      hashtag: hashtagScriptBody.match(localQueryIdRegex)[1],
      location: locationScriptBody.match(localQueryIdRegex)[1],
    };
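
For context: `String.prototype.match` returns `null` when the pattern finds nothing, so indexing the result with `[1]` throws the `TypeError` above. Below is a minimal sketch of a guarded lookup. The helper `firstGroup` and the sample strings are hypothetical and are not part of the library.

```javascript
// Hypothetical helper, not part of the library: returns the first capture
// group of `regex` in `body`, or `fallback` when there is no match.
function firstGroup(body, regex, fallback = null) {
  const match = body.match(regex); // null when nothing matches
  return match ? match[1] : fallback;
}

// Made-up script bodies, for illustration only.
const oldBody = 'x=50,a="abc123",y';
const newBody = 'something entirely different';
const storyRegex = /50,[a-zA-Z]="([^"]+)",/;

const found = firstGroup(oldBody, storyRegex);   // the captured hash
const missing = firstGroup(newBody, storyRegex); // fallback, no exception
console.log(found, missing);
```

With a guard like this, a changed script body yields a missing hash that the caller can report, instead of an exception thrown while the hashes object is being built.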

I traced the IG query hashes.
The posts query hash should come from ConsumerLibCommons.js.

async _getQueryHashs() {
    if (JSON.stringify(this.queryHashs) !== '{}') return this.queryHashs;
    const { Consumer, ConsumerLibCommons, TagPageContainer, LocationPageContainer } = Object.fromEntries(
      [
        ...(await this.self.get('', this.sessionId, false, { __a: undefined })).matchAll(
          /static\/bundles\/.+?\/(.+?)\.js\/.+?\.js/g
        ),
      ].map(_ => _.reverse())
    );
    const consumerScriptBody = await this.self.get(Consumer, undefined, false);
    const consumerLibCommonsBody = await this.self.get(ConsumerLibCommons, undefined, false);
    const [[, firstQueryId], [, secondQueryId]] = [...consumerLibCommonsBody.matchAll(/queryId:"([^"]+)"/g)];
    
    const hashtagScriptBody = await this.self.get(TagPageContainer, undefined, false);
    const locationScriptBody = await this.self.get(LocationPageContainer, undefined, false);
    const localQueryIdRegex = /queryId:"([^"]+)"/;

    const [, [, comment], , [, post]] = [...consumerScriptBody.matchAll(/queryId:"([^"]+)"/g)];
    this.queryHashs = {
      // story: mainScriptBody.match(/50,[a-zA-Z]="([^"]+)",/)[1],
      story: '',
      post: firstQueryId,
      comment,
      hashtag: hashtagScriptBody.match(localQueryIdRegex)[1],
      location: locationScriptBody.match(localQueryIdRegex)[1],
    };
    return this.queryHashs;
  }

I can get posts with this change.
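
The bundle lookup at the top of that function can be read as building a name-to-path map from the page HTML. Here is a minimal sketch of that step, using the same regex. The HTML snippet and file names are made up, and the path layout is only inferred from the regex, so treat it as an assumption.

```javascript
// Made-up HTML for illustration; the real page markup may differ.
const html = [
  '<script src="/static/bundles/es6/Consumer.js/aaa111.js"></script>',
  '<script src="/static/bundles/es6/ConsumerLibCommons.js/bbb222.js"></script>',
].join('\n');

// Same pattern as in the snippet above: group 1 is the bundle name.
const bundleRegex = /static\/bundles\/.+?\/(.+?)\.js\/.+?\.js/g;

// Each match is [fullPath, name]; swap it to [name, fullPath] so that
// Object.fromEntries produces { name: fullPath }.
const bundles = Object.fromEntries(
  [...html.matchAll(bundleRegex)].map(([path, name]) => [name, path])
);

console.log(bundles.Consumer);
console.log(bundles.ConsumerLibCommons);
```

Destructuring `([path, name]) => [name, path]` does the same job as `_.reverse()` in the original code, but makes the key/value order explicit.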


Could you please make a PR or, at least, a diff?

Thanks.


I didn't fix all the problems, only the post query hash.

In my case, the post query hash should come from ConsumerLibCommons.

const consumerLibCommonsBody = await this.self.get(ConsumerLibCommons, undefined, false);
const [[, firstQueryId], [, secondQueryId]] = [...consumerLibCommonsBody.matchAll(/queryId:"([^"]+)"/g)];

// firstQueryId is used for posts.
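
Note that the nested destructuring above throws a `TypeError` if the script contains fewer than two `queryId` entries, because it tries to destructure `undefined`. A minimal sketch of a more defensive variant follows. `extractQueryIds` and the sample body are hypothetical, not part of the library.

```javascript
// Hypothetical helper: collects every queryId found in a script body,
// in order of appearance. Returns an empty array when there are none.
function extractQueryIds(body) {
  return [...body.matchAll(/queryId:"([^"]+)"/g)].map(([, id]) => id);
}

// Made-up script body, for illustration only.
const sampleBody = 'a={queryId:"1111"},b={queryId:"2222"}';

const [firstQueryId, secondQueryId] = extractQueryIds(sampleBody);
if (firstQueryId === undefined) {
  // Fail with a descriptive message rather than a destructuring TypeError.
  throw new Error('No queryId found in ConsumerLibCommons script');
}
console.log(firstQueryId, secondQueryId);
```

Flattening the matches to plain strings first means a missing entry shows up as `undefined`, which can be checked explicitly.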

And the story query hash is still wrong.

(Oh, and I used my sessionid as the library input.)


Alright.

After running the full unit test suite, I was able to reproduce the issue on getProfileStoryById and getProfilePostsById, and therefore on getProfileStory and getProfilePosts too.

As you noticed, it must have something to do with query hashes.

I'll investigate now.

Thanks for reporting.

KaKi87 added the bug label 2021-05-08 00:53:33 +02:00

getProfilePostsById (and getProfilePosts) is now fixed as per 6a50348.

getProfileStoryById (and getProfileStory) will have to wait for the v2 refactor, because it involves calling an endpoint from a different domain (https://i.instagram.com/api/v1/feed/reels_media/?reel_ids={profileId}) while the lib, in its current state, isn't flexible enough for that.
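
To illustrate the point about the different domain, here is a minimal sketch that only builds the request URL for the endpoint mentioned above. `buildStoryUrl` is a hypothetical name, and any headers or authentication the endpoint may require are deliberately left out because they are not described in this thread.

```javascript
// Hypothetical helper: builds the story endpoint URL for a profile ID.
// Only the URL itself comes from the comment above; nothing else is assumed.
function buildStoryUrl(profileId) {
  const url = new URL('https://i.instagram.com/api/v1/feed/reels_media/');
  url.searchParams.set('reel_ids', String(profileId));
  return url.toString();
}

console.log(buildStoryUrl(123));
```

The host here is `i.instagram.com` rather than the main site, which is why a request helper tied to a single base URL can't reach it without a refactor.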

I'll begin this work asap.


Awesome, I was just running into that issue. Great to see it's resolved, thank you! You should add a public bitcoin address in the description; I'd like to support the repo with a small donation :)


I started working on this last week, and I'm hoping to make a pre-release next week.

However, I won't make a profit from scraping Instagram, as that would turn this activity from illicit into illegal.

Thanks anyway :)

This repo is archived. You cannot comment on issues.
Reference: KaKi87/scraper-instagram-v1#11