The library is not working well any more

youngvo commented

2021-04-30 19:23:18 +02:00

Hi kaki,

It looks like Instagram recently changed something on their main site. The web scraping mechanism in this library isn't working well any more. Please check and apply a patch as soon as possible.

Best,
-Young

KaKi87 commented

2021-05-01 02:55:38 +02:00

Owner

Hello,

All unit tests are passing. Please provide more details about the issue you're experiencing.

poi5305 commented

2021-05-03 05:23:06 +02:00

Hi kaki,

When I called `getProfilePostsById`, I got an error in `Insta._getQueryHashs`:

TypeError: Cannot read property '1' of null

    this.queryHashs = {
      story: mainScriptBody.match(/50,[a-zA-Z]="([^"]+)",/)[1],  // <- this line
      post,
      comment,
      hashtag: hashtagScriptBody.match(localQueryIdRegex)[1],
      location: locationScriptBody.match(localQueryIdRegex)[1],
    };
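The `TypeError` above occurs because `String.prototype.match` returns `null` when the pattern finds nothing, so indexing the result with `[1]` throws. A minimal sketch of a guard that reports which hash is missing follows. The helper name `firstGroup`, its `label` parameter, and the sample script body are made up for illustration and are not part of the library.

```javascript
// Hypothetical helper, not part of the library: returns the first capture
// group of `regex` in `body`, or throws an error naming what was missing.
function firstGroup(body, regex, label) {
  const match = body.match(regex);
  if (match === null) {
    throw new Error(`Query hash "${label}" not found: pattern ${regex} did not match`);
  }
  return match[1];
}

// Made-up script body for illustration only.
const sampleBody = 'x=50,a="abc123",y';
console.log(firstGroup(sampleBody, /50,[a-zA-Z]="([^"]+)",/, 'story')); // prints "abc123"
```

With a guard like this, a change on Instagram's side would surface as a message that names the failing pattern instead of `Cannot read property '1' of null`.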

👍 1

poi5305 commented

2021-05-03 07:17:00 +02:00

I traced the IG query hashes.
The query hash for getting posts should come from ConsumerLibCommons.js.

async _getQueryHashs() {
  // Reuse the cached hashes if they were already resolved.
  if (JSON.stringify(this.queryHashs) !== '{}') return this.queryHashs;
  // Map each bundle name (e.g. "Consumer") to its script path found in the home page HTML.
  const { Consumer, ConsumerLibCommons, TagPageContainer, LocationPageContainer } = Object.fromEntries(
    [
      ...(await this.self.get('', this.sessionId, false, { __a: undefined })).matchAll(
        /static\/bundles\/.+?\/(.+?)\.js\/.+?\.js/g
      ),
    ].map(_ => _.reverse())
  );
  const consumerScriptBody = await this.self.get(Consumer, undefined, false);
  const consumerLibCommonsBody = await this.self.get(ConsumerLibCommons, undefined, false);
  // The first queryId in ConsumerLibCommons.js is the one used for posts.
  const [[, firstQueryId]] = [...consumerLibCommonsBody.matchAll(/queryId:"([^"]+)"/g)];

  const hashtagScriptBody = await this.self.get(TagPageContainer, undefined, false);
  const locationScriptBody = await this.self.get(LocationPageContainer, undefined, false);
  const localQueryIdRegex = /queryId:"([^"]+)"/;

  // The second queryId in Consumer.js is the one used for comments.
  const [, [, comment]] = [...consumerScriptBody.matchAll(/queryId:"([^"]+)"/g)];
  this.queryHashs = {
    // story: mainScriptBody.match(/50,[a-zA-Z]="([^"]+)",/)[1],
    story: '', // still unresolved
    post: firstQueryId,
    comment,
    hashtag: hashtagScriptBody.match(localQueryIdRegex)[1],
    location: locationScriptBody.match(localQueryIdRegex)[1],
  };
  return this.queryHashs;
}

I can get posts with this change.
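The bundle lookup in `_getQueryHashs` relies on a compact trick: each regex match is `[fullPath, bundleName]`, and reversing it yields the `[key, value]` pair that `Object.fromEntries` expects. The sketch below shows the same idea with explicit destructuring instead of `reverse()`. The HTML fragment, the `es6` directory, and the hash-like file names are invented for illustration; real pages will differ.

```javascript
// Made-up HTML fragment; real bundle paths and hashes will differ.
const html = [
  '<script src="/static/bundles/es6/Consumer.js/aaa111.js"></script>',
  '<script src="/static/bundles/es6/ConsumerLibCommons.js/bbb222.js"></script>',
].join('\n');

// Same pattern as in the snippet above: group 1 captures the bundle name.
const bundlePattern = /static\/bundles\/.+?\/(.+?)\.js\/.+?\.js/g;

// Each match is [fullPath, bundleName]; swap the two to get [key, value] entries.
const bundles = Object.fromEntries(
  [...html.matchAll(bundlePattern)].map(([path, name]) => [name, path])
);

console.log(bundles.Consumer);           // prints "static/bundles/es6/Consumer.js/aaa111.js"
console.log(bundles.ConsumerLibCommons); // prints "static/bundles/es6/ConsumerLibCommons.js/bbb222.js"
```

If a bundle is renamed or removed from the page, its key is simply absent from the object, so the destructured variable is `undefined` and the next request fails. Checking for that case early would make such breakages easier to diagnose.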

KaKi87 commented

2021-05-03 14:54:19 +02:00

Owner

Could you please open a PR or, at least, post a diff?

Thanks.

poi5305 commented

2021-05-04 08:26:10 +02:00

I didn't fix all of the problems, only the post query hash.

In my case, the post query hash should come from ConsumerLibCommons.

const consumerLibCommonsBody = await this.self.get(ConsumerLibCommons, undefined, false);
const [[, firstQueryId], [, secondQueryId]] = [...consumerLibCommonsBody.matchAll(/queryId:"([^"]+)"/g)];

// firstQueryId is used for posts.
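The nested destructuring above throws an opaque error if the script contains fewer `queryId` entries than expected. As a rough sketch, the IDs could be collected into a flat array first and then picked by position with an explicit check. The names `extractQueryIds` and `pickQueryId` and the sample script body are assumptions made for this example, not library code.

```javascript
// Hypothetical helpers, not part of the library.

// Collect every queryId value in the order it appears in the script body.
function extractQueryIds(scriptBody) {
  return Array.from(scriptBody.matchAll(/queryId:"([^"]+)"/g), ([, id]) => id);
}

// Pick a queryId by position, failing with a readable message if it is absent.
function pickQueryId(scriptBody, index) {
  const ids = extractQueryIds(scriptBody);
  if (index >= ids.length) {
    throw new Error(`Expected at least ${index + 1} queryId entries, found ${ids.length}`);
  }
  return ids[index];
}

// Made-up script body for illustration only.
const sampleScript = 'e.exports={queryId:"aaa"};t.exports={queryId:"bbb"}';
console.log(extractQueryIds(sampleScript)); // prints [ 'aaa', 'bbb' ]
console.log(pickQueryId(sampleScript, 0));  // prints "aaa"
```

Picking by position remains fragile, since the order of entries in the bundle can change at any time. The check only makes that failure easier to spot.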

And the story query hash is still wrong.

(Oh, I used my sessionid as the library input.)

KaKi87 commented

2021-05-08 00:53:25 +02:00

Owner

Alright.

After running the full unit test suite, I was able to reproduce the issue on `getProfileStoryById` and `getProfilePostsById`, and therefore on `getProfileStory` and `getProfilePosts` too.

As you noticed, it must have something to do with query hashes.

I'll investigate now.

Thanks for reporting.

KaKi87 added the bug label 2021-05-08 00:53:33 +02:00

KaKi87 commented

2021-05-08 01:36:55 +02:00

Owner

`getProfilePostsById` (and `getProfilePosts`) is now fixed as per 6a50348.

`getProfileStoryById` (and `getProfileStory`) will have to wait for the v2 refactor, because it involves calling an endpoint from a different domain (`https://i.instagram.com/api/v1/feed/reels_media/?reel_ids={profileId}`) while the lib, in its current state, isn't flexible enough for that.
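For illustration, the story endpoint URL mentioned above could be assembled with the standard `URL` API, as in the sketch below. It only builds the URL string and sends no request. The function name `buildReelsMediaUrl` is hypothetical, and any headers or authentication the endpoint may require are not covered.

```javascript
// Hypothetical helper, not part of the library: builds the story endpoint URL
// quoted in the comment above for a given profile ID. No request is sent.
function buildReelsMediaUrl(profileId) {
  const url = new URL('https://i.instagram.com/api/v1/feed/reels_media/');
  url.searchParams.set('reel_ids', String(profileId));
  return url.toString();
}

// "12345" is a placeholder profile ID.
console.log(buildReelsMediaUrl('12345'));
// prints "https://i.instagram.com/api/v1/feed/reels_media/?reel_ids=12345"
```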

I'll begin this work asap.

👍 1 🎉 1 🚀 1

dcts commented

2021-05-09 00:22:25 +02:00

Awesome, I was just running into that issue. Great to see it's resolved, thank you! You should add a public bitcoin address in the description; I'd like to support the repo with a small donation :)

KaKi87 commented

2021-07-06 17:57:03 +02:00

Owner

I started working on this last week, and I'm hoping to make a pre-release next week.

However, I won't make a profit from scraping Instagram, as that would turn this activity from *illicit* into *illegal*.

Thanks anyway :)

KaKi87 referenced this issue

2021-10-04 19:38:11 +02:00

Stopping development & maintenance of the Instagram scraper. #17

KaKi87 closed this issue

2021-10-04 19:38:50 +02:00

The library is not working well any more #11