The library is not working well any more #11

Closed
opened 2 years ago by youngvo · 9 comments

Hi kaki,

It looks like Instagram changed something on their main site recently. The web scraping mechanism in this library isn't working well any more. Please check and apply a patch as soon as possible.

Best,
-Young

KaKi87 commented 2 years ago
Owner

Hello,

All unit tests are passing, please provide more details about the issue you're experiencing.

Hi kaki,

When I called getProfilePostsById, I got an error in Insta._getQueryHashs:

TypeError: Cannot read property '1' of null

    this.queryHashs = {
      story: mainScriptBody.match(/50,[a-zA-Z]="([^"]+)",/)[1],  // <- this line
      post,
      comment,
      hashtag: hashtagScriptBody.match(localQueryIdRegex)[1],
      location: locationScriptBody.match(localQueryIdRegex)[1],
    };
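
The TypeError arises because `String.prototype.match` returns `null` when the pattern is not found, so indexing `[1]` fails. A minimal sketch of a guard follows; the helper name `firstGroup` and the sample script body are hypothetical and not part of the library:

```javascript
// Hypothetical helper (not part of the library): returns the first capture
// group of `regex` in `text`, or throws a descriptive error instead of the
// opaque "Cannot read property '1' of null".
function firstGroup(text, regex, label) {
  const match = text.match(regex);
  if (match === null) {
    throw new Error(`Query hash "${label}" not found: pattern ${regex} did not match`);
  }
  return match[1];
}

// Made-up script body, for illustration only.
const sampleBody = 'foo,50,a="abc123",bar';
console.log(firstGroup(sampleBody, /50,[a-zA-Z]="([^"]+)",/, 'story'));
```

With a guard like this, a change on Instagram's side would surface as an error naming the missing hash rather than a bare null dereference.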

I traced IG's query hashes.
The posts query hash should come from ConsumerLibCommons.js.

async _getQueryHashs() {
  if (JSON.stringify(this.queryHashs) !== '{}') return this.queryHashs;
  const { Consumer, ConsumerLibCommons, TagPageContainer, LocationPageContainer } = Object.fromEntries(
    [
      ...(await this.self.get('', this.sessionId, false, { __a: undefined })).matchAll(
        /static\/bundles\/.+?\/(.+?)\.js\/.+?\.js/g
      ),
    ].map(_ => _.reverse())
  );
  const consumerScriptBody = await this.self.get(Consumer, undefined, false);
  const consumerLibCommonsBody = await this.self.get(ConsumerLibCommons, undefined, false);
  const [[, firstQueryId], [, secondQueryId]] = [...consumerLibCommonsBody.matchAll(/queryId:"([^"]+)"/g)];

  const hashtagScriptBody = await this.self.get(TagPageContainer, undefined, false);
  const locationScriptBody = await this.self.get(LocationPageContainer, undefined, false);
  const localQueryIdRegex = /queryId:"([^"]+)"/;

  const [, [, comment], , [, post]] = [...consumerScriptBody.matchAll(/queryId:"([^"]+)"/g)];
  this.queryHashs = {
    // story: mainScriptBody.match(/50,[a-zA-Z]="([^"]+)",/)[1],
    story: '',
    post: firstQueryId,
    comment,
    hashtag: hashtagScriptBody.match(localQueryIdRegex)[1],
    location: locationScriptBody.match(localQueryIdRegex)[1],
  };
  return this.queryHashs;
}

I can get posts with this change.
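
For readers unfamiliar with the `Object.fromEntries(... .map(_ => _.reverse()))` idiom in the snippet above, here is a small sketch of how it turns script paths into a name-to-path map. The HTML fragment and the file hashes are made up for illustration; the real page will differ:

```javascript
// Made-up HTML fragment; real bundle names and hashes will differ.
const sampleHtml = [
  '<script src="/static/bundles/es6/Consumer.js/1a2b3c.js"></script>',
  '<script src="/static/bundles/es6/ConsumerLibCommons.js/4d5e6f.js"></script>',
].join('\n');

const bundlePattern = /static\/bundles\/.+?\/(.+?)\.js\/.+?\.js/g;

// Each match is [fullPath, bundleName]; reversing it gives [bundleName, fullPath],
// which is the [key, value] shape that Object.fromEntries expects.
const bundles = Object.fromEntries(
  [...sampleHtml.matchAll(bundlePattern)].map(match => [...match].reverse())
);

console.log(bundles.Consumer);
console.log(bundles.ConsumerLibCommons);
```

If a bundle name is absent from the page, the corresponding destructured variable is `undefined`, so checking for that before fetching the script would give a clearer failure.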

KaKi87 commented 2 years ago
Owner

Could you please make a PR or, at least, a diff?

Thanks.

I didn't fix all the problems, only the post query hash.

In my case, the post query hash should come from ConsumerLibCommons.

const consumerLibCommonsBody = await this.self.get(ConsumerLibCommons, undefined, false);
const [[, firstQueryId], [, secondQueryId]] = [...consumerLibCommonsBody.matchAll(/queryId:"([^"]+)"/g)];

// firstQueryId is used for posts.
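
To illustrate the extraction step in isolation, here is a hedged sketch that runs the same `queryId` pattern over a made-up minified script body and checks the match count before destructuring. The sample string and its IDs are invented; only the regex comes from the snippet above:

```javascript
// Made-up minified script body; the real file contents and hashes will differ.
const sampleScript = 'a({queryId:"aaa111",x:1});b({queryId:"bbb222",y:2});';

// Collect every queryId in order of appearance.
const queryIds = [...sampleScript.matchAll(/queryId:"([^"]+)"/g)].map(([, id]) => id);

// Guard against layout changes rather than destructuring blindly.
if (queryIds.length < 2) {
  throw new Error(`Expected at least 2 query IDs, found ${queryIds.length}`);
}

const [firstQueryId, secondQueryId] = queryIds;
console.log(firstQueryId, secondQueryId);
```

Positional destructuring like this depends on the order of IDs in the bundle, so it is likely to break again whenever Instagram reshuffles the file.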

And the story query hash is still wrong.

(Oh, and I passed my sessionid as the library input.)

KaKi87 commented 2 years ago
Owner

Alright.

After fully running unit tests, I was able to reproduce the issue on getProfileStoryById and getProfilePostsById, therefore on getProfileStory and getProfilePosts too.

As you noticed, it must have something to do with query hashes.

I'll investigate now.

Thanks for reporting.

KaKi87 added the bug label 2 years ago
KaKi87 commented 2 years ago
Owner

getProfilePostsById (and getProfilePosts) is now fixed as per 6a50348.

getProfileStoryById (and getProfileStory) will have to wait for the v2 refactor, because it involves calling an endpoint from a different domain (https://i.instagram.com/api/v1/feed/reels_media/?reel_ids={profileId}) while the lib, in its current state, isn't flexible enough for that.

I'll begin this work asap.
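
As a rough sketch of what the story request on the other domain could look like, the following only builds the URL quoted above. The helper name `buildReelsMediaUrl` is hypothetical, and any headers or authentication the endpoint requires are not covered here:

```javascript
// Hypothetical helper: builds the story endpoint URL quoted in the comment above.
// It only constructs the URL; it does not perform the request.
function buildReelsMediaUrl(profileId) {
  const url = new URL('https://i.instagram.com/api/v1/feed/reels_media/');
  url.searchParams.set('reel_ids', String(profileId));
  return url.toString();
}

// '12345' is a placeholder profile ID.
console.log(buildReelsMediaUrl('12345'));
```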

dcts commented 2 years ago

Awesome, I was just running into that issue. Great to see it's resolved, thank you! You should add a public bitcoin address in the description; I'd like to support the repo with a small donation :)

KaKi87 commented 2 years ago
Owner

I started working on this last week, and I'm hoping to make a pre-release next week.

However, I won't make a profit from scraping Instagram, as that would turn this activity from illicit to illegal.

Thanks anyway :)

KaKi87 closed this issue 2 years ago
This repo is archived. You cannot comment on issues.
Reference: KaKi87/scraper-instagram-v1#11