Get Instagram Posts by Tag Name

Instagram has a very robust API that lets you integrate with it in just about any way imaginable. However, there are also some public endpoints that may get you all the information you need. One such endpoint, which we’ll cover here, lets you grab Instagram posts by tag name without registering for the Instagram API.

Goal

It’s good to have goals. Ours are simple:

  • Retrieve Instagram posts by tag name as JSON using the public endpoint. 

Get Instagram Posts by Tag Name as JSON

If you were to navigate to the following URL, you’d be shown public Instagram posts with the tag name selfie: https://www.instagram.com/explore/tags/selfie/

This is a nice way to browse all posts by tag, starting with the most recent. Now, try adding ?__a=1 to the end of that same URL: https://www.instagram.com/explore/tags/selfie/?__a=1

Pretty interesting, right? By adding ?__a=1, you’re telling Instagram to return the results as JSON, which provides a universal way for us to digest the information with code. 
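As a minimal sketch, the same request can be made from Python using only the standard library. The helper names tag_json_url and fetch_tag_json are made up for illustration, and the sketch assumes the endpoint still answers unauthenticated requests the way this article describes.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://www.instagram.com/explore/tags/"


def tag_json_url(tag):
    """Build the tag URL with ?__a=1 appended (hypothetical helper)."""
    # quote() percent-encodes characters that are not safe in a URL path.
    return f"{BASE_URL}{quote(tag, safe='')}/?__a=1"


def fetch_tag_json(tag, timeout=10):
    """Download and decode the JSON for a tag. Performs a network request."""
    with urlopen(tag_json_url(tag), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Only the URL is built here; call fetch_tag_json("selfie") to hit the network.
print(tag_json_url("selfie"))
```

Building the URL is kept separate from fetching it so the URL logic can be checked without a network connection.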

Why does Instagram allow this? Great question. If you have the answer, I implore you to share it in the comments.

Making Sense of the JSON Result

Now that we know how to grab a JSON result of the most recent Instagram posts by tag name, we just need to make sense of it. We’ll start by using an online JSON pretty printer to make the JSON blob a bit friendlier to look at. There’s a lot of data, but we’re going to home in on the nodes property under tag > media. This nodes property is an array with one entry per post, each holding that post’s data.

Below, I’ve highlighted some of the more interesting properties of the post that we’ll be discussing:

{  
   "tag":{  
      "name":"selfie",
      "content_advisory":null,
      "media":{  
         "nodes":[  
            {  
               "comments_disabled":false,
               "id":"1638021685389222559",
               "dimensions":{  },
               "owner":{  
                  "id":"4736824623"
               },
               "thumbnail_src":"https://scontent-iad3-1.cdninstagram.com/t51.2885-15/e35/c0.10.500.500/22860707_489313221440833_4558795553466482688_n.jpg",
               "thumbnail_resources":[  ],
               "is_video":false,
               "code":"Ba7bAtUAuKf",
               "date":1509487423,
               "display_src":"https://scontent-iad3-1.cdninstagram.com/t51.2885-15/e35/22860707_489313221440833_4558795553466482688_n.jpg",
               "caption":"#girl #style #pretty #blogger #beautiful #beauty #fashion #ootd #makeup #instagood #like4like #cool #moda #trend #adidas #shoes #shopping #goals #selfie #inspiration #outfit #look #fit #luxury #nails #model #followme",
               "comments":{  
                  "count":0
               },
               "likes":{  
                  "count":1
               }
            },
            ...

Referencing the highlighted properties:

  • code can be used to get information about the originator of the post. You may have noticed there’s an owner property, too, but for the sake of this article we’ll focus on the code property to get originator information.
  • date is when the post was submitted, and is represented by the number of seconds since the epoch (the sample value 1509487423 falls on October 31, 2017).
  • display_src is the URL to the posted image. You can see the image by pasting that URL in your browser.
  • caption is, well, the caption associated with the post. This particular caption is a massive list of tags, one of which happens to be #selfie!
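Here is a rough Python sketch of pulling those four properties out of the nodes array. The data is a trimmed copy of the sample above with the caption shortened, summarize_posts is a name invented for this example, and the date value is assumed to be a Unix timestamp in seconds (which is what puts the sample post in 2017).

```python
import json
from datetime import datetime, timezone

# Trimmed copy of the sample response; the caption is shortened.
SAMPLE = json.loads("""
{"tag": {"name": "selfie", "media": {"nodes": [
  {"id": "1638021685389222559",
   "code": "Ba7bAtUAuKf",
   "date": 1509487423,
   "display_src": "https://scontent-iad3-1.cdninstagram.com/t51.2885-15/e35/22860707_489313221440833_4558795553466482688_n.jpg",
   "caption": "#girl #style #selfie"}
]}}}
""")


def summarize_posts(payload):
    """Return code, posted time, image URL, and caption for each node."""
    posts = []
    for node in payload["tag"]["media"]["nodes"]:
        posts.append({
            "code": node["code"],
            # Assumption: date is a Unix timestamp in seconds.
            "posted_at": datetime.fromtimestamp(node["date"], tz=timezone.utc),
            "display_src": node["display_src"],
            # .get() guards against a missing caption key (an assumption).
            "caption": node.get("caption", ""),
        })
    return posts


for post in summarize_posts(SAMPLE):
    print(post["code"], post["posted_at"].isoformat())
# prints: Ba7bAtUAuKf 2017-10-31T22:03:43+00:00
```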

As I mentioned, the code property can be used to obtain information about the originator of the post. This is accomplished by plugging the value into another public endpoint. Replace the <code> placeholder below with your code property value, and paste the URL in your browser to obtain another block of JSON:

https://www.instagram.com/p/<code>/?__a=1

We are once again provided with a decent chunk of data. Use the online JSON pretty printer we discussed earlier to format it so it’s easier to follow. Here’s a sample of the output, which I’ve shortened a bit:

{  
   "graphql":{  
      "shortcode_media":{  
         "__typename":"GraphImage",
         "id":"1638021685389222559",
         "shortcode":"Ba7bAtUAuKf",
         ...
         ...
         ...
         "owner":{  
            "id":"4736824623",
            "profile_pic_url":"https://scontent-iad3-1.cdninstagram.com/t51.2885-19/s150x150/18161617_288442401613475_2938897407410176000_a.jpg",
            "username":"andysprite",
            "blocked_by_viewer":false,
            "followed_by_viewer":false,
            "full_name":"Andy",
            "has_blocked_viewer":false,
            "is_private":false,
            "is_unpublished":false,
            "is_verified":false,
            "requested_by_viewer":false
         },
         ...

The post owner’s information is under the graphql > shortcode_media > owner property. From there you can get information such as username and full_name, which I’ve highlighted above.
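The lookup can be sketched in Python as follows. The helpers post_json_url and post_owner are hypothetical names, and the data is a trimmed copy of the sample output above that keeps only a few of the owner fields.

```python
import json


def post_json_url(code):
    """Build the per-post URL from a node's code value (hypothetical helper)."""
    return f"https://www.instagram.com/p/{code}/?__a=1"


# Trimmed copy of the sample post response shown above.
SAMPLE_POST = json.loads("""
{"graphql": {"shortcode_media": {
  "shortcode": "Ba7bAtUAuKf",
  "owner": {"id": "4736824623", "username": "andysprite", "full_name": "Andy"}
}}}
""")


def post_owner(payload):
    """Return (username, full_name) from graphql > shortcode_media > owner."""
    owner = payload["graphql"]["shortcode_media"]["owner"]
    return owner["username"], owner["full_name"]


print(post_json_url("Ba7bAtUAuKf"))
print(post_owner(SAMPLE_POST))
```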

We now have a fairly straightforward way to:

  • Grab the most recent Instagram posts by tag name as JSON.
  • Identify individual post data, and use that data to obtain additional information such as the poster’s name and username.

This is pretty great; we have a lot to work with here. However, the JSON results are paged, which means at this point we’ve only looked at the very first set of results. A bit more investigation into the JSON result shows that paging is built in.

Requesting the Next Page of Results

Looking back at the JSON under the tag > media property, you’ll see another property called page_info (it may be easier if you collapse the nodes property, as that can take a lot of vertical space):

{  
   "tag":{  
      "name":"selfie",
      "content_advisory":null,
      "media":{  
         "nodes":[  ],
         "count":321717673,
         "page_info":{  
            "has_next_page":true,
            "end_cursor":"J0HWfjzPwAAAF0HWfjzOAAAAFiYA"
         }
      },
      "top_posts":{  }
   }
}

The page_info property has its own children, has_next_page and end_cursor. You probably guessed this, but if has_next_page is set to false, we’re at the very end of the results. Otherwise, the end_cursor property can be appended to the original URL we used via the max_id query parameter to retrieve the very next set of results. Here’s an example of what that might look like:

https://www.instagram.com/explore/tags/selfie/?__a=1&max_id=J0HWfjzPwAAAF0HWfjzOAAAAFiYA

That’s all there is to paging the result set.
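To make the loop concrete, here is a small Python sketch that follows end_cursor until has_next_page is false. The names page_url and iter_nodes are invented for this example, the fetch_json argument stands in for whatever HTTP client you use, and the max_pages cap is an added safety limit rather than anything the endpoint requires. The fake pages and cursor value below are placeholders so the sketch runs offline.

```python
from urllib.parse import urlencode

TAG_URL = "https://www.instagram.com/explore/tags/{tag}/"


def page_url(tag, end_cursor=None):
    """Build the tag URL, adding max_id when a cursor is supplied."""
    params = {"__a": "1"}
    if end_cursor:
        params["max_id"] = end_cursor
    return TAG_URL.format(tag=tag) + "?" + urlencode(params)


def iter_nodes(tag, fetch_json, max_pages=3):
    """Yield post nodes page by page; fetch_json(url) must return a dict."""
    cursor = None
    for _ in range(max_pages):
        media = fetch_json(page_url(tag, cursor))["tag"]["media"]
        yield from media["nodes"]
        page_info = media["page_info"]
        if not page_info["has_next_page"]:
            break  # reached the last page of results
        cursor = page_info["end_cursor"]


# Stand-in for real HTTP responses, keyed by URL (placeholder data).
FAKE_PAGES = {
    page_url("selfie"): {"tag": {"media": {
        "nodes": [{"code": "first"}],
        "page_info": {"has_next_page": True, "end_cursor": "CURSOR1"}}}},
    page_url("selfie", "CURSOR1"): {"tag": {"media": {
        "nodes": [{"code": "second"}],
        "page_info": {"has_next_page": False, "end_cursor": None}}}},
}

print([node["code"] for node in iter_nodes("selfie", FAKE_PAGES.__getitem__)])
# prints: ['first', 'second']
```

Passing the fetch function in as an argument keeps the paging logic independent of the network, so it can be exercised with canned data before pointing it at the live endpoint.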

Conclusion

We discussed using a public API to retrieve Instagram posts by tag name as a JSON result. After some investigation, we were able to understand the structure of this result and identify some of the more exciting properties. We’re now equipped with enough information to consume the JSON result in code.