What does your social media say about you?

We all spend a lot of time on social media, but your public persona is probably best represented by your Twitter and perhaps your Instagram feed. At least, that’s true for me. So I decided to build a dashboard that grabs all of my tweets and photos and analyzes them.

In my analysis, we’ll answer the following questions:

  • Am I generally negative or positive on Twitter?
  • What kind of things do I tweet?
  • What kind of things do I take photos of?

I built a super simple Rails App to showcase this. I used AlchemyAPI, a hosted machine learning/AI/NLP company, and a third-party Ruby gem for the analysis and the official Twitter + Instagram gems for pulling in the social media.

Get Some Tweets

We’ll use @sferik’s great Twitter gem: https://github.com/sferik/twitter.  You will need to grab an API key from Twitter, so head over here: https://apps.twitter.com/.  You’ll want to grab four keys: the Consumer Key + Secret for the app, and the Access Token + Secret, which give you full access to your own Twitter feed.
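
With those four keys in hand, configuring the client from the twitter gem looks something like this (the TWITTER_CLIENT constant and the ENV variable names are my own choices here, not something the gem requires):

```ruby
require "twitter"

# Build an authenticated REST client from the sferik/twitter gem.
# Reading the four keys from ENV keeps them out of source control.
TWITTER_CLIENT = Twitter::REST::Client.new do |config|
  config.consumer_key        = ENV["TWITTER_CONSUMER_KEY"]
  config.consumer_secret     = ENV["TWITTER_CONSUMER_SECRET"]
  config.access_token        = ENV["TWITTER_ACCESS_TOKEN"]
  config.access_token_secret = ENV["TWITTER_ACCESS_TOKEN_SECRET"]
end
```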

You’ll notice a few implementations in the code of how I go about pulling the Twitter information. The first just grabs the 20 most recent tweets, because this requires no authentication (besides an app key from Twitter). I also implemented the ability to grab all of a user’s tweets, but I was hitting API limits doing it that way, so I stuck to search.

  TWITTER_CLIENT.search("from:#{handle}", :result_type => "recent").take(number)

And that’ll return an array of tweet objects (20, in our case). This will serve as our main corpus of text.

Am I generally negative or positive on Twitter?

It’s pretty easy. All we need to do is access the AlchemyAPI SentimentAnalysis by posting some text up to it. Under the covers, this is the call the SDK is making: http://www.alchemyapi.com/api/sentiment/textc.html#textsentiment

So we’ll loop through the tweets, send up the text of each tweet, and in response we’ll get a “score” decimal between -1 and 1, along with a “type” that will tell us if it’s positive or negative.

  def tweet_sentiments(handle)
    sentiments = []
    TwitterAPI.get_tweets_of(handle).each do |tweet|
      sentiments << AlchemyAPI.search(:sentiment_analysis, text: tweet.text)
    end
    sentiments
  end

We’ll then average the scores and return the average. If it’s more than 0, we’re positive; less than 0, negative; and exactly 0, neutral.  You’ll probably notice that there’s not a lot of differentiation in the results: the API isn’t amazing with such small tweet samples. Ideally we’d filter the tweets and remove bad examples for something better. In fact, only a couple of my friends got significant results.

  def sentiment_of_one_person(handle)
    total_sentiment, sentiments_used = 0, 0
    tweet_sentiments(handle).each do |sentiment|
      if sentiment && sentiment["score"].to_f != 0
        total_sentiment += sentiment["score"].to_f
        sentiments_used += 1
      end
    end
    sentiments_used.zero? ? 0 : total_sentiment / sentiments_used
  end

  def tweets_sentiment_average
    sentiment = sentiment_of_one_person(self.twitter_handle)
    if sentiment > 0
      "positive"
    elsif sentiment < 0
      "negative"
    else
      "neutral"
    end
  end
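
As a sanity check on the averaging rule, here’s the same logic as a stand-alone method run against a hand-made list of responses (the scores below are invented; only the "score" key mirrors AlchemyAPI’s response shape):

```ruby
# Stand-alone version of the averaging step: skip nil results and zero
# scores, then average whatever is left (0 if nothing survives).
def average_sentiment(sentiments)
  total, used = 0.0, 0
  sentiments.each do |s|
    next unless s && s["score"].to_f != 0
    total += s["score"].to_f
    used  += 1
  end
  used.zero? ? 0 : total / used
end

samples = [
  { "score" => "0.5" },   # positive tweet
  { "score" => "-0.2" },  # negative tweet
  nil,                    # failed API call
  { "score" => "0" },     # neutral, excluded from the average
]

average_sentiment(samples)  # => 0.15, i.e. "positive" overall
```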

And there you go: 20 lines of code or so, and you know your sentiment on Twitter.

What kind of things do I tweet?

This one is a little more vague. I tried it a few different ways to see what kind of responses I’d get. I concatenated the first 20 tweets like so: 

  def concatenated_tweets(handle)
    all_text = ""
    get_tweets_of(handle).each do |tweet|
      all_text += "#{tweet.text} "
    end
    all_text
  end

Now I have a corpus of text that I can upload to AlchemyAPI in one hit. Luckily, all of these API calls take the same payload.

I used the following methods to see how they’d work for this.

  • ConceptTagging
  • EntityExtraction
  • Taxonomy
  • KeywordExtraction
  • TextCategorization


ConceptTagging

All I did here was go through each concept, check its relevance, and return a unique array of all the concept names in the corpus.

  def grab_common_concepts(handle=self.twitter_handle)
    concept_array = []
    concepts_response = AlchemyAPI::ConceptTagging.new.search(text: TwitterAPI.concatenated_tweets(handle))
    concepts_response.each do |concept|
      if concept["relevance"].to_f >= 0.60
        concept_array << concept["text"]
      end
    end
    concept_array.uniq
  end


The raw response:

 {"text"=>"Patriotism", "relevance"=>"0.906882", "dbpedia"=>"http://dbpedia.org/resource/Patriotism", "freebase"=>"http://rdf.freebase.com/ns/m.06473", "opencyc"=>"http://sw.opencyc.org/concept/Mx4rvVi2CJwpEbGdrcN5Y29ycA"}

Apparently the only thing I tweet about is Patriotism… one "The Interview" tweet, and that’s the overarching concept. OK, we move on.


EntityExtraction

Entity extraction is a little more complicated, because not all entity types have a “disambiguated” version. I checked for the disambiguated name and, if it wasn’t there, just returned the name of the entity. That seemed to work OK. I’m trying to get more information on the different possible responses here, as there are TONS of entity types, but this worked well enough for what I was trying to do.

  def grab_entities_from_tweets(handle)
    entities = []
    AlchemyAPI::EntityExtraction.new.search(text: ::TwitterAPI.concatenated_tweets(handle)).each do |entity|
      if entity["disambiguated"]
        entities << entity["disambiguated"]["name"]
      else
        entities << entity["text"]
      end
    end
    entities
  end


Most of my tweets were just hashtags + Twitter handles (redacted), but also quantities, which is kind of odd but probably useful for some people.

["Chicago", "Xmas", "Google", "24 hours", "2 weeks", "3 days", "#"]


Taxonomy

This one is probably the least useful (the taxonomy is REALLY broad). It’s good when you need broad categorization, but that’s probably not a typical use case.

  def grab_taxonomy_of_tweets(handle)
    categories = []
    AlchemyAPI::Taxonomy.new.search(text: ::TwitterAPI.concatenated_tweets(handle)).each do |category|
      categories << category["label"]
    end
    categories
  end

The results for me:

["/art and entertainment/movies and tv/movies", "/business and industrial/energy/oil/oil company", "/art and entertainment/shows and events/festival"]


KeywordExtraction

This one really doesn’t work well on tweets. I think the corpus is too small for the relevance scores to mean much.

  def grab_keywords_of_tweets(handle)
    keywords = []
    AlchemyAPI::KeywordExtraction.new.search(text: ::TwitterAPI.concatenated_tweets(handle)).each do |keyword|
      keywords << keyword["text"] if keyword["relevance"].to_f > 0.6
    end
    keywords
  end

The results for me:

["moderately funny moments", "Random helpful article", "Google Play RT", "Can’t hurt.", "sikachu FDO   Account", "New blog post", "http://t.co/ySszX2VQTN RT", "default values", "Ruby. http://t.co/xPAgGfLM1a", "worst case", "Hacker News", "old code", "Great post", "little brother", "patriotic duty", "nice thought", "fruit gifts"]


TextCategorization

If you had to pick one overarching category for your tweets, what would it be? That’s what this answers. NOTE: this one is no longer listed on the AlchemyAPI page, so I’m not sure it’s still supported.

  def grab_tweet_categories(handle)
    AlchemyAPI::TextCategorization.new.search(text: ::TwitterAPI.concatenated_tweets(handle))["category"]
  end

The answer for me? "computer_internet". At least it’s accurate.

What kind of things do I take photos of?

For this we’ll use the ImageTagging API: http://www.alchemyapi.com/api/image-tagging/urls.html

This one is definitely the coolest. I can grab all of the image URLs from my Instagram, send them up to AlchemyAPI, and they’ll tell me what kind of thing I took a picture of. The coolest part: I took a picture of Napoleon’s Tomb, and it correctly categorized that as a tomb.

The call is pretty simple. Most of the code is actually pulling in the Instagram photos from their API.

  def get_instagram_posts(photos=[])
    max_id = ""
    begin
      instagram_data = Instagram.user_recent_media(instagram_user_id, :max_id => max_id)
      process_instagram_photos(photos, instagram_data)
      max_id = instagram_data.pagination.next_max_id
    end while max_id
    photos
  end

  def process_instagram_photos(photos, instagram_data)
    instagram_data.each do |photo|
      photos << Photo.new(photo.images.low_resolution.url, photo.caption.try(:text), photo.location.try(:latitude), photo.location.try(:longitude), photo.created_time)
    end
  end

And then there’s a method on the Photo class that does the actual Image Tagging.

  AlchemyAPI::ImageTagging.new.search(:url => url)
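
For context, a minimal Photo class could look like the sketch below: the attribute list mirrors the arguments passed in process_instagram_photos, and image_tags memoizes the AlchemyAPI call so each photo is only tagged once. This is my guess at the shape of the class, not necessarily what’s in the repo.

```ruby
class Photo
  attr_reader :url, :caption, :latitude, :longitude, :created_time

  def initialize(url, caption, latitude, longitude, created_time)
    @url          = url
    @caption      = caption
    @latitude     = latitude
    @longitude    = longitude
    @created_time = created_time
  end

  # Lazily fetch and cache the image tags for this photo's URL,
  # so repeated calls don't re-hit the API.
  def image_tags
    @image_tags ||= AlchemyAPI::ImageTagging.new.search(:url => url)
  end
end
```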

So now all the Photos have tags. To wrap up all of the responses, we take a few steps. First we need to aggregate all the photos. Then we need to filter out the ones without tags.

  def recent_photos_with_tags
    photos_with_tags = []
    recent_photos.each do |photo|
      if photo.image_tags && photo.image_tags.count > 0
        photos_with_tags << photo
      end
    end
    photos_with_tags
  end

Now that we have all the photos with tags, we can collect all of the tags from them.

  def all_photo_tags
    tags = []
    recent_photos_with_tags.each do |photo|
      tags += photo.image_tags
    end
    tags
  end

And because there are duplicates, we’ll want to sort these by the most common tags.

  def most_common_photo_tags
    tags = Hash.new(0)
    all_photo_tags.each do |tag|
      tags[tag["text"]] += 1
    end
    tags.sort_by { |_text, count| count }.reverse
  end

Resulting in an array that’s easy for us to display.  This is actually probably the best representation of my internet presence: a picture is worth a thousand words.

[["person", 19], ["sky", 9], ["food", 8], ["dog", 7], ["sign", 7], ["building", 7], ["nature", 6], ["train", 5], ["church", 5], ["sport", 4], ["beach", 3], ["shield", 3], ["route", 3], ["city", 3], ["highway", 3], ["road", 3], ["vatican", 2], ["night", 2], ["graff", 2], ["expressway", 2], ["freeway", 2], ["interstate", 2], ["bridge", 2], ["graduation", 2], ["car", 2], ["sea", 2], ["sunset", 2], ["panda", 1], ["flower", 1], ["aviation", 1], ["airport", 1], ["aircraft", 1], ["plane", 1], ["basilica", 1], ["harbour", 1], ["lake", 1], ["beer", 1], ["river", 1], ["park", 1], ["animal", 1], ["wedding", 1], ["bus", 1], ["swimming", 1], ["dome", 1], ["cathedral", 1], ["mountain", 1], ["freight", 1], ["tree", 1], ["sailing", 1], ["arno", 1], ["agra", 1], ["wine", 1], ["boat", 1], ["ship", 1], ["eiffel", 1], ["tomb", 1]]

It’s kind of awesome. I know it’s accurate since I compulsively Instagram dogs.

I put together an (aggressively) simple app that will let you add your Twitter handle, connect to Instagram, and have all this analysis run and displayed on a dashboard. If you’d like to see the text analysis that didn’t quite work well, I included it in the repo too, so you can play with it in the console. It only depends on Redis + SQLite, and it’s meant to be run locally; my API limits would be reached quickly otherwise.

Here's the repo: https://github.com/scottefein/twitter_analysis_app
