Are you a soccer expert? If you live in Brazil, chances are you know someone who thinks they are. IBM is providing an opportunity for all 200,000,000 Brazilian soccer experts to actively participate in all the games. How? By leveraging a real-time analytics application for sentiment analysis to understand public opinion on everything from the food in the stadiums to who should be the Man of the Match. Sentiment is analyzed in real time, and the application keeps up with second-to-second changes in attitude with a very small hardware footprint.
Fans benefit because they have a real-time outlet to voice their opinions and weigh in on hot topics like Spain’s unexpected defeat. Coaches also benefit: they can check in real time what millions of Brazilians are saying about their teams. What they do with this information is best left to their expertise; maybe the coaches don’t want players to know that the public isn’t optimistic about their chances of victory, or perhaps they are fearful of inflating egos with overly zealous fan support.
The analysis performed by the application starts by simplifying the language used about Brazil and the other nations in the tournament, soccer and all the players. It converts all variations of related verbs used on Twitter and Facebook into a single form: for example, different tenses like run and ran (but in Portuguese of course!). It corrects common misspellings and also consolidates synonyms for teams and players, such as popular striker nicknames. If you follow the matches, you know keeping up with nicknames isn’t an easy task. A few fun examples: Brazil: "Canarinho" ("Little Canary"), South Korea: "Taeguk" ("Warriors"), Algeria: "Les Fennecs" ("The Desert Foxes"), United States: "The Yanks", Nigeria: "Super Eagles", Bosnia and Herzegovina: "Zmajevi" ("Dragons"), Ivory Coast: "Les Éléphants" ("The Elephants"). Also, did you know that this tournament is the year of the meme?
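To make the simplification step concrete, here is a minimal Python sketch of that kind of normalization. It is not IBM's implementation (the real application is written in SPL and works on Portuguese); the lookup tables, their English entries and the `normalize` function are all hypothetical stand-ins chosen for illustration.

```python
import re

# Hypothetical lookup tables, for illustration only. A real system would
# need far larger dictionaries, and in Portuguese rather than English.
MISSPELLINGS = {"gooal": "goal", "brazill": "brazil"}
LEMMAS = {"ran": "run", "running": "run", "scored": "score", "scores": "score"}
SYNONYMS = {"canarinho": "brazil", "fennecs": "algeria", "zmajevi": "bosnia"}

# Matches runs of letters (including accented ones); digits and
# punctuation are dropped.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def normalize(post):
    """Lowercase a post, split it into words, and map each word to a
    canonical form: fix spelling, then reduce verb forms, then replace
    nicknames with the team name."""
    tokens = TOKEN_RE.findall(post.lower())
    normalized = []
    for token in tokens:
        token = MISSPELLINGS.get(token, token)
        token = LEMMAS.get(token, token)
        token = SYNONYMS.get(token, token)
        normalized.append(token)
    return normalized


print(normalize("Canarinho RAN and scored a gooal!"))
```

This sketch only handles single-word nicknames; a multi-word nickname such as "Super Eagles" would need phrase matching before the text is split into words.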
To train the application, IBM Research started with sample social data from the 2013 tournament to help the system learn how words convey positive and negative sentiment about soccer. From the sample, the software developed by IBM Research in Brazil estimated a value of influence for each word as it impacted the whole post. So, when a new post comes in during an actual game, the application simplifies and puts a value on each word to determine its influence on the meaning of the post. Once it establishes the influence of certain words, it can surmise the sentiment of hundreds of thousands of posts.
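The post does not say how the influence values were actually estimated, so the following Python sketch swaps in a deliberately simple stand-in: each word's weight is the average label (+1 positive, -1 negative) of the training posts that contain it, and a new post is scored by summing the weights of its words. The function names and the tiny sample data are hypothetical.

```python
from collections import defaultdict


def train_word_weights(labeled_posts):
    """Estimate a weight per word from (tokens, label) pairs, where
    label is +1 for a positive post and -1 for a negative one.
    The weight is the mean label of the posts containing the word."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tokens, label in labeled_posts:
        for word in set(tokens):  # count each word once per post
            totals[word] += label
            counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}


def score_post(tokens, weights):
    """Sum the weights of the known words; unknown words add nothing."""
    return sum(weights.get(word, 0.0) for word in tokens)


def classify(tokens, weights):
    """Turn the numeric score into a sentiment label."""
    score = score_post(tokens, weights)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up training sample, standing in for the historical social data.
sample = [
    (["great", "goal"], 1),
    (["terrible", "defense"], -1),
    (["great", "save"], 1),
    (["terrible", "goal"], -1),
]
weights = train_word_weights(sample)
print(classify(["great", "save"], weights))
print(classify(["terrible", "goal"], weights))
```

In this toy sample, "goal" appears in one positive and one negative post, so its weight is zero and the other words in the post decide the outcome. A production system would likely use a proper statistical model, but the idea of giving each word a learned value and combining those values per post is the same.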
The application is written in the InfoSphere Streams Processing Language (SPL). In addition, InfoSphere Streams helps to manage and optimize the allocation of computer resources to substantially reduce the compute power required for machine learning algorithms.
The technology is well trained to understand Portuguese and everything about soccer, two topics I personally need help with. Eu preciso estudar Português! (I need to study Portuguese!)
This event isn’t the first time IBM InfoSphere Streams has been used to help understand social sentiment. It was also used for the 2013 soccer tournament in Brazil and the 2012 US presidential election.
To get started with sentiment analysis, look to InfoSphere Streams, which comes with a text analytics toolkit; you can get started now using the Quick Start Program. Quick Start provides a free, downloadable, non-production version of InfoSphere Streams, and this edition is always up to date with the most current version of InfoSphere Streams, which is v3.2.1. There is no limit on data capacity and no time limit, so you can experiment with large data sets and work with different use cases on your own schedule.
Want to learn more? Bookmark the stream computing website, enjoy the games and be careful what you tweet!