Supervised Sentiment Analysis of Twitter Handle of President Trump with Data Visualization Technique

Kalyan Sahu (kaysahu@csu.fullerton.edu), Yu Bai (ybai@fullerton.edu), Yoonsuk Choi (yochoi@fullerton.edu)
Computer Engineering Program, California State University, Fullerton, Fullerton, California, USA

Abstract-The approval rating of the President of the United States (POTUS) can be used to gauge public support for the current administration. Public support, in turn, is highly dynamic and erratic, and can be influenced by current events, including political/economic policy announcements, ongoing scandals, treatment in the media, and general propaganda. The current POTUS, Donald Trump, is unique because, for the first time, the office of POTUS has direct access to social media platforms that offer a direct avenue of communication with the general population. Can the social media presence of POTUS influence the public approval of the administration? In this paper, we analyze the relationship between tweets generated by POTUS and his approval rating using sentiment-analytics and data visualization tools. The Twitter feed of POTUS is mined, cleaned, and given a quantitative measure based on word content, called the "sentiment score". By comparing tweets before the election, between election and inauguration, and after the inauguration, we find that the sentiment score for Mr. Trump's feed has increased on average with time by 60%. Using cross-correlation analysis, we find a preliminary causative relationship between POTUS Twitter activity and approval rating. The findings provide a new perspective on the forces that influence public opinion, and the general strategy can be used for other analyses in the future.

Index Terms-Sentiment Analysis, Approval Rate, Application Program Interface, Twitter, Supervised and Unsupervised Sentiment Analysis

We first describe the background research and motivation for such a study. In Section III, we describe the methods involved in extracting and distilling information from President Trump's Twitter account, and then using sentiment analysis to quantify the nature of each tweet. 
In Section IV, we use standard tools in statistical analysis to investigate whether there is any discernible relationship between the "sentiment score" of President Trump's twitter feed and his approval rating. We discuss the consequences of our findings and propose future work in Section V, and conclude in Section VI.


Mass opinion plays an important role in a democracy, and can be influenced by modern tools like Twitter micro-blogging and Facebook postings, which connect leaders directly with voting citizens [ 2 ]. It is important to investigate the role of new forms of communication as the cause or the effect of processes such as the personalisation of leadership, the verticalization of political organisations, the presidentialisation of political parties, or the social de-legitimation of the old "intermediate bodies" [ 3 ]. Quantitative analysis of Twitter and Facebook material would provide deeper insight into the factors that citizens care most about and on which they base their opinions about important issues of governance [ 4 ]. A number of tools, such as "Sentiment Analysis" (SA), have been developed during the past decade to analyze the texts of micro-blogs and to measure how their content leads to decisions by the population at which they are targeted. However, the success of the results derived using these methods varies widely [ 5 ]. The supervised sentiment analysis method involves opinion mining using an algorithm on a given data set and verifying the experimental result against the theoretical outcome. Unsupervised sentiment analysis deals with unexpected results using the same training data and algorithm. The relation between the Twitter Feeds of President of the United States (POTUS) Donald Trump and the results of polling about the state of the nation by various media outlets might provide direct validation of the SA tools.

The poll results for (a) the presidential approval rate and (b) the direction of the country reflect the opinion of the masses, or at least of voting citizens, which may be largely influenced by the head of the nation. In the United States, these poll results, which are collected on a weekly basis, may also be affected by major world events and by the verbal communications of the President himself. We plan to investigate how mass opinion is influenced by the presidential Twitter feed and by contemporary major events.

The goal of the paper is to determine the effect of (a) the Twitter handle of POTUS and (b) contemporary news outlets on two major poll results: (1) direction of the country, and (2) approval rate of the President. We aim for an unbiased comparison between the effect of the tweets made by POTUS on his approval rating and the effect of contemporary major events.

We plan to establish a clear relationship between the Twitter Feeds of POTUS and the poll results. Any deviation from a linear relationship would be attributed to the influence of the free press. The relationship between the POTUS Twitter micro-blogs and approval ratings can give us key insights into how direct communication with the voting public can affect public opinion in a democracy. Examining the effect of media reports of major world events would help in understanding the importance of the free press, which is often referred to as one of the important pillars of democracy.

Our hypothesis is based on the observation that President Donald Trump is a master communicator who has cultivated a mass following by communicating his ideas through short and provocative messages. This is evidenced by the fact that he makes extensive use of the Twitter platform, which is well suited to such messaging, to engage his followers [ 1 ]. Through Twitter, POTUS maintains the support of his base. Fluctuations in this base support, seen as increases and decreases in poll favorability, are due to major world events as reported by the press. Our research questions (RQ) are the following.
• RQ 1: Is there a linear relationship between the Twitter Feeds of President Donald Trump and the approval rating?
• RQ 2: Which world events influenced the popularity of POTUS significantly?
• RQ 3: To what extent can a technique like Supervised Sentiment Analysis determine the effect of POTUS Twitter Feeds on shaping mass opinion?

President Donald Trump and several other world leaders use micro-blogging tools such as Twitter to convey their message to their mass followings. As internet speeds increase and more people gain access to smart devices, this mode of communication will become a key factor in deciding the governance of nations [ 6 ] [ 7 ]. The availability of tools to quantify the sentiment content of Twitter Feeds, together with media reporting and systematic poll results, provides a unique opportunity to investigate their mutual relationship. As a first step, our study focuses on validating the tool "Supervised Sentiment Analysis" for finding a relationship between micro-blogs and opinion polls. In many instances, such content can be manipulated by artificial means such as "false or fake news" or external hacking; thus, our study can play a significant role in identifying tools for understanding how mass communication by leaders shapes opinion. It is especially relevant now because, in the near future, large democracies are going to rely on the approval rating system.

The Presidential Twitter feed and the opinion poll results are readily available from open sources. Hence, there is no barrier to collecting the data. We plan to use Opinion Mining (OM) based on sentiment analysis. However, the results of the sentiment analysis depend critically on the careful choice of the terms or words in the micro-blogs that are used to quantify their influence on the targeted audience. The results also depend on the algorithm used in the sentiment analysis. Due to time and resource constraints, we plan to use only commonly used algorithms for this project. Given these limitations, the results of this analysis have limited value.

Our assumptions are based on studies of network development in social media, where it is observed that a social media network expands as a result of homophily and popularity [ 8 ]. In this investigation, we assume that the Twitter micro-blogs of POTUS influence all voters who participate in the polls equally, irrespective of their socio-economic and educational status. The second assumption is that a few important keywords used in OM are sufficient for the SA. The scope of the study is limited to the micro-blogging of President Donald Trump, as it depends on the combination of words used for communication. The conclusions may not be universal: a similar analysis of the Twitter Feeds of another leader may not lead to the same result. We focus only on the post-election period of President Trump's Twitter Feeds. In order to determine a pattern in the relationship between presidential blogs and national poll results, it may be necessary to carry out an analysis of multiple heads of state in the USA and abroad, which is beyond the scope of this work.

III. Methodology

It is self-evident that sentiments in politics have always played a critical role, as evidenced by recent major events such as the U.S. Presidential elections and historic campaigns like the Brexit referendum. Studies have shown that sentiments can be manipulated using technology, resulting in new terms such as techno-populism, which can be considered one of the basic elements of the new trends of depoliticisation [ 4 ] [ 3 ]. The methodology relies on two sets of time series data and their analysis to find any relationship between them. In this section, we describe the specific steps adopted for data analytics.

A. Selection of Tool

This project aims at exploring the validity of a simple hypothesis: that the presidential tweets influence major national polls significantly. To perform this analysis, data mining from text strings and their correlation with numerical data from polling agencies are needed. We decided to use one of the most powerful techniques in the field, namely Sentiment Analysis (SA). Sentiment analysis is used to graphically represent selected human sentiment using major keywords that are associated with similar-sounding words in any given public internet comment, tweet or feedback. Over the past few decades, corporate giants like Amazon and Google, along with many influential politicians, have adopted the notion of feedback from their customers, clients and fellow citizens. Twitter data have been successfully used to determine the role of fake media stories in recent presidential elections [ 9 ] [ 10 ]. There have been several reviews of tools for the analytical mapping of opinion mining and sentiment analysis research, which are used for this literature survey [ 11 ].

Sentiment analysis is a fast-growing research area in computer science, which makes it hard to perform a comprehensive literature survey that would encompass all developments and findings in the area. In recent times, using Web 2.0, consumers express and share their opinions regarding day-to-day activities and global issues on social media, and this provides a transparent platform to share views across the world. Methods have been developed to measure mass sentiment using this electronic Word of Mouth (eWOM) [ 12 ]. Guided by the method described by Mantyla, Graziotin and Kuutila [ 13 ], we have used Google Scholar and a taxonomy of research topics to select publications related to the use of SA on social media texts from Twitter and Facebook micro-blogs. A quantitative and qualitative analysis of sentiments and latent factors in online reviews of doctors, based on regression analysis, shows strong correlations between state-level measures of healthcare quality and the likelihood of patients visiting their primary care physician within 14 days of discharge [ 5 ]. Political organizations around the world have increasingly been using Twitter micro-blogging tools, which give a platform for political discourse that diverges from traditional media [ 2 ]. SA tools are used to study consumer satisfaction in various fields such as medical visits, airline reservations and the hotel industry [ 5 ], [ 14 ]. Since these studies are confined to a section of the general population, it is not clear from them whether SA tools can quantify the opinion of a wider population and relate two independently measured quantities, namely sentiments derived from Twitter micro-blogs and national polls. This project aims at examining the relationship between them using SA tools. The intellectual merit of the project is to understand the mass following that a leader can build in a society by communicating effectively through modern social media. 
The broader perspective is to determine the specific SA tools that can yield useful results in the field of capturing mass opinion that can be related to independently collected national polls.

B. Data Collection

• The first data set consists of a time series of Twitter micro-blogs posted daily by President Trump. The second data set comprises daily presidential approval ratings.
• Twitter micro-blog data is collected from an archive website which organizes daily postings by date, time and device. The website "https://www.trumptwitterarchive.com" allows users to export data in CSV or JSON format. It can display retweets and supports word search and retweet counts. Table I gives the tweet count distribution over the period of this study.
• It is clear from Table 2 that over the past three years, President Trump's micro-blogging posts have significantly transitioned from the Android system to the iPhone. Hence, the source data used in this research is taken from iPhone devices only. The two types of data for this study were collected from public sources between January 20, 2016 and January 1, 2019.

TABLE I
Tweet Count Distribution

Sr. No.   Source        Tweets   Retweets
1         All Devices   9438     10645
2         iPhone        7298     8456
3         Android       1478     1479

The Real Clear Politics team was kind enough to provide us with a data file of the average approval score, in percent, for Trump going back to the first date. The approval ratings were cross-checked against other data sources collected from the following websites: (1) https://www.realclearpolitics.com, (2) https://www.projects.fivethirtyeight.com/trumpapproval-ratings

C. Data Analysis

The data analysis was carried out in the Python programming language, following the protocol described below. Here, we describe the role of each step in the analysis process. Extraction of relevant data is possible even without extensive knowledge of Natural Language Processing (NLP). The code involved in this project is of modest complexity and is executed using a Python kernel in a Jupyter notebook. The Pandas library is first imported for reading and writing purposes. The source CSV file, which uses a comma delimiter, is read using the read_csv command.
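The loading step can be sketched as follows. This is a minimal, dependency-free illustration that uses Python's built-in csv module in place of the Pandas read_csv call mentioned above; the column names (source, text, created_at), the timestamp format, and the sample rows are assumptions made for illustration and may not match the actual archive export.

```python
import csv
import io
from datetime import datetime

# Hypothetical export; the real archive file may use other column names
# and another timestamp format.
SAMPLE_CSV = """source,text,created_at
Twitter for iPhone,Great meeting today!,01-20-2017 17:05:00
Twitter for Android,Sad news from the media.,01-21-2017 09:30:00
Twitter for iPhone,Jobs are coming back.,01-21-2017 12:00:00
"""


def load_iphone_tweets(handle):
    """Return (date, text) pairs for rows whose source mentions iPhone."""
    rows = []
    for row in csv.DictReader(handle):
        if "iphone" not in row["source"].lower():
            continue  # the study keeps iPhone posts only
        created = datetime.strptime(row["created_at"], "%m-%d-%Y %H:%M:%S")
        rows.append((created.date(), row["text"]))
    return rows


tweets = load_iphone_tweets(io.StringIO(SAMPLE_CSV))
```

With Pandas, the same file would typically be loaded through pandas.read_csv and filtered on the source column.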

Fig. 2. President Trump's approval rate time series, divided into three periods. The time before the dotted line represents his approval rating before the presidency, whereas the time after the solid line represents his approval rating after taking office. The time in between is the interim period, in which he had been elected President of the United States of America but had yet to take the oath of office.

• Number of words: The word count of each tweet is calculated using the split function in the Python programming language.
• Number of characters: The total number of characters in each tweet is calculated following the same logic as the previous step. Spaces are included.
• Average word length: The sum of the lengths of all the words in a tweet divided by the number of words in the tweet.
• Number of stopwords: The number of stopwords is calculated in order to understand the extent of the information that will be lost after processing the data. The Natural Language Toolkit (NLTK), a basic NLP library in Python, is used for this calculation.
• Number of special characters: Special characters such as hashtags represent extra information in a tweet and are undesirable during data analytics. The startswith() function is used, since a hashtag (or mention) symbol generally appears at the beginning of a word.
• Number of numerics: The count of numeric tokens is obtained in a similar way to the word count. It is calculated using the isdigit() function, which identifies the numeric tokens present in a tweet.
• Number of uppercase words: A word such as 'hate', which expresses anger and rage, is often written in uppercase letters. Thus, converting uppercase words to lowercase is pertinent before advancing to text processing.

D. Basic Pre-processing of Text Data

A data cleansing protocol is crucial for accurate text and feature extraction. It reduces vocabulary clutter, which plays an important role during advanced text processing. This is achieved by applying the following basic pre-processing steps to the training data.

• Lower case: All the tweets are transformed into lower case letters in order to avoid storing multiple copies of the same word. 'Text' and 'text' are the same word, but during the word count step they would be counted as two different words.
• Punctuation removal: Punctuation does not supply any extra information during the data analytics process. Hence, removing it reduces the size of the training data.
• Stopwords removal: Commonly occurring words, or stopwords, are removed from the source text because they create undesirable redundancy in the training data. The predefined NLTK 'stopwords' corpus, which lists the stopwords of the English language, is imported for this purpose.
• Frequent words removal: Besides stopwords that are common in a general sense, words that occur very frequently in this particular text data should also be removed. The 15 most frequently occurring words in the training data are found and removed.
• Rare words removal: This method is similar to the frequent word removal process: rarely occurring words are located and removed. Since these words are very scarce, noise generally dominates the association between them and other words. Replacing rare words with a more general form gives higher counts; the counts are calculated using the value_counts() function. [ 25 ]
• Spelling correction: Twitter users often post hastily, which leads to a profusion of spelling mistakes. This also creates superfluous copies of similar words: a word and its misspelled variant are treated as two different words even though they are used in the same sense. The TextBlob natural language processing library is used in this step to achieve spelling correction.
• Tokenization: This is the process of chopping up a piece of text into a sequence of individual words or sentences. The TextBlob NLP library is highly efficient at performing this activity on a large dataset. [ 25 ]
• Stemming: This method removes suffixes such as 'ing' and 'ly' using a simple rule-based approach. PorterStemmer is imported from the NLTK library and used with the stem() and split() functions.
• Lemmatization: Lemmatization is a finer method than stemming. Here, a word is converted into its root word instead of merely having its suffix stripped. It accounts for vocabulary and performs morphological analysis to obtain the root word. The 'Word' class is imported from TextBlob, and its lemmatize() function is applied through a lambda expression to perform this task.
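The basic feature counts and the first cleaning steps described in this section can be sketched in plain Python as below. This is an illustrative sketch, not the authors' notebook: the tiny STOPWORDS set stands in for the NLTK English stopword list, the function names are hypothetical, and the example tweet is invented.

```python
import re
from collections import Counter

# Tiny illustrative stopword set; the paper uses NLTK's English stopword list.
STOPWORDS = {"the", "is", "a", "to", "and", "of", "in", "are", "we"}


def basic_features(tweet):
    """Per-tweet counts computed on the raw (uncleaned) text."""
    words = tweet.split()
    return {
        "word_count": len(words),
        "char_count": len(tweet),  # spaces are included
        "avg_word": sum(len(w) for w in words) / len(words) if words else 0.0,
        "stopwords": sum(w.lower() in STOPWORDS for w in words),
        "hashtags": sum(w.startswith("#") for w in words),
        "numerics": sum(w.isdigit() for w in words),
        "upper": sum(w.isupper() for w in words),
    }


def clean(tweet):
    """Lower-case, strip punctuation, and drop stopwords; returns tokens."""
    lowered = tweet.lower()
    no_punct = re.sub(r"[^\w\s]", "", lowered)
    return [w for w in no_punct.split() if w not in STOPWORDS]


def drop_frequent(token_lists, top_n):
    """Remove the top_n most frequent tokens across all tweets."""
    counts = Counter(w for tokens in token_lists for w in tokens)
    frequent = {w for w, _ in counts.most_common(top_n)}
    return [[w for w in tokens if w not in frequent] for tokens in token_lists]


example = "We are WINNING 100 percent of the time #MAGA"
features = basic_features(example)
tokens = clean(example)
```

Spelling correction, stemming, and lemmatization are left out here because they rely on TextBlob and NLTK, as described above.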

E. Advanced Text Processing

All the pre-processing steps involved in data modification and cleansing are now complete. The final step involves extracting features from the refined text using NLP techniques.

Fig. 3. Word cloud after successful data cleaning. Nonessential words which represent noise are removed. This shifts the focus to the frequently occurring words which are pertinent to sentiment analysis.

• N-grams: N-grams are combinations of multiple words that are used together. For N=1, the n-grams are called 'unigrams', whereas 'bigrams' and 'trigrams' correspond to N=2 and N=3 respectively. Unlike bigrams and trigrams, unigrams do not carry significant contextual information. The n-gram principle is based on capturing the language structure, i.e. which letter or word is likely to follow a given one. The quantity and quality of the captured context depend on the length of the n-grams. TextBlob provides an ngrams() function, which is used for data extraction from the Twitter data. [ 25 ]
• Term Frequency: Term frequency (TF) is the ratio of the number of times a word appears in a sentence to the length of the sentence. The value_counts() function is used to perform this task. $\mathrm{TF} = \frac{\text{number of times term } T \text{ appears in the row}}{\text{number of terms in that row}}$
• Inverse Document Frequency: IDF is used to measure the uniqueness of each word. It is the logarithm of the ratio of the total number of rows to the number of rows in which the word is present: $\mathrm{IDF} = \log(N/n)$, where $N$ is the total number of rows and $n$ is the number of rows in which the word appears. [ 25 ]
• Term Frequency-Inverse Document Frequency (TF-IDF): TF-IDF is the product of the TF and IDF values. In order to obtain the TF-IDF value without calculating the TF and IDF values separately, an sklearn function is used. This function can also perform simple preprocessing tasks such as lowercasing and removal of stopwords.
• Bag of Words: The BoW model is a simplified representation in which a text (a sentence or a document) is represented as a multiset (bag) of its words. Similar text fields should contain similar kinds of words and hence will have a similar bag of words. Sklearn provides a separate class for this task called 'CountVectorizer'.
• Sentiment Analysis: Our original problem is to calculate the sentiment score for each tweet published by the @RealDonaldTrump Twitter handle. TextBlob is used to calculate the polarity and subjectivity of each tweet. The sentiment score of each tweet and the corresponding publication date are exported into a CSV file. 
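The n-gram, bag-of-words, TF, and IDF definitions above can be written out directly, as in the sketch below. It uses only the standard library and hypothetical function names; it follows the formulas as stated in the text (TF = count / row length, IDF = log(N/n)), and the three token rows are invented for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All runs of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def term_frequency(tokens):
    """TF = occurrences of the term in the row / number of terms in the row."""
    counts = Counter(tokens)
    return {term: count / len(tokens) for term, count in counts.items()}


def inverse_document_frequency(rows):
    """IDF = log(N / n), with N rows in total and n rows containing the term."""
    n_rows = len(rows)
    rows_with_term = Counter(term for tokens in rows for term in set(tokens))
    return {term: math.log(n_rows / n) for term, n in rows_with_term.items()}


def tf_idf(rows):
    """One {term: TF * IDF} dictionary per row."""
    idf = inverse_document_frequency(rows)
    return [
        {term: tf * idf[term] for term, tf in term_frequency(tokens).items()}
        for tokens in rows
    ]


rows = [["great", "jobs", "great"], ["fake", "news"], ["great", "news"]]
bigrams = ngrams(rows[0], 2)
bag_of_words = [Counter(tokens) for tokens in rows]
scores = tf_idf(rows)
```

Values produced by scikit-learn's TfidfVectorizer will generally differ from these, since by default it smooths the IDF term and normalizes each row vector.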
The flat file is then imported into Microsoft SQL Server as a table named 'SentimentScore', whose schema is shown in Query 1. A select query is then run to calculate the sum of the sentiment scores for each day (Query 2). Finally, the approval rate is plotted on the Y-axis against the sentiment score on the X-axis using Microsoft Excel charts.

    CREATE TABLE [dbo].[SentimentScore] (
        [SL NO] varchar(50),
        [TEXT] varchar(50),
        [CREATED_AT] varchar(50),
        [WORD_COUNT] varchar(50),
        [CHAR_COUNT] varchar(50),
        [AVG_WORD] varchar(50),
        [STOPWORDS] varchar(50),
        [HASHTAGS] varchar(50),
        [NUMERICS] varchar(50),
        [UPPER] varchar(50),
        [SENTIMENT] varchar(50)
    )

Query 1: Schema of SentimentScore table

    -- SENTIMENT is stored as varchar, so it is cast to a number before summing
    SELECT SUM(CAST(SENTIMENT AS float)) AS 'SA',
           CAST(CREATED_AT AS DATE) AS DateAdded
    FROM DATABASE.TABLE_NAME
    GROUP BY CAST(CREATED_AT AS DATE)
    ORDER BY CAST(CREATED_AT AS DATE)

Query 2: SQL code used for daily sentiment summation calculation
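For readers without access to Microsoft SQL Server, the daily summation of Query 2 can be reproduced with Python's built-in sqlite3 module. The sketch below is an assumption-laden stand-in rather than the setup used in this study: it uses SQLite syntax (DATE() instead of CAST ... AS DATE), a reduced two-column table, and invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical version of the SentimentScore table.
conn.execute("CREATE TABLE SentimentScore (CREATED_AT TEXT, SENTIMENT REAL)")
conn.executemany(
    "INSERT INTO SentimentScore VALUES (?, ?)",
    [
        ("2017-01-20 17:05:00", 0.50),
        ("2017-01-20 21:10:00", -0.25),
        ("2017-01-21 09:30:00", 0.75),
    ],
)

# Sum the sentiment scores of all tweets posted on the same calendar day.
daily = conn.execute(
    "SELECT DATE(CREATED_AT) AS DateAdded, SUM(SENTIMENT) AS SA "
    "FROM SentimentScore GROUP BY DATE(CREATED_AT) ORDER BY DateAdded"
).fetchall()
conn.close()
```

Each element of daily is a (date, summed score) pair that can be joined with the approval rating for the same date.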

The absolute value of the approval rating, A(t), and the Twitter sentiment score, S(t), can give insights into how news events can influence both metrics. To analyze the relationship between the two, we computed the cross-correlation between the two time series at multiple lags l using the standard expression

$$(S \star A)(l) = \langle S(t),\, A(t - l) \rangle,$$

where $\langle x, y \rangle$ represents the correlation coefficient between the data x and y.

• The approval rate time series graph in Fig. 2 is created using the data collected from Real Clear Politics.
• A Twitter sentiment time series is plotted in Fig. 4, where the sentiment score of President Trump's micro-blog feed is shown from 2016 to 2019.
• Approval rate is mapped against sentiment score in Fig. 5.
• In the cross-correlation plot (Fig. 6), the X-axis is the lag, i.e. the delay in the approval rating change measured in days, and the Y-axis is the normalized correlation value.
• Epoch plots are created for President Trump's approval rating and Twitter sentiment score before the 2016 presidential election, during the interim period in which he had been elected but had yet to take office, and after he made the transition to the Oval Office.

Sentiment analysis yields two quantities:
• Polarity is a float value within the range [-1.0, 1.0], where 0 indicates neutral, +1 indicates a very positive sentiment and -1 represents a very negative sentiment.
• Subjectivity, on the other hand, is a float value within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective. A subjective sentence expresses personal feelings, views, beliefs, opinions, allegations, desires, suspicions, and speculations, whereas objective sentences are factual.
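The lagged cross-correlation defined in this section, the correlation coefficient between S(t) and A(t - l), can be sketched as follows. The function names and the toy series are hypothetical; the toy approval series is constructed as a copy of the sentiment series delayed by three days, purely to show how a peak in the cross-correlation reveals a lag.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def cross_correlation(s, a, lag):
    """Correlation of S(t) with A(t - lag) over the overlapping samples."""
    pairs = [(s[t], a[t - lag]) for t in range(len(s)) if 0 <= t - lag < len(a)]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


# Toy data: approval copies sentiment three days later, A(t) = S(t - 3).
sentiment = [0.1, 0.5, -0.2, 0.8, 0.3, -0.4, 0.6, 0.0, 0.9, -0.1, 0.4, 0.7]
approval = [0.0, 0.0, 0.0] + sentiment[:-3]

by_lag = {lag: cross_correlation(sentiment, approval, lag) for lag in range(-5, 6)}
best_lag = max(by_lag, key=by_lag.get)
```

Under the sign convention of the expression above, approval trailing sentiment by three days shows up as a peak at l = -3; a plot that reports the delay as a positive number of days uses the opposite sign for the lag axis.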

The primary purpose of this study is to understand the impact of President Trump's daily Twitter feed on the Presidential approval rating. The data collected consists only of the tweets posted from an iPhone device since his presidential term began. Retweets and other sources have not been taken into consideration, because the intention is to understand the relationship between the Twitter posts the President publishes from the device he uses most commonly and their impact on his popularity.

The graphs developed from the sentiment analysis have the following implications:
• The sentiment score for Donald Trump's Twitter activity before he took office averages 0.25986. After he was elected President of the United States of America, the average sentiment score increased by 60 percent to 0.556785. This increase indicates that President Trump's tweets have contained more influential words since he took over the Oval Office. The epoch plot and data in Fig. 7(b) were used to determine this result.
• The plot in Fig. 5 is unevenly distributed and leads to the conclusion that there is no direct correlation between the President's Twitter activity and his approval rating.
• However, the cross-correlation plot indicates that the approval rating of President Donald Trump is subject to change based on his Twitter activity. It takes five days on average for the approval rating to be affected by any significant amount of Twitter activity. The cross-correlation plot peaks at a lag of around five days (Fig. 6).
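The epoch comparison behind the first implication amounts to averaging the daily sentiment score within each period. The sketch below shows one way to do this; the function names, the treatment of the boundary days, and the scores are assumptions for illustration and are not the data of this study. The election date (November 8, 2016) and inauguration date (January 20, 2017) are used as epoch boundaries.

```python
from datetime import date
from statistics import mean

ELECTION_DAY = date(2016, 11, 8)
INAUGURATION_DAY = date(2017, 1, 20)


def epoch_of(day):
    """Assign a date to one of the three periods (boundary days go to the later one)."""
    if day < ELECTION_DAY:
        return "pre-election"
    if day < INAUGURATION_DAY:
        return "interim"
    return "post-inauguration"


def mean_by_epoch(daily_scores):
    """Average the (date, score) pairs within each epoch."""
    groups = {}
    for day, score in daily_scores:
        groups.setdefault(epoch_of(day), []).append(score)
    return {epoch: mean(scores) for epoch, scores in groups.items()}


# Invented scores for illustration only; these are not the paper's data.
toy = [
    (date(2016, 10, 1), 0.2), (date(2016, 10, 2), 0.4),
    (date(2016, 12, 1), 0.5),
    (date(2017, 2, 1), 0.6), (date(2017, 2, 2), 0.8),
]
averages = mean_by_epoch(toy)
```

The relative change between two epochs then follows as (later mean - earlier mean) / earlier mean.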

One possible explanation for this result is that news takes time to spread. The change is not instantaneous but gradual: the country is first exposed to the news, and people then digest and discuss it amongst themselves, which ends up shaping a common opinion within different groups of voters. This study has used data mining and analytical techniques to examine the direct impact of a social media platform on poll data. As future work, the increase in sentiment score during the presidency may be verified by carrying out the same analysis for the next one and a half years.

VI. Conclusion

Sentiment analysis is a very effective method which allows us to detect patterns and trends in data. The Python programming language is an excellent platform for data mining, processing and analytics. Python's natural language processing libraries and toolkits, such as TextBlob, have proven to be pivotal for this research. The "bag of words" model was vital to this project and is straightforward to implement in natural language processing work. Key conclusions of this project:

• There is no immediate impact on the President's job approval/popularity rating due to his Twitter activity.
• Using cross-correlation, a five-day lag is found between the President's Twitter activity and any change in his approval rating. In future studies, we plan to establish causality using more statistically robust methods, such as Granger causality. This lag can be explained by the fact that news takes time to spread. The approval rate is not affected the moment President Trump decides to tweet. The tweet is picked up and discussed amongst various political and media outlets, resulting in a likely change of opinion, which can take up to five days.
• Mass political opinion carries much more complexity than opinion about regular consumer products. Hence, understanding this opinion can be more demanding than analyzing a simple product review. The scope of this study can be expanded to include other data sources such as newspapers and visual media reports. Using multiple data sources would permit the investigation of mass political opinion in greater detail.

[1] B. Anderson, Tweeter-in-Chief: A Content Analysis of President Trump's Tweeting Habits, Elon Journal of Undergraduate Research in Communications, Vol. 8, No. 2, pp. 36-47, 2017.
[2] J. Ausserhofer, A. Maireder, National Politics on Twitter, Information, Communication Society, 16:3, 291-314, DOI: 10.1080/1369118X.2012.756050, 2013.
[3] E. De Blasio, M. Sorice, "Populism between direct democracy and the technological myth", Nature Scientific Report, 4, 1-10, doi.org/10.1057/s41599-018-0067-y, 2018.
[4] L. McCay-Peet, A. Quan-Hasse, A Model of Social Media Engagement: User Profiles, Gratifications, and Experiences. Why Engagement Matters: Cross-disciplinary perspectives of user engagement in digital media, 199-217, doi: 10.1007/978-3-319-27446-1_9, 2016.
[5] W. Byron, P. Michael, S. Urmimala, T. Thomas, D. Mark, "A large-scale quantitative analysis of latent factors and sentiment in online doctor reviews", Journal of the American Medical Informatics Association, 1-5, doi.org/10.1136/amiajnl-2014-002711, 2014.
[6] S. Andrews, D.A. Ellis, H. Shaw, L. Piwek, Beyond Self-Report: Tools to Compare Estimated and Real-World Smartphone Use, PLoS ONE 10(10): e0139004, doi: 10.1371/journal.pone.0139004, 2015.
[7] A. Muhmmad, Y. Alas, Smartphones habits, necessities, and big data challenges, The Journal of High Technology Management Research, 26, 177-185, doi.org/10.1016/j.hitech.2015.09.005, 2015.
[8] Y. Liu, L. Li, H. Wang, C. Sun, X. Chen, J. He, H. Jiang, The Competition of Homophily and Popularity in Growing and Evolving Social Networks, Scientific Report, 8, 1-10, doi.org/10.1038/s41598-018-33409-8, 2018.
[9] A. Bovet, H. Makse, "Validation of Twitter opinion trends with national polling aggregates: Hillary Clinton vs Donald Trump", Nature Communication, 10, 1-10, doi.org/10.1038/s41598-018-26951-y, 2018.
[10] A. Bovet, H. Makse, "Influence of fake news in Twitter during the 2016 US presidential election", Nature Communication, 10, 1-15, doi.org/10.1038/s41467-018-07761-2, 2019.
[11] R. Piryani, D. Madhavi, V.K. Singh, Analytical mapping of opinion mining and sentiment analysis research during 2000-2015, Science Direct, 122-150, doi.org/10.1016/j.ipm.2016.07.001, 2016.
[12] K. Ravi, V. Ravi, A survey on opinion mining and sentiment analysis: Tasks, approaches and applications, Knowledge-Based Systems, 89, 14-46, http://dx.doi.org/10.1016/j.knosys.2015.06.015, 2015.
[13] M. V. Mantyla, N. Graziotin, M. Kuutila, The evolution of sentiment analysis: A review of research topics, venues, and top cited papers, Computer Science Review, 27, 16-32, https://doi.org/10.1016/j.cosrev.2017.10.002, 2018.
[14] Y. Hu, Y. Chen, H. Chou, "Opinion mining from online hotel reviews: A text summarization approach", Information Processing Management, 53, 436-449, http://dx.doi.org/10.1016/j.ipm.2016.12.00, 2016.
[15] E. Cambria, "Affective Computing and Sentiment Analysis", IEEE Intelligent Systems, 31, 102-107, https://doi.org/10.1109/MIS.2016.31, 2016.
[16] T. Maite, B. Julian, T. Milan, V. Kimberly, S. Manfred, Lexicon-Based Methods for Sentiment Analysis, The MIT Press Journals, doi.org/10.1162/COLI_a_00049, 2011.
[17] P. Goncalves, B. Fabricio, A. Matheus, C. Meeyoung, "iFeel: A Web System that Compares and Combines Sentiment Analysis Methods", http://dx.doi.org/10.1145/2567948.2577013, 2014.
[18] P. Goncalves, B. Fabricio, A. Matheus, C. Meeyoung, "Comparing and Combining Sentiment Analysis Methods", Proceeding COSN '13 Proceedings of the first ACM conference on Online social networks, 27-38, http://dx.doi.org/10.1145/2512938.2512951, 2013.
[19] F. Nielsen, A new ANEW: Evaluation of a word list for sentiment analysis in microblogs, CoRR, 2011.
[20] T. O'Keefe, I. Koprinska, Feature Selection and Weighting Methods in Sentiment Analysis, Proceedings of the 14th Australasian Document Computing Symposium, Sydney, Australia, 4 December 2009, 67-74, 2009.
[21] E. Kouloumpis, T. Wilson, J. Moore, "Twitter Sentiment Analysis: The Good the Bad and the OMG", Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media, 1-4, 2011.
[22] F. Ribeiro, M. Araujo, P. Goncalves, M. A. Goncalves, F. Benevenuto, SentiBench a benchmark comparison of state-of-the-practice sentiment analysis methods, Science, 1-29, doi.org/10.1140/epjds/s13688-016-0085-1, 2016.
[23] H. Saif, Y. He, H. Alani, Semantic Sentiment Analysis of Twitter. In: Cudre-Mauroux P. et al. (eds) The Semantic Web ISWC 2012. ISWC 2012. Lecture Notes in Computer Science, vol 7649. Springer, Berlin, Heidelberg, doi.org/10.1007/978-3-642-35176-1_32, 2012.
[24] H. Xia, T. Jiliang, G. Huiji, L. Huan, Unsupervised Sentiment Analysis with Emotional Signals, WWW 13 Proceedings of the 22nd international conference on World Wide Web, 607-618, doi: 10.1145/2488388.2488442, 2013.
[25] Gurusamy V., Kannan S., Preprocessing Techniques for Text Mining, 2014.