Sentiment analysis of text feedback is used by a growing number of people and organizations to automatically classify feedback that arrives as comments or chats. Many companies want to gauge public sentiment about their products or services so that they can improve them. Since a single person cannot read all of this feedback, and assigning staff to the task is impractical, a machine learning solution is very useful.
Knowing automatically whether a review or comment is positive or negative goes a long way when looking for feedback to improve a service or application. Our model can currently take a sentence and classify its sentiment as either positive or negative.
Explanation of Steps:
- Pre-processing - Pre-processing is a standard first stage in any task involving Twitter data because tweets contain many language irregularities, such as misspellings, slang, hashtags, mentions, and URLs.
- Pre-trained word vectors - Word representations learned from massive unannotated text corpora are used in many NLP tasks. Unsupervised learning on large corpora lets the vectors capture syntactic and semantic characteristics of words.
- DCNN model - CNNs with pooling operations handle variable-length sentences naturally, and they take into account both the order of the words and the context in which each word appears.
- Tokenization - Tokenization is the process of splitting the text of a document into a series of tokens so that every word in the document can be identified for further processing, for example to build a term-document matrix.
- Train Embedding Layer - A word embedding is a way of representing text in which each word in the vocabulary is represented by a real-valued vector in a high-dimensional space. The vectors are learned so that words with similar meanings have similar representations, that is, they lie close together in the vector space.
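The pre-processing and tokenization steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the project's actual code: the cleaning rules, the placeholder tokens `<url>`, `<user>`, `<pad>`, and `<unk>`, and the helper names `preprocess`, `tokenize`, `build_vocab`, and `encode` are all hypothetical. The output is a fixed-length list of integer ids per tweet, which is the kind of input an embedding layer typically expects; the embedding layer and the CNN themselves are left out because they depend on the framework used.

```python
import re
from collections import Counter

# Hypothetical cleaning rules; the real project may use different ones.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # any character repeated 3+ times
TOKEN_RE = re.compile(r"<url>|<user>|[a-z0-9']+")

PAD, UNK = "<pad>", "<unk>"  # assumed reserved tokens (ids 0 and 1)


def preprocess(tweet):
    """Normalize common tweet irregularities."""
    text = tweet.lower()
    text = URL_RE.sub(" <url> ", text)        # replace links with a placeholder
    text = MENTION_RE.sub(" <user> ", text)   # replace @mentions with a placeholder
    text = text.replace("#", " ")             # keep the hashtag word, drop the '#'
    text = REPEAT_RE.sub(r"\1\1", text)       # "loooove" -> "loove"
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split cleaned text into word tokens and placeholders; punctuation is dropped."""
    return TOKEN_RE.findall(text)


def build_vocab(token_lists, min_count=1):
    """Map each token to an integer id, most frequent tokens first."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {PAD: 0, UNK: 1}
    for tok, n in counts.most_common():
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to ids, then truncate or pad to exactly max_len."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokens][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


tweets = ["Loooove this!!! @bob http://t.co/x #happy", "hate this"]
token_lists = [tokenize(preprocess(t)) for t in tweets]
vocab = build_vocab(token_lists)
encoded = [encode(toks, vocab, max_len=6) for toks in token_lists]
```

Padding every tweet to the same length is a convenience for batching; words never seen during vocabulary building fall back to the `<unk>` id rather than raising an error.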
Technologies used
Discussion