@@ -4,16 +4,23 @@ VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon
and rule-based sentiment analysis tool that is _specifically attuned
to sentiments expressed in social media_.

- This is a fork with **API and package names breaking changes** of the
+ This is an implementation of the VADER in Java. It started as a fork of the
[Java port by Animesh Pandey](https://github.com/apanimesh061/VaderSentimentJava)
- of the
- [NLTK VADER sentiment analysis module](http://www.nltk.org/api/nltk.sentiment.html#module-nltk.sentiment.vader)
- written in Python and optimized from the original.
-
- - The [NLTK](http://www.nltk.org/_modules/nltk/sentiment/vader.html)
- Python source code.
- - The [Original](https://github.com/cjhutto/vaderSentiment) Python
- source code by the paper's author C.J. Hutto.
+ of the [NLTK VADER sentiment analysis module](http://www.nltk.org/api/nltk.sentiment.html#module-nltk.sentiment.vader)
+ written in Python ([NLTK VADER source code](http://www.nltk.org/_modules/nltk/sentiment/vader.html))
+ from the [original project](https://github.com/cjhutto/vaderSentiment) by
+ the paper's author C.J. Hutto. It's the same algorithm as an improved
+ tool by extensive rewriting with **relevant changes**:
+
+ - Android ready.
+ - API and package names breaking changes.
+ - Java 1.7 compatible.
+ - Performance improvements (e.g., `LinkedList` where's better O() than
+ `ArrayList`).
+
+ **In progress**
+
+ - Multi-language (refer to section [Languages](#languages)).

## Repository

@@ -51,22 +58,39 @@ https://github.com/nunoachenriques/vader-sentiment-analysis/releases

## Testing

- The tests from the original Java port are validated against the ground truth of
- the original Python (NLTK) implementation. The algorithm running is still the
- original implementation from Hutto & Gilbert in Python and ported to Java by
- Animesh Pandey.
+ All tests are **100% OK** as expected!

```shell
./gradlew test
```

+ ## Languages
+
+ To support several languages there's the `Language` interface
+ (`text` subpackage) to be implemented and, eventually, the `Tokenizer` too.
+ The **main effort** will be in all the research around the specific language
+ significant words, idiomatic expressions, constant and empirical values.
+ Moreover, a data set has to be produced and validated by humans as
+ _ground truth_ for testing purposes.
+
+ ### English (Germanic family of languages)
+
+ The tests from the original Java port are validated against the _ground truth_
+ of the original Python (NLTK) implementation. The algorithm running is still the
+ original implementation from Hutto & Gilbert in Python and originally ported to
+ Java by Animesh Pandey with modifications by Nuno A. C. Henriques.
+
+ ### Portuguese (Italic family of languages)
+
+ **TODO**
+
## Use case example

- As a Java library it will easily integrates with a bit of coding.
+ As a Java library it will easily integrate with a bit of coding.

```java
...
- ArrayList<String> sentences = new ArrayList<String>() {{
+ List<String> sentences = new LinkedList<>() {{
    add("VADER is smart, handsome, and funny.");
    add("VADER is smart, handsome, and funny!");
    add("VADER is very smart, handsome, and funny.");
@@ -86,7 +110,7 @@ ArrayList<String> sentences = new ArrayList<String>() {{
    add("Today kinda sux! But I'll get by, lol");
}};

- SentimentAnalysis sa = new SentimentAnalysis();
+ SentimentAnalysis sa = new SentimentAnalysis(new TokenizerEnglish(), new English());

for (String sentence : sentences) {
    System.out.println(sentence);
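
Review note on the added `sentences` line: combining the diamond operator with an anonymous inner class (the double-brace initializer) is accepted by `javac` only from Java 9 onward, so that line would not compile at the Java 1.7 level the README lists. Below is a minimal sketch of one alternative that builds the same kind of list with the standard library only. The class name `SentenceListExample` and the helper `buildSentences` are hypothetical and not part of the project; only a few of the README's sentences are repeated here.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class SentenceListExample {

    // Hypothetical helper (not part of the library): fills a LinkedList
    // from a fixed array, avoiding the anonymous subclass that
    // double-brace initialization creates.
    static List<String> buildSentences() {
        return new LinkedList<String>(Arrays.asList(
                "VADER is smart, handsome, and funny.",
                "VADER is smart, handsome, and funny!",
                "VADER is very smart, handsome, and funny.",
                "Today kinda sux! But I'll get by, lol"));
    }

    public static void main(String[] args) {
        // Print each sentence, as the README loop does before scoring it.
        for (String sentence : buildSentences()) {
            System.out.println(sentence);
        }
    }
}
```

The explicit `LinkedList<String>` type argument is used here only to keep the sketch unambiguous at older source levels; keeping the double-brace form with explicit type arguments, as on the removed `ArrayList<String>` line, would also compile on Java 7.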