Commit 56e7e14

Refactoring of all classes and packages, tokenizer (Apache Lucene removed) to be Android ready, and the single language concept towards multi-language. Some fixes. Some Java 1.7 compatible object-oriented and performance improvements. Tests 100% OK!
1 parent a453763 commit 56e7e14

20 files changed

+1083
-489
lines changed

README.md

Lines changed: 40 additions & 16 deletions
@@ -4,16 +4,23 @@ VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon
44
and rule-based sentiment analysis tool that is _specifically attuned
55
to sentiments expressed in social media_.
66

7-
This is a fork with **API and package names breaking changes** of the
7+
This is an implementation of VADER in Java. It started as a fork of the
88
[Java port by Animesh Pandey](https://github.com/apanimesh061/VaderSentimentJava)
9-
of the
10-
[NLTK VADER sentiment analysis module](http://www.nltk.org/api/nltk.sentiment.html#module-nltk.sentiment.vader)
11-
written in Python and optimized from the original.
12-
13-
- The [NLTK](http://www.nltk.org/_modules/nltk/sentiment/vader.html)
14-
Python source code.
15-
- The [Original](https://github.com/cjhutto/vaderSentiment) Python
16-
source code by the paper's author C.J. Hutto.
9+
of the [NLTK VADER sentiment analysis module](http://www.nltk.org/api/nltk.sentiment.html#module-nltk.sentiment.vader)
10+
written in Python ([NLTK VADER source code](http://www.nltk.org/_modules/nltk/sentiment/vader.html))
11+
from the [original project](https://github.com/cjhutto/vaderSentiment) by
12+
the paper's author C.J. Hutto. It's the same algorithm, turned into an improved
13+
tool through extensive rewriting, with **relevant changes**:
14+
15+
- Android ready.
16+
- API and package names breaking changes.
17+
- Java 1.7 compatible.
18+
- Performance improvements (e.g., `LinkedList` where it has better complexity than
18+
`ArrayList`).
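
The `LinkedList` versus `ArrayList` point in the list above can be illustrated with a minimal sketch. It is not taken from this commit: the class name `ListRemovalSketch` and the helpers `fill` and `drainFromHead` are hypothetical. Removing the first element is O(1) for a `LinkedList` but O(n) for an `ArrayList`, which has to shift every remaining element.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical illustration, not code from this commit (Java 1.7 compatible).
class ListRemovalSketch {

    // Appends the integers 0..n-1 to the given list and returns it.
    static List<Integer> fill(List<Integer> list, int n) {
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    // Removes elements from the head until the list is empty and
    // returns how many elements were removed.
    static int drainFromHead(List<Integer> list) {
        int removed = 0;
        while (!list.isEmpty()) {
            list.remove(0); // remove(int index): O(1) for LinkedList, O(n) for ArrayList
            removed++;
        }
        return removed;
    }

    public static void main(String[] args) {
        int n = 50000;

        long t0 = System.nanoTime();
        drainFromHead(fill(new ArrayList<Integer>(), n));
        long t1 = System.nanoTime();
        drainFromHead(fill(new LinkedList<Integer>(), n));
        long t2 = System.nanoTime();

        // Timings vary by machine and JVM; only the trend is of interest.
        System.out.println("ArrayList  (ms): " + (t1 - t0) / 1000000);
        System.out.println("LinkedList (ms): " + (t2 - t1) / 1000000);
    }
}
```

The trade-off runs the other way for indexed reads: `get(i)` is O(1) on an `ArrayList` and O(n) on a `LinkedList`, so any gain depends on the access pattern.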
20+
21+
**In progress**
22+
23+
- Multi-language (refer to section [Languages](#languages)).
1724

1825
## Repository
1926

@@ -51,22 +58,39 @@ https://github.com/nunoachenriques/vader-sentiment-analysis/releases
5158

5259
## Testing
5360

54-
The tests from the original Java port are validated against the ground truth of
55-
the original Python (NLTK) implementation. The algorithm running is still the
56-
original implementation from Hutto & Gilbert in Python and ported to Java by
57-
Animesh Pandey.
61+
All tests are **100% OK** as expected!
5862

5963
```shell
6064
./gradlew test
6165
```
6266

67+
## Languages
68+
69+
To support several languages there's the `Language` interface
70+
(`text` subpackage) to be implemented and, eventually, the `Tokenizer` too.
71+
The **main effort** will be the research into each language's
72+
significant words, idiomatic expressions, constants, and empirical values.
73+
Moreover, a data set has to be produced and validated by humans as
74+
_ground truth_ for testing purposes.
75+
76+
### English (Germanic family of languages)
77+
78+
The tests from the original Java port are validated against the _ground truth_
79+
of the original Python (NLTK) implementation. The algorithm running is still the
80+
original implementation from Hutto & Gilbert in Python and originally ported to
81+
Java by Animesh Pandey with modifications by Nuno A. C. Henriques.
82+
83+
### Portuguese (Italic family of languages)
84+
85+
**TODO**
86+
6387
## Use case example
6488

65-
As a Java library it will easily integrates with a bit of coding.
89+
As a Java library it will easily integrate with a bit of coding.
6690

6791
```java
6892
...
69-
ArrayList<String> sentences = new ArrayList<String>() {{
93+
List<String> sentences = new LinkedList<String>() {{
7094
add("VADER is smart, handsome, and funny.");
7195
add("VADER is smart, handsome, and funny!");
7296
add("VADER is very smart, handsome, and funny.");
@@ -86,7 +110,7 @@ ArrayList<String> sentences = new ArrayList<String>() {{
86110
add("Today kinda sux! But I'll get by, lol");
87111
}};
88112

89-
SentimentAnalysis sa = new SentimentAnalysis();
113+
SentimentAnalysis sa = new SentimentAnalysis(new TokenizerEnglish(), new English());
90114

91115
for (String sentence : sentences) {
92116
System.out.println(sentence);

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
1-
1.1
1+
2.0.0

build.gradle

Lines changed: 8 additions & 0 deletions
@@ -15,6 +15,14 @@ repositories {
1515
dependencies {
1616
implementation fileTree(include: ['*.jar'], dir: 'src/main/dist')
1717
testImplementation 'junit:junit:4.12'
18+
testImplementation 'org.apache.lucene:lucene-core:5.5.4'
19+
testImplementation 'org.apache.lucene:lucene-analyzers-common:5.5.4'
20+
}
21+
22+
test {
23+
// For comparison only, different results on Tokenizer vs. Lucene
24+
// but final ground truth results are the same (expected).
25+
exclude 'net/nunoachenriques/vader/text/Tokenizer*'
1826
}
1927

2028
// EXTRA PACKAGING FOR RELEASE DISTRIBUTION

src/main/dist/commons-lang-2.6.jar

-278 KB
Binary file not shown.
-1.5 MB
Binary file not shown.

src/main/dist/lucene-core-5.5.4.jar

-2.26 MB
Binary file not shown.
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
1+
/*
2+
* Copyright 2017 Nuno A. C. Henriques [nunoachenriques.net]
3+
*
4+
* Licensed under the Apache License, Version 2.0 (the "License");
5+
* you may not use this file except in compliance with the License.
6+
* You may obtain a copy of the License at
7+
*
8+
* http://www.apache.org/licenses/LICENSE-2.0
9+
*
10+
* Unless required by applicable law or agreed to in writing, software
11+
* distributed under the License is distributed on an "AS IS" BASIS,
12+
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
* See the License for the specific language governing permissions and
14+
* limitations under the License.
15+
*/
16+
package net.nunoachenriques.vader;
17+
18+
/**
19+
* VADER's constant values (e.g., NORMALIZE_SCORE_ALPHA_DEFAULT)
20+
* for configuration and other uses throughout the algorithm.
21+
*
22+
* @author Nuno A. C. Henriques [nunoachenriques.net]
23+
*/
24+
class Constant {
25+
26+
// Private constructor to avoid instantiation.
27+
private Constant() {
28+
// Void!
29+
}
30+
31+
// TODO check SentimentAnalysis for missing constants!
32+
33+
static final float NORMALIZE_SCORE_ALPHA_DEFAULT = 15.0f;
34+
static final float ALL_CAPS_BOOSTER_SCORE = 0.733f;
35+
static final float N_SCALAR = -0.74f;
36+
static final float EXCLAMATION_BOOST = 0.292f;
37+
static final float QUESTION_BOOST_COUNT_3 = 0.18f;
38+
static final float QUESTION_BOOST = 0.96f;
39+
}
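
For context, here is a minimal sketch of how the original Python VADER uses some of these values. It assumes this port keeps the same formulas, which the diff shown here does not confirm. The class `VaderConstantsSketch` and its method names are hypothetical. The summed score is normalized as `score / sqrt(score² + alpha)`. Punctuation emphasis adds 0.292 per exclamation mark (capped at four). Question marks add 0.18 each when there are two or three, or a flat 0.96 when there are more.

```java
// Hypothetical illustration based on the original Python VADER formulas;
// not code from this commit.
class VaderConstantsSketch {

    static final float NORMALIZE_SCORE_ALPHA_DEFAULT = 15.0f;
    static final float EXCLAMATION_BOOST = 0.292f;
    static final float QUESTION_BOOST_COUNT_3 = 0.18f;
    static final float QUESTION_BOOST = 0.96f;

    // Maps an unbounded summed score into the open interval (-1, 1).
    static float normalize(float score, float alpha) {
        return (float) (score / Math.sqrt(score * score + alpha));
    }

    // Counts the occurrences of a character in the text.
    static int count(String text, char c) {
        int n = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == c) {
                n++;
            }
        }
        return n;
    }

    // Emphasis contributed by '!' and '?' characters.
    static float punctuationEmphasis(String text) {
        // At most four exclamation marks contribute.
        int exclamations = Math.min(count(text, '!'), 4);
        float boost = exclamations * EXCLAMATION_BOOST;

        // A single question mark contributes nothing.
        int questions = count(text, '?');
        if (questions > 1) {
            boost += (questions <= 3)
                    ? questions * QUESTION_BOOST_COUNT_3
                    : QUESTION_BOOST;
        }
        return boost;
    }

    public static void main(String[] args) {
        System.out.println(normalize(4.0f, NORMALIZE_SCORE_ALPHA_DEFAULT));
        System.out.println(punctuationEmphasis("Really?? Great!!!"));
    }
}
```

With `alpha = 15`, small sums stay close to zero and large sums approach ±1 without reaching it, which keeps the compound score bounded.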
