
Commit 2885841

word-embedding models API cleaned up
1 parent 9d7dead commit 2885841

File tree

    docs/codes.rst
    docs/tutorial_wordembed.rst
    docs/tutorial_wordembedAPI.rst
    shorttext/utils/wordembed.py

4 files changed: +26 -12 lines


docs/codes.rst

-6 lines

@@ -97,12 +97,6 @@ Module `shorttext.utils.gensim_corpora`
 .. automodule:: shorttext.utils.gensim_corpora
    :members:
 
-Module `shorttext.utils.wordembed`
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-.. automodule:: shorttext.utils.wordembed
-   :members:
-
 Module `shorttext.utils.compactmodel_io`
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 

docs/tutorial_wordembed.rst

+23 -5 lines

@@ -10,13 +10,12 @@ their page. To load the model, call:
 >>> import shorttext
 >>> wvmodel = shorttext.utils.load_word2vec_model('/path/to/GoogleNews-vectors-negative300.bin.gz')
 
-It is a binary file, and the default is set to be `binary=True`. In fact, it is equivalent to calling,
-if you have `gensim` version before 1.0.0:
+It is a binary file, and the default is set to be `binary=True`.
 
->>> import gensim
->>> wvmodel = gensim.models.Word2Vec.load_word2vec_format('/path/to/GoogleNews-vectors-negative300.bin.gz', binary=True)
+.. automodule:: shorttext.utils.wordembed
+   :members: load_word2vec_model
 
-Or beyond version 1.0.0,
+It is equivalent to calling,
 
 >>> import gensim
 >>> wvmodel = gensim.models.KeyedVectors.load_word2vec_format('/path/to/GoogleNews-vectors-negative300.bin.gz', binary=True)
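
The object returned by the loader above is a gensim `KeyedVectors` instance, so the usual vector lookup applies. A minimal doctest-style sketch; the query word is only an illustration, and the 300 dimensions come from the GoogleNews model named above:

>>> wvmodel['king'].shape                    # plain KeyedVectors vector lookup
(300,)
>>> wvmodel.most_similar('king', topn=5)     # nearest neighbours by cosine similarity
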
@@ -87,6 +86,9 @@ To load a pre-trained FastText model, run:
 
 And it is used exactly the same way as Word2Vec.
 
+.. automodule:: shorttext.utils.wordembed
+   :members: load_fasttext_model
+
 Poincaré Embeddings
 -------------------
 
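
A similarly minimal sketch for the FastText loader documented by the directive above, assuming `load_fasttext_model` is exposed under `shorttext.utils` like the Word2Vec loader; the model path is a placeholder:

>>> import shorttext
>>> ftmodel = shorttext.utils.load_fasttext_model('/path/to/fasttext_model.bin')
>>> ftmodel['king']      # same lookup interface as the Word2Vec model, per the tutorial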

@@ -98,6 +100,8 @@ pre-trained model, run:
 
 For preloaded word-embedding models, please refer to :doc:`tutorial_wordembed`.
 
+.. automodule:: shorttext.utils.wordembed
+   :members: load_poincare_model
 
 BERT
 ----
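
And for the Poincaré loader newly documented here, a sketch under the assumption that `load_poincare_model` takes a file path like the other loaders; the path is a placeholder:

>>> import shorttext
>>> pmodel = shorttext.utils.load_poincare_model('/path/to/poincare_vectors.txt')
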
@@ -120,6 +124,20 @@ The default BERT models and tokenizers are `bert-base_uncase`.
 If you want to use others, refer to `HuggingFace's model list
 <https://huggingface.co/models>`_ .
 
+.. autoclass:: shorttext.utils.transformers.BERTObject
+   :members:
+
+.. autoclass:: shorttext.utils.transformers.WrappedBERTEncoder
+   :members:
+
+
+Other Functions
+---------------
+
+.. automodule:: shorttext.utils.wordembed
+   :members: shorttext_to_avgvec
+
+
 Links
 -----
 
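
The newly documented BERT wrapper and the averaging helper might be exercised as below. Only the names come from the directives above; the argument order of `shorttext_to_avgvec` and the `WrappedBERTEncoder` interface shown are assumptions, not facts taken from this diff:

>>> import shorttext
>>> # average of the word vectors of a short text, given a loaded word-embedding model
>>> avgvec = shorttext.utils.shorttext_to_avgvec('happy birthday', wvmodel)
>>> # BERT sentence encoding (assumed interface; defaults to the base uncased model per the tutorial)
>>> from shorttext.utils.transformers import WrappedBERTEncoder
>>> encoder = WrappedBERTEncoder()
>>> embeddings = encoder.encode_sentences(['happy birthday'])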

docs/tutorial_wordembedAPI.rst

+2 lines

@@ -32,6 +32,8 @@ using `RESTfulKeyedVectors`:
 
 This model can be used like other `gensim` `KeyedVectors`.
 
+.. autoclass:: shorttext.utils.wordembed.RESTfulKeyedVectors
+   :members:
 
 
 Home: :doc:`index`
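
The hunk states this class behaves like any other gensim `KeyedVectors`. A rough sketch of connecting to a word-embedding API server; the URL, port, and constructor signature are assumptions, not taken from this commit:

>>> from shorttext.utils.wordembed import RESTfulKeyedVectors
>>> wvmodel = RESTfulKeyedVectors('http://localhost', port='5000')   # assumed constructor
>>> wvmodel['television']                        # vector lookup, as with any KeyedVectors
>>> wvmodel.most_similar('television', topn=5)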

shorttext/utils/wordembed.py

+1 -1 lines

@@ -6,7 +6,7 @@
 from gensim.models.poincare import PoincareModel, PoincareKeyedVectors
 import requests
 
-from shorttext.utils import tokenize, deprecated
+from shorttext.utils import tokenize
 
 
 def load_word2vec_model(path, binary=True):
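
Since the tutorial hunk above states that `load_word2vec_model` is equivalent to gensim's `KeyedVectors.load_word2vec_format`, the function touched here plausibly reduces to a thin wrapper. A sketch, not the file's actual body:

from gensim.models import KeyedVectors

def load_word2vec_model(path, binary=True):
    """ Load a pre-trained Word2Vec model from the given path (binary format by default). """
    return KeyedVectors.load_word2vec_format(path, binary=binary)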

0 commit comments
