The repository contains a modular Python implementation of transformer architectures, developed with reference to:

- The seminal paper _Attention Is All You Need_ by Vaswani et al.<sup><a href="#references">[1]</a></sup>, which introduces the attention-based transformer architecture for sequence-to-sequence tasks and demonstrates its effectiveness by achieving state-of-the-art performance in machine translation, surpassing previous LSTM- and CNN-based neural machine translation architectures.
- The chapter on _Transformers and Large Language Models_ from _Speech and Language Processing_ by Jurafsky & Martin<sup><a href="#references">[2]</a></sup>, which provides a more comprehensive and illustrative look at the high-level details discussed in _Attention Is All You Need_.

## Features

- Generic encoder-only, decoder-only and encoder-decoder transformer architectures.
- Wrappers for causal language modelling, sequence-to-sequence generation and classification/regression tasks.
- Various decoding methods for causal/sequence-to-sequence generation (a short sampling sketch follows this list):
  - Search-based (greedy and beam search)
  - Sampling-based (nucleus, temperature and top-k sampling)
- Example applications to real-world datasets.
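
As a rough illustration of the sampling-based decoding methods above, here is a minimal temperature + top-k sampling step in plain PyTorch. This is a sketch, not the project's actual `transformer.decoding` code; the function name and defaults are illustrative.

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 1.0, k: int = 50) -> int:
    """One temperature + top-k sampling step over a 1-D vocabulary logits vector."""
    # Scale logits by temperature: values < 1 sharpen the distribution, > 1 flatten it.
    logits = logits / temperature
    # Keep only the k highest-scoring tokens; mask the rest out with -inf.
    topk = torch.topk(logits, k)
    masked = torch.full_like(logits, float("-inf"))
    masked[topk.indices] = topk.values
    # Renormalise over the surviving tokens and sample the next token id.
    probs = torch.softmax(masked, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()
```

Greedy search is the limiting case of always taking the argmax, beam search instead keeps the highest-scoring partial sequences at each step, and nucleus (top-p) sampling replaces the fixed `k` with the smallest token set whose cumulative probability exceeds `p`.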
### PyTorch restrictions
This project is implemented using [PyTorch](https://pytorch.org/) and [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/).
As PyTorch provides a number of transformer and attention related layers in its [`torch.nn`](https://pytorch.org/docs/stable/nn.html) submodule, this project explicitly avoids the use of:

- `torch.nn.Transformer`, `torch.nn.TransformerEncoder` and `torch.nn.TransformerDecoder`
- `torch.nn.TransformerEncoderLayer` and `torch.nn.TransformerDecoderLayer`
- `torch.nn.MultiheadAttention`
All other layers provided by `torch.nn` are allowed, including basic building blocks such as `torch.nn.Linear`, `torch.nn.Embedding` and `torch.nn.LayerNorm`.
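
To give a sense of what these restrictions entail, even scaled dot-product attention must be assembled from the permitted primitives. A minimal single-head sketch under that constraint (illustrative only, not the project's actual modules):

```python
import math

import torch
from torch import nn

class SingleHeadAttention(nn.Module):
    """Scaled dot-product self-attention built only from permitted torch.nn primitives."""

    def __init__(self, d_model: int) -> None:
        super().__init__()
        # Learned query/key/value projections (nn.Linear is allowed).
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.w_q(x), self.w_k(x), self.w_v(x)
        # Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
        scores = q @ k.transpose(-2, -1) / math.sqrt(k.size(-1))
        return torch.softmax(scores, dim=-1) @ v
```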
Additionally, the implementation was written without consulting existing implementations or tutorials:

- No existing _"x from scratch"_ resources were used, such as the famous _Let's build GPT: from scratch, in code, spelled out._ by Andrej Karpathy<sup><a href="#references">[3]</a></sup>.
- No other online resources were used, apart from official documentation for packages such as [PyTorch](https://pytorch.org/docs/stable/index.html), [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/) and [Huggingface Tokenizers](https://huggingface.co/docs/transformers/en/main_classes/tokenizer).
## Example
Training a causal language model to generate "Florida man"-style news headlines.
```python
from transformers import LlamaTokenizer
from transformer.params import TransformerParams, TemperatureSamplingParams
from transformer.models import CausalLM
from transformer.decoding import TemperatureSamplingDecoder

# ... tokenizer creation, model construction and training are elided in this
# excerpt; `decoder` is assumed to be a TemperatureSamplingDecoder wrapping a
# trained CausalLM ...

# generation without context (call reconstructed to match the output shown)
decoder.generate()
'Florida man arrested after baby alligator, guns, drugs found inside truck'

# generation with context
decoder.generate("Florida man shot")
'Florida man shot and killed while attempting to steal pizza and Pokemon cards from Target'
```
## Details
While the original architecture described in _Attention Is All You Need_ is an encoder-decoder architecture for neural machine translation, a sequence-to-sequence learning task, this project is designed to be more general: it implements encoder-only, decoder-only and encoder-decoder architectures, allowing for a variety of natural language tasks.
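
The key mechanical difference between these variants is the self-attention mask: decoder-only (causal) models let each position attend only to itself and earlier positions, while encoder-only models attend bidirectionally. A minimal sketch of such a causal mask in PyTorch (illustrative; the project's own masking code is not shown here):

```python
import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    """Boolean mask where True marks positions a query may attend to.

    Decoder-only (causal) self-attention: position i may attend to j <= i.
    Encoder-only architectures omit this mask and attend bidirectionally.
    """
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
```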
The following datasets were used to test the above transformer implementations on real-world data:

- [Reddit r/FloridaMan](https://www.kaggle.com/datasets/bcruise/reddit-rfloridaman): News headlines about various (often funny and irrational) actions performed by Florida men and women.
- [Europarl](https://www.kaggle.com/datasets/nltkdata/europarl): Transcriptions of European Parliament proceedings from 1996 to 2006, collected in 11 languages.

## Models and notebooks
### Encoder-only models

## Project structure

- [**`notebooks/`**](notebooks/): Notebooks applying the models in [`transformer.models`](transformer/models/) to various datasets.
- [**`transformer/`**](transformer/): Core package containing the transformer implementations.
  - [**`dataloaders/`**](transformer/dataloaders/): [`LightningDataModule`](https://lightning.ai/docs/pytorch/stable/data/datamodule.html)s for each model in [`transformer.models`](transformer/models/).
  - [**`decoding/`**](transformer/decoding/): Decoding method implementations for causal and sequence-to-sequence LMs.
  - [**`models/`**](transformer/models/): Task-specific transformers implemented using [`transformer.modules.transformers`](transformer/modules/transformers/).
  - [**`modules/`**](transformer/modules/): [`LightningModule`](https://lightning.ai/docs/pytorch/stable/common/lightning_module.html)s used within the transformers in [`transformer.models`](transformer/models/).
    - [**`transformers/`**](transformer/modules/transformers/): Encoder-only, decoder-only and encoder-decoder transformer definitions.
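
For orientation, these pieces compose in the usual PyTorch Lightning fashion: a model from `transformer.models` is trained against a matching data module from `transformer.dataloaders`. A hedged sketch, assuming a hypothetical `FloridaManDataModule` and constructor signatures not shown in this excerpt:

```python
import lightning.pytorch as pl

from transformer.models import CausalLM
from transformer.params import TransformerParams

# `FloridaManDataModule` is a hypothetical name for one of the
# LightningDataModules under transformer.dataloaders.
from transformer.dataloaders import FloridaManDataModule

# Constructor arguments are assumptions; see the package for real signatures.
model = CausalLM(params=TransformerParams())
datamodule = FloridaManDataModule(batch_size=32)

# Standard Lightning training loop.
trainer = pl.Trainer(max_epochs=10)
trainer.fit(model, datamodule=datamodule)
```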