
Commit bd15b33

update to match the camera-ready version

1 parent 39ca570 commit bd15b33

10 files changed: +310 -52 lines

README.md (+63 -33)

````diff
@@ -1,19 +1,5 @@
 # On Sparse Modern Hopfield Model
-This is the code of the paper [On Sparse Modern Hopfield Model](https://arxiv.org/pdf/2309.12673.pdf). You can use this repo to reproduce the results in the paper.
-
-
-## Citations
-Please consider citing our paper in your publications if it helps. Here is the bibtex:
-
-```
-@inproceedings{
-hu2023on,
-title={On Sparse Modern Hopfield Model},
-author={Jerry Yao-Chieh Hu and Donglin Yang and Dennis Wu and Chenwei Xu and Bo-Yu Chen and Han Liu},
-booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
-year={2023},
-url={https://openreview.net/forum?id=eCgWNU2Imw}
-}
-```
+This is the code of the paper [On Sparse Modern Hopfield Model](https://arxiv.org/pdf/2309.12673.pdf). You can use this repo to reproduce the results of our method.
 
 ## Environmental Setup
 
@@ -25,12 +11,61 @@ $ conda activate sparse_hopfield
 $ pip3 install -r requirements.txt
 ```
 
+## Examples
+
+In ```layers.py```, we have implemented the dense Hopfield, the sparse Hopfield, the Hopfield with entmax-1.5, and the generalized sparse Hopfield.
+To use them, see below:
+
+```python
+dense_hp = HopfieldPooling(
+    d_model=d_model,
+    n_heads=n_heads,
+    mix=True,
+    update_steps=update_steps,
+    dropout=dropout,
+    mode="softmax",
+    scale=scale,
+    num_pattern=num_pattern)  # Dense Hopfield
+
+sparse_hp = HopfieldPooling(
+    d_model=d_model,
+    n_heads=n_heads,
+    mix=True,
+    update_steps=update_steps,
+    dropout=dropout,
+    mode="sparsemax",
+    scale=scale,
+    num_pattern=num_pattern)  # Sparse Hopfield
+
+entmax_hp = HopfieldPooling(
+    d_model=d_model,
+    n_heads=n_heads,
+    mix=True,
+    update_steps=update_steps,
+    dropout=dropout,
+    mode="entmax",
+    scale=scale,
+    num_pattern=num_pattern)  # Hopfield with entmax-1.5
+
+gsh_hp = HopfieldPooling(
+    d_model=d_model,
+    n_heads=n_heads,
+    mix=True,
+    update_steps=update_steps,
+    dropout=dropout,
+    mode="gsh",
+    scale=scale,
+    num_pattern=num_pattern)  # Generalized sparse Hopfield with learnable alpha
+```
+
 ## Experimental Validation of Theoretical Results
 
 ### Plotting
 
 ```shell
-$ python3 Plotting.py
+$ cd theoretical_results_validation
+$ python3 plotting.py
 ```
 
 ## Multiple Instance Learning (MIL) Tasks
@@ -128,21 +163,16 @@ Argument options
 * `gpus_per_trial`: how many GPUs you want to use for a single run (set this carefully for hyperparameter tuning; no larger than 1)
 * `gpus_id`: specify which GPUs you want to use (e.g. `--gpus_id=0, 1` means cuda:0 and cuda:1 are used for this script)
 
+## Citations
+Please consider citing our paper in your publications if it helps. Here is the bibtex:
 
-## Acknowledgment
-
-The authors would like to thank the anonymous reviewers and program chairs for constructive comments.
-
-JH is partially supported by the Walter P. Murphy Fellowship.
-HL is partially supported by NIH R01LM1372201, NSF CAREER1841569, DOE DE-AC02-07CH11359, DOE LAB 20-2261 and a NSF TRIPODS1740735.
-This research was supported in part through the computational resources and staff contributions provided for the Quest high performance computing facility at Northwestern University, which is jointly supported by the Office of the Provost, the Office for Research, and Northwestern University Information Technology.
-The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies.
-
-The experiments in this work benefit from the following open-source codes:
-* Ramsauer, Hubert, Bernhard Schäfl, Johannes Lehner, Philipp Seidl, Michael Widrich, Thomas Adler, Lukas Gruber, et al. "Hopfield networks is all you need." arXiv preprint arXiv:2008.02217 (2020). https://github.com/ml-jku/hopfield-layers
-* Martins, André, and Ramón Astudillo. "From softmax to sparsemax: A sparse model of attention and multi-label classification." In International Conference on Machine Learning, pp. 1614-1623. PMLR, 2016. https://github.com/KrisKorrel/sparsemax-pytorch
-* Correia, Gonçalo M., Vlad Niculae, and André F. T. Martins. "Adaptively sparse transformers." arXiv preprint arXiv:1909.00015 (2019). https://github.com/deep-spin/entmax & https://github.com/prajjwal1/adaptive_transformer
-* Ilse, Maximilian, Jakub Tomczak, and Max Welling. "Attention-based deep multiple instance learning." In International Conference on Machine Learning, pp. 2127-2136. PMLR, 2018. https://github.com/AMLab-Amsterdam/AttentionDeepMIL
-* Zhang, Yunhao, and Junchi Yan. "Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting." In The Eleventh International Conference on Learning Representations, 2023. https://github.com/Thinklab-SJTU/Crossformer
-* Millidge, Beren, Tommaso Salvatori, Yuhang Song, Thomas Lukasiewicz, and Rafal Bogacz. "Universal Hopfield networks: A general framework for single-shot associative memory models." In International Conference on Machine Learning, pp. 15561-15583. PMLR, 2022. https://github.com/BerenMillidge/Theory_Associative_Memory
+```
+@misc{hu2023sparse,
+      title={On Sparse Modern Hopfield Model},
+      author={Jerry Yao-Chieh Hu and Donglin Yang and Dennis Wu and Chenwei Xu and Bo-Yu Chen and Han Liu},
+      year={2023},
+      eprint={2309.12673},
+      archivePrefix={arXiv},
+      primaryClass={cs.LG}
+}
+```
````
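As a quick orientation for the example block added to the README above, here is a minimal, self-contained usage sketch of `HopfieldPooling` from `layers.py`. The shape convention is inferred from how `models.py` calls the layer in this same commit (input `(batch, bag_size, d_model)`, output `num_pattern` pooled vectors per sample); the concrete numbers here are hypothetical.

```python
import torch
from layers import HopfieldPooling

# Hypothetical sizes; shape convention inferred from models.py.
pool = HopfieldPooling(
    d_model=64, n_heads=4, mix=True, update_steps=1,
    dropout=0.0, mode="gsh", scale=None, num_pattern=2)

x = torch.randn(8, 16, 64)  # 8 samples, each a bag of 16 instance embeddings
out = pool(x)               # expected: (8, 2, 64), num_pattern pooled vectors each
print(out.shape)
```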

imgs.zip (1.31 MB): binary file not shown.

layers.py (+4 -2)

```diff
@@ -7,7 +7,7 @@
 from math import sqrt
 from utils.sparse_max import Sparsemax
 from utils.entmax import Entmax15
-
+from utils.general_entmax import EntmaxAlpha
 
 class FullAttention(nn.Module):
     '''
@@ -101,6 +101,8 @@ def __init__(self, scale=None, attention_dropout=0.0, mode='sparsemax', norm=Fal
             self.softmax = Sparsemax(dim=-1)
         elif mode == 'entmax':
             self.softmax = Entmax15(dim=-1)
+        elif mode == 'gsh':
+            self.softmax = EntmaxAlpha(dim=-1)
         else:
             self.softmax = nn.Softmax(dim=-1)
@@ -274,7 +276,7 @@ def __init__(
 
         self.ln = nn.LayerNorm(d_model, elementwise_affine=False)
 
-        if mode in ["sparsemax", "softmax", "entmax"]:
+        if mode in ["sparsemax", "softmax", "entmax", "gsh"]:
             self.inner_attention = HopfieldCore(
                 scale=scale, attention_dropout=dropout, mode=mode, norm=True)
```
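The new `gsh` branch plugs `EntmaxAlpha` (entmax with a learnable exponent alpha) into the same dispatch that already covers softmax, sparsemax, and entmax-1.5. A rough comparison sketch, assuming the three `utils` modules all follow the `Module(dim=-1)(scores)` calling convention visible in the diff:

```python
import torch
from utils.sparse_max import Sparsemax
from utils.entmax import Entmax15
from utils.general_entmax import EntmaxAlpha  # new in this commit

scores = torch.tensor([[2.0, 1.0, 0.1, -1.0]])

print(torch.softmax(scores, dim=-1))  # dense: every entry strictly positive
print(Sparsemax(dim=-1)(scores))      # sparse: low scores clipped to exactly 0
print(Entmax15(dim=-1)(scores))       # alpha = 1.5: between softmax and sparsemax
print(EntmaxAlpha(dim=-1)(scores))    # 'gsh': alpha is learned during training
```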

mnist_mil_main.py (+15 -13)

```diff
@@ -15,7 +15,7 @@ def get_args():
 
     # Model params
     parser.add_argument('--mode', default="softmax", choices=["softmax", "entmax", "sparsemax"])
-    parser.add_argument('--d_model', default=256, type=int)
+    parser.add_argument('--d_model', default=512, type=int)
     parser.add_argument('--input_size', default=784, type=int)
     parser.add_argument('--model', default="pooling", type=str)
     parser.add_argument('--num_pattern', default=2, type=int)
@@ -25,7 +25,7 @@ def get_args():
     parser.add_argument('--dropout', default=0.3, type=float)
 
     # Training params
-    parser.add_argument('--lr', default=1e-3, type=float)
+    parser.add_argument('--lr', default=1e-4, type=float)
     parser.add_argument('--epoch', default=100, type=int)
     parser.add_argument('--seed', default=1111, type=int)
@@ -37,7 +37,6 @@ def get_args():
     parser.add_argument('--bag_size', default=10, type=int)
     parser.add_argument('--tgt_num', default=9, type=int)
 
-
     args = parser.parse_args()
 
     return vars(args)
@@ -47,19 +46,22 @@ def get_args():
 
 torch.set_num_threads(3)
 config = get_args()
-trails = 1
+trails = 5
 torch.manual_seed(config["seed"])
 
+
+
 if config["bag_size"] == 100:
     config["num_pattern"] = 4
 bag_size = config["bag_size"]
 # bag_size = [5, 10, 20, 50, 100, 200, 300]
-models = ["softmax", "sparsemax", "entmax"]
+models = ["softmax", "sparsemax", "entmax", "gsh"]
 data_log = None
 
 for m in models:
     config["mode"] = m
     for t in range(trails):
+        torch.random.manual_seed(torch.random.seed())
         trainer = Trainer(config, t)
         trail_log = trainer.train()
         if data_log is None:
@@ -68,22 +70,22 @@ def get_args():
         for k,v in data_log.items():
             data_log[k] = data_log[k] + trail_log[k]
 
-sns.lineplot(data=data_log, x="epoch", y="train loss", hue="model")
+sns.lineplot(data=data_log, x="epoch", y="train loss", hue="model", alpha=0.4, errorbar=None, linewidth=2)
 plt.tight_layout()
-plt.savefig(f'./imgs/train_loss_{bag_size}.png')
+plt.savefig(f'./imgs/train_loss_{bag_size}.pdf')
 plt.clf()
 
-sns.lineplot(data=data_log, x="epoch", y="test loss", hue="model")
+sns.lineplot(data=data_log, x="epoch", y="test loss", hue="model", alpha=0.4, errorbar=None, linewidth=2)
 plt.tight_layout()
-plt.savefig(f'./imgs/test_loss_{bag_size}.png')
+plt.savefig(f'./imgs/test_loss_{bag_size}.pdf')
 plt.clf()
 
-sns.lineplot(data=data_log, x="epoch", y="train acc", hue="model")
+sns.lineplot(data=data_log, x="epoch", y="train acc", hue="model", alpha=0.4, errorbar=None, linewidth=2)
 plt.tight_layout()
-plt.savefig(f'./imgs/train_acc_{bag_size}.png')
+plt.savefig(f'./imgs/train_acc_{bag_size}.pdf')
 plt.clf()
 
-sns.lineplot(data=data_log, x="epoch", y="test acc", hue="model")
+sns.lineplot(data=data_log, x="epoch", y="test acc", hue="model", alpha=0.4, errorbar=None, linewidth=2)
 plt.tight_layout()
-plt.savefig(f'./imgs/test_acc_{bag_size}.png')
+plt.savefig(f'./imgs/test_acc_{bag_size}.pdf')
 plt.clf()
```
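One easy-to-miss change above: each trial now reseeds the global RNG via `torch.random.manual_seed(torch.random.seed())`. `torch.random.seed()` already reseeds PyTorch with a fresh non-deterministic value and returns it, so wrapping it in `manual_seed` is effectively redundant; its one practical benefit is that the drawn seed ends up in a variable you could log. A small sketch of an equivalent, more explicit form:

```python
import torch

# torch.random.seed() reseeds the global RNG non-deterministically
# and returns the 64-bit seed it chose.
fresh_seed = torch.random.seed()
torch.random.manual_seed(fresh_seed)   # same RNG state; seed value now in hand
print(f"trial seed: {fresh_seed}")     # log it to reproduce a single trial later
```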

mnist_mil_trainer.py (+2 -3)

```diff
@@ -43,8 +43,7 @@ def _get_data(self):
 
     def _get_model(self):
 
-        model = MNISTModel(input_size=self.config["input_size"],
-                           d_model=self.config["d_model"],
+        model = CIFARModel(d_model=self.config["d_model"],
                            n_heads=self.config["n_heads"],
                            update_steps=self.config["update_steps"],
                            dropout=self.config["dropout"],
@@ -55,7 +54,7 @@ def _get_model(self):
         return model.cuda()
 
     def _get_opt(self):
-        return torch.optim.AdamW(self.model.parameters(), lr=self.config["lr"])
+        return torch.optim.AdamW(self.model.parameters(), lr=self.config["lr"], weight_decay=0.001)
 
     def _get_cri(self):
         return torch.nn.BCEWithLogitsLoss()
```
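A note on the optimizer change: in `AdamW`, `weight_decay=0.001` is decoupled weight decay (Loshchilov and Hutter, 2019); the weights are shrunk directly at each step rather than via an L2 term folded into the gradient, as plain `Adam` with `weight_decay` would do. A minimal sketch of the new call, with a stand-in parameter list:

```python
import torch

params = [torch.nn.Parameter(torch.randn(4, 4))]  # stand-in for model.parameters()
opt = torch.optim.AdamW(params, lr=1e-4, weight_decay=0.001)

loss = (params[0] ** 2).sum()
loss.backward()
opt.step()  # Adam update plus a direct 0.001-scaled shrink of the weights
```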

models.py (+49 -1)

```diff
@@ -21,7 +21,7 @@ def __init__(
         self.ln = nn.LayerNorm(d_model)
         self.ln2 = nn.LayerNorm(d_model)
 
-        if mode in ["sparsemax", 'softmax', 'entmax']:
+        if mode in ["sparsemax", 'softmax', 'entmax', 'gsh']:
             self.layer = HopfieldPooling(
                 d_model=d_model,
                 n_heads=n_heads,
@@ -42,4 +42,52 @@ def forward(self, x):
         x = self.ln(self.emb(x))
         out = self.ln2(self.gelu(self.layer(x)))
         out = out.view(bz, -1)
+        return self.fc(out).squeeze(-1)
+
+
+class CIFARModel(nn.Module):
+    def __init__(
+            self,
+            d_model=256,
+            n_heads=4,
+            update_steps=1,
+            dropout=0.1,
+            mode='softmax',
+            scale=None,
+            num_pattern=1):
+        super(CIFARModel, self).__init__()
+
+        assert d_model % n_heads == 0
+        self.emb = nn.Conv2d(3, 6, 5)
+        self.pool = nn.MaxPool2d(4, 2)
+        self.linear = nn.Linear(1014, d_model)
+
+        self.ln = nn.LayerNorm(d_model)
+        self.ln2 = nn.LayerNorm(d_model)
+
+        if mode in ["sparsemax", 'softmax', 'entmax', 'gsh']:
+            self.layer = HopfieldPooling(
+                d_model=d_model,
+                n_heads=n_heads,
+                mix=True,
+                update_steps=update_steps,
+                dropout=dropout,
+                mode=mode,
+                scale=scale,
+                num_pattern=num_pattern)
+
+        self.fc = nn.Linear(d_model*num_pattern, 1)
+        self.gelu = nn.GELU()
+
+    def forward(self, x):
+
+        bz, N, c, h, w = x.size()
+        x = x.view(bz*N, c, h, w)
+        x = self.pool(self.emb(x))
+        x = x.view(bz, N, -1)
+        x = self.ln(self.linear(x))
+        out = self.ln2(self.gelu(self.layer(x)))
+        out = out.view(bz, -1)
         return self.fc(out).squeeze(-1)
```
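The hard-coded `nn.Linear(1014, d_model)` in `CIFARModel` only lines up for 32x32 (CIFAR-sized) inputs: `Conv2d(3, 6, 5)` maps 32x32 to 28x28, `MaxPool2d(4, 2)` maps 28x28 to 13x13 (`(28 - 4) // 2 + 1 = 13`), and `6 * 13 * 13 = 1014`. A quick shape check, with hypothetical bag dimensions:

```python
import torch
from models import CIFARModel  # as added in this commit

model = CIFARModel(d_model=256, n_heads=4, mode="gsh")
bags = torch.randn(2, 10, 3, 32, 32)  # 2 bags of 10 CIFAR-sized images each
logits = model(bags)
print(logits.shape)  # expected: torch.Size([2]), one bag-level logit per bag
```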

run.sh (+7)

```diff
@@ -0,0 +1,7 @@
+python3 mnist_mil_main.py --bag_size 20
+python3 mnist_mil_main.py --bag_size 50
+python3 mnist_mil_main.py --bag_size 5
+python3 mnist_mil_main.py --bag_size 10
+python3 mnist_mil_main.py --bag_size 100
+python3 mnist_mil_main.py --bag_size 30
+python3 mnist_mil_main.py --bag_size 80
```
Two binary files changed (0 Bytes and 5.18 KB): not shown.
