Code for the paper "Memory Storyboard: Leveraging Temporal Segmentation for Streaming Self-Supervised Learning from Egocentric Videos" (CoLLAs 2025).
Run `pip install -r requirements.txt` to install the requirements.
You can access the SAYCam dataset through Databrary after going through an authorization process. As of May 2025, the site is under maintenance but should be back online soon. We are not allowed to redistribute the dataset or its derivatives here under its license.
You can download the KrishnaCam dataset from its official source here. Then, use the script `metas/convert_video_frame.py` to decode the videos into frames at 10 fps.
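For reference, uniform 10 fps subsampling can be sketched as follows (a minimal sketch; the function name and the choice to select frame indices are our assumptions, not the actual interface of `metas/convert_video_frame.py`):

```python
def frames_to_keep(n_frames: int, src_fps: float, target_fps: float = 10.0):
    """Indices of source frames to keep so the output plays at target_fps.

    Hypothetical helper: walks the source timeline and keeps the first
    frame at or after each 1/target_fps interval.
    """
    step = src_fps / target_fps  # source frames per kept frame
    kept, next_keep = [], 0.0
    for i in range(n_frames):
        if i >= next_keep:
            kept.append(i)
            next_keep += step
    return kept

# e.g. for a 30 fps clip this keeps every third frame
```

Equivalently, ffmpeg's `fps` filter (`ffmpeg -i video.mp4 -vf fps=10 frames/%06d.jpg`) performs the same decoding in one command.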
You can access the Labeled-S dataset through the same Databrary repo as SAYCam above. Our training and test splits for Labeled-S can be found at `metas/labeledS_train_val.txt`, where each line contains a file name and its corresponding label. The first half of the file is the training set and the second half is the test set.
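Under these conventions, the split file can be parsed with a few lines of Python (a hedged sketch; `load_split` is a name we made up, not a helper in this repo):

```python
def load_split(path: str):
    """Read a split file where each non-empty line is '<filename> <label>'.

    Returns (train, test): the first half of the lines and the second half,
    following the convention described above.
    """
    with open(path) as f:
        rows = [line.split() for line in f if line.strip()]
    mid = len(rows) // 2
    return rows[:mid], rows[mid:]
```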
You can access the ImageNet dataset (ILSVRC 2017) through Kaggle. Our training and test splits for mini-ImageNet can be found at `metas/miniinet_train_val.txt`, which uses the same format as the Labeled-S split above.
You can access the iNaturalist 2018 dataset from their official GitHub repo.
Example command for training:

```shell
python training/train_storyboard_classmerging.py \
    --save_dir <path_to_save_checkpoints> \
    --curr_batch_size 64 \
    --replay_batch_size 448 \
    --long_buffer_size 45000 \
    --short_buffer_size 5000 \
    --curr_loss_coef 1.0 \
    --tc_loss_coef 1.0 \
    --group_norm \
    --depth 50 \
    --dataset saycam \
    --subsample 8 \
    --class_length 7500 \
    --lr 0.05 \
    --warmup 500 \
    --method simsiam \
    --merge_threshold 0.003
```
You can use the `--imagenet_eval` and `--labeledS_eval` flags to enable periodic SVM and kNN classification evaluation on the mini-ImageNet and Labeled-S datasets.
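The kNN evaluation can be approximated in a few lines of NumPy (a sketch of a standard cosine-similarity kNN classifier, not the repo's actual evaluation code; the function and argument names are ours):

```python
import numpy as np

def knn_accuracy(train_feats, train_labels, test_feats, test_labels, k=5):
    """Majority-vote kNN accuracy on L2-normalized features (cosine similarity)."""
    tr = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    te = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    sims = te @ tr.T                          # (n_test, n_train) cosine similarities
    topk = np.argsort(-sims, axis=1)[:, :k]   # indices of the k nearest neighbors
    preds = np.array([np.bincount(train_labels[row]).argmax() for row in topk])
    return float(np.mean(preds == test_labels))
```

An SVM variant would instead fit a linear classifier (e.g. `sklearn.svm.LinearSVC`) on the frozen features.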
You can run linear probing evaluation on the iNaturalist 2018 dataset with:

```shell
python linear_decoding_inat.py --data <path_to_inat_data> --num_classes 8142 --epochs 20 --batch_size 1024 --fc_bn --image_size 112 --load_dir <path_to_model>
```
You can run linear probing evaluation on the ImageNet dataset with:

```shell
python linear_decoding.py --data <path_to_imagenet> --num_classes 1000 --epochs 10 --batch_size 1024 --fc_bn --image_size 112 --load_dir <path_to_model>
```
We thank the authors of the papers "How Well Do Unsupervised Learning Algorithms Model Human Real-time and Life-long Learning?", "The Challenges of Continuous Self-Supervised Learning", and "Integrating Present and Past in Unsupervised Continual Learning" for releasing their code.