Issues: aws-samples/awsome-distributed-training
Incorrect version of devscripts installed in Ubuntu 22.04 AMI, causing pip package install breaks
bug
Something isn't working
#677
opened May 15, 2025 by
amanshanbhag
SMHP Slurm deployment failure (LCS related) after 22.04 AMI update
bug
Something isn't working
#675
opened May 15, 2025 by
gmgtamz
Race Condition in Lifecycle script fsx_ubuntu.sh when trying to set home dir to /fsx/ubuntu
bug
Something isn't working
#674
opened May 13, 2025 by
amanshanbhag
Add N/2 NCCL tests for K8s
enhancement
New feature or request
#664
opened May 1, 2025 by
amanshanbhag
Rename CPU-DDP Kubernetes manifest from fsdp.yaml to ddp.yaml for clarity
#649
opened Apr 22, 2025 by
kjrstory
Change docker to rootless docker
enhancement
New feature or request
#646
opened Apr 18, 2025 by
mhuguesaws
Change slurm exporter to prometheus slurm exporter
enhancement
New feature or request
#644
opened Apr 16, 2025 by
mhuguesaws
add command examples for picotron SmolLM test case
bug
Something isn't working
#625
opened Mar 31, 2025 by
KeitaW
Conda environment creation script uses proprietary Anaconda channels
#582
opened Mar 11, 2025 by
jrandall
Change Amazon FSx for Lustre from Auto IOPS to user provisioned.
stale
#572
opened Feb 28, 2025 by
mhuguesaws