squeue -u $USER
or
squeue -u abc123
This displays each job's ID, name, partition, state (e.g., R = running, PD = pending), and assigned nodes.
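As a rough illustration of working with this output, the sketch below tallies jobs per state. The function name count_job_states is hypothetical; it only processes text on standard input, so it assumes you feed it one state name per line, as produced by squeue with the -h (no header) and -o "%T" (state, long form) options.

```shell
# Count jobs per state from text read on stdin.
# Expects one state name per line (e.g. RUNNING, PENDING).
count_job_states() {
    # sort groups identical states; uniq -c prefixes each with its count;
    # awk reorders the two columns into "STATE: count".
    sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Usage on a cluster (requires Slurm):
#   squeue -h -u "$USER" -o "%T" | count_job_states
```

This gives a quick summary when you have many jobs queued and the full squeue listing is too long to scan.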
2. Check Job Details
To see detailed information about a specific job:
scontrol show job <job_id>
Example:
scontrol show job 123456
This displays the job's requested resources, node allocation, and start time and, if the job is still pending, the reason it has not started.
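The scontrol output is a long list of Key=Value pairs, so it can help to pull out just the fields you care about. The following is a minimal sketch, not an official tool: job_field is a hypothetical helper that reads the scontrol output on standard input and prints the value of one key. It assumes the value contains no spaces, which holds for fields such as JobState and Reason but may not hold for free-text fields.

```shell
# Print the value of one Key=Value field from `scontrol show job` output on stdin.
# Assumption: the value itself contains no whitespace.
job_field() {
    # Put each whitespace-separated Key=Value token on its own line,
    # then print the part after the first "=" for the requested key.
    tr -s ' \t' '\n' | awk -F= -v k="$1" '$1 == k { sub(/^[^=]*=/, ""); print; exit }'
}

# Usage on a cluster (requires Slurm):
#   scontrol show job 123456 | job_field JobState
#   scontrol show job 123456 | job_field Reason
```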
3. Check Job Output
Slurm writes a job's standard output and standard error to the files specified in your job script:
#SBATCH --output=output.log
#SBATCH --error=error.log
You can view them using:
cat output.log
less error.log
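Fixed names such as output.log are overwritten each time the script is resubmitted. One common way around this, shown here as a sketch, is to use sbatch filename patterns: %j expands to the job ID and %x to the job name. The job name myjob is only a placeholder.

```shell
#SBATCH --job-name=myjob        # placeholder job name
#SBATCH --output=%x-%j.out      # expands to e.g. myjob-123456.out
#SBATCH --error=%x-%j.err       # expands to e.g. myjob-123456.err
```

With these patterns, each submission gets its own pair of log files.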
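If you specify neither #SBATCH --output nor #SBATCH --error, Slurm writes both standard output and standard error to a single file named slurm-<job_id>.out in the directory from which you submitted the job.
4. Cancel a Job
To cancel a specific job:
scancel <job_id>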
To cancel all your jobs:
scancel -u $USER
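5. Check Job Resource Usage
To review the resources a job used:
sacct -j <job_id> --format=JobID,JobName,Elapsed,State,AllocCPUs,MaxRSS
Comparing the reported Elapsed time and MaxRSS (peak memory) with what you requested helps you size future job submissions.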
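MaxRSS is usually reported with a unit suffix (for example 2048K), which makes it awkward to compare with a memory request given in megabytes. The sketch below converts such a value to megabytes. The helper name to_mb is hypothetical, and treating a value with no suffix as bytes is an assumption rather than documented sacct behaviour.

```shell
# Convert a memory value such as 2048K, 512M, or 1.5G to megabytes.
to_mb() {
    echo "$1" | awk '{
        unit = substr($0, length($0))      # last character: K, M, G, T, or a digit
        value = $0 + 0                     # awk keeps the leading numeric part
        if (unit == "K")      value /= 1024
        else if (unit == "G") value *= 1024
        else if (unit == "T") value *= 1024 * 1024
        else if (unit != "M") value /= 1024 * 1024   # assumption: no suffix means bytes
        printf "%.1f\n", value
    }'
}

# Usage on a cluster (requires Slurm); -n drops the header, -P prints "|"-separated fields:
#   sacct -j 123456 -n -P --format=JobID,MaxRSS
#   to_mb 2048K
```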
6. Requeue or Hold Jobs
Requeue a failed job:
scontrol requeue <job_id>
Hold or release a job:
scontrol hold <job_id>
scontrol release <job_id>
7. Check Cluster Status
To view the state of partitions and nodes:
sinfo
This shows available partitions, node states, and time limits.
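To decide where a job is likely to start soonest, it can be useful to reduce this listing to the number of idle nodes per partition. The following is a small sketch under stated assumptions: idle_nodes is a hypothetical helper, and it expects three whitespace-separated columns on standard input (partition, compact node state, node count), as produced by sinfo -h -o "%P %t %D".

```shell
# Sum idle nodes per partition.
# Input columns on stdin: partition, compact state, node count.
idle_nodes() {
    awk '$2 == "idle" {
             sub(/\*$/, "", $1)            # sinfo marks the default partition with "*"
             idle[$1] += $3
         }
         END { for (p in idle) print p, idle[p] }' | sort
}

# Usage on a cluster (requires Slurm):
#   sinfo -h -o "%P %t %D" | idle_nodes
```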
For a complete guide to Slurm, please refer to the official documentation at https://slurm.schedmd.com/documentation.html.