Monitoring your jobs
Using the squeue Command
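Check the status of all jobs on Shamu using the squeue command (the default output truncates partition names to nine characters, so the softmatter partition appears as softmatte):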
$ squeue
JOBID  PARTITION  NAME      USER    ST  TIME         NODES  NODELIST(REASON)
10435  gpu-v100   a5-5-8    kdg242  PD  0:00         1      (Resources)
9596   bigmem     sys/dash  xce775  R   10-18:06:10  1      compute009
9960   plasmon    sys/dash  xce775  R   7-00:54:19   1      compute025
10094  defq       sys/dash  fym313  R   2-11:03:59   1      compute028
10149  defq       bash      iqr224  R   5-04:31:02   1      compute039
10385  softmatte  c300-a23  kdg242  R   1-02:05:09   1      compute107
10386  softmatte  c300-a25  kdg242  R   1-02:05:09   1      compute106
10387  softmatte  c300-a26  kdg242  R   1-02:05:08   1      compute108
10388  softmatte  c300-a27  kdg242  R   1-02:05:07   1      compute109
10389  softmatte  c300-a28  kdg242  R   1-02:05:06   1      compute110
10390  softmatte  c300-a29  kdg242  R   1-02:05:06   1      compute102
10391  softmatte  c300-a30  kdg242  R   1-02:05:06   1      compute105
10392  softmatte  c300-a32  kdg242  R   1-02:05:06   1      compute111
10393  softmatte  c300-a35  kdg242  R   1-02:05:06   1      compute104
10420  gpu        test      yqs327  R   19:44:48     1      gpu02
10421  gpu        test      yqs327  R   19:42:38     1      gpu02
10422  gpu        test      yqs327  R   19:42:01     1      gpu02
10423  gpu        test      yqs327  R   19:41:15     1      gpu02
10424  gpu        test      yqs327  R   19:38:48     1      gpu02
10428  gpu        test      yqs327  R   19:23:45     1      gpu01
10433  gpu-v100   a2-5      kdg242  R   3:34:23      1      gpu03
10434  gpu-v100   a805      kdg242  R   3:34:16      1      gpu04
10436  softmatte  c250-a20  kdg242  R   3:29:48      1      compute092
10437  softmatte  c250-a23  kdg242  R   3:29:20      1      compute093
10438  softmatte  c250-a25  kdg242  R   3:29:09      1      compute094
10439  softmatte  c250-a26  kdg242  R   3:28:54      1      compute095
10440  softmatte  c250-a27  kdg242  R   3:28:40      1      compute096
10441  softmatte  c250-a28  kdg242  R   3:28:28      1      compute097
10442  softmatte  c250-a30  kdg242  R   3:28:04      1      compute098
10443  softmatte  c250-a32  kdg242  R   3:27:54      1      compute099
10444  softmatte  c250-a35  kdg242  R   3:27:46      1      compute100
10445  defq       bash      gqd693  R   58:02        1      compute029
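The full listing gets long on a busy cluster. As a minimal sketch, the helper below counts jobs per user and state from squeue-style text. The function name summarize_jobs is hypothetical, and the sketch assumes the default column order shown above (USER in field 4, ST in field 5) and job names without spaces.

```shell
# summarize_jobs (hypothetical helper): count jobs per user and state.
# Reads squeue-style text on stdin and skips the header line.
# Assumes the default column order: USER is field 4, ST is field 5.
summarize_jobs() {
    awk 'NR > 1 { count[$4 " " $5]++ }
         END { for (key in count) print key, count[key] }' | sort
}

# On the cluster (requires Slurm):
#   squeue | summarize_jobs
#   squeue -u "$USER"        # list only your own jobs
```

The -u option of squeue restricts the listing to the named user, which is often enough on its own when you only care about your own jobs.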
Using the sinfo Command
Check the status of the job partitions using the sinfo command:
$ sinfo
PARTITION   AVAIL  TIMELIMIT  NODES  STATE  NODELIST
defq*       up     infinite   3      mix    compute[028-029,039]
defq*       up     infinite   51     idle   compute[001-004,006-008,012-024,030-038,040-057,088-091]
bigmem      up     infinite   1      mix    compute009
gpu         up     infinite   2      mix    gpu[01-02]
plasmon     up     infinite   1      mix    compute025
ids         up     infinite   2      idle   compute[010-011]
millwater   up     infinite   1      idle   compute005
softmatter  up     infinite   18     alloc  compute[092-100,102,104-111]
softmatter  up     infinite   2      idle   compute[101,103]
gpu-v100    up     infinite   2      mix    gpu[03-04]
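To see at a glance where free nodes are, the sinfo output can be reduced to the idle rows. The sketch below is one way to do that. The function name idle_nodes is hypothetical, and it assumes the default sinfo column order shown above (PARTITION, AVAIL, TIMELIMIT, NODES, STATE, NODELIST).

```shell
# idle_nodes (hypothetical helper): total idle nodes per partition.
# Reads sinfo-style text on stdin and skips the header line.
# Only rows whose STATE is exactly "idle" are counted; states with a
# suffix character (for example "idle*") are ignored in this sketch.
idle_nodes() {
    awk 'NR > 1 && $5 == "idle" {
             sub(/\*$/, "", $1)      # drop the "*" that marks the default partition
             idle[$1] += $4
         }
         END { for (p in idle) print p, idle[p] }' | sort
}

# On the cluster (requires Slurm):
#   sinfo | idle_nodes
```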
Using the sacct Command
Check the status of an individual job by passing its job ID to the sacct command with the -j option:
$ sacct -j 10445
JobID        JobName    Partition  Account    AllocCPUS  State      ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
10445        bash       defq       admins     1          RUNNING    0:0
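For use in scripts, sacct can print selected fields without a header (-n) and separated by "|" (-P). As a rough sketch, the helper below extracts the state of the top-level job record from that output. The function name job_state is hypothetical, and the sketch assumes the fields were requested as JobID,State.

```shell
# job_state (hypothetical helper): print the state of the top-level job.
# Reads "JobID|State" lines on stdin, as printed by sacct -n -P.
# Job steps (IDs containing a dot, such as 10445.batch) are skipped.
job_state() {
    awk -F'|' '$1 !~ /\./ { print $2; exit }'
}

# On the cluster (requires Slurm accounting):
#   sacct -n -P -j 10445 --format=JobID,State | job_state
```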
--
AdminUser - 16 Jun 2017