Help:Performance monitoring

From CECS wiki


This is a list of performance monitoring tools to help you optimize your jobs and understand their bottlenecks.

Locally written tools

If you want to use any of these tools and they are not available, please ask for instructions on how to get them!

nvidia-ps: short-term view of GPU utilization; an alternative to nvidia-smi, which gives only an instantaneous view
ganglia: web view of system and cluster performance metrics
heatmap: web view of GPU performance, job statistics, and Slurm status
squeue-gpu: command line view of GPU performance, job statistics, and Slurm status
gpust: stacked GPU graph for a cluster
cgp: web front end for viewing collectd statistics
slimits: view system-wide QOS limits and usage (beta)
scontrol show assoc_mgr: detailed dump of Slurm's internal association manager state (users, associations, QOS)

Slurm command line tools

squeue: view pending and current jobs
scontrol: view and modify all parameters for current and pending jobs
sacct: view collected statistics for past jobs
sinfo: view current status of cluster nodes
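
As a rough illustration of how the four Slurm commands above are typically invoked, here is a minimal sketch. The helper run_if_present is hypothetical (it is not part of Slurm) and only exists so the snippet degrades gracefully on a machine without Slurm installed; the output columns chosen for sacct are just one reasonable selection.

```shell
# Hypothetical helper (not part of Slurm): run a command only if it is installed.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skipped: $1 not found"
    fi
}

# Pending and running jobs that belong to the current user.
run_if_present squeue -u "$(id -un)"

# Full parameter listing for jobs known to the controller.
run_if_present scontrol show job

# Accounting data for jobs started since midnight; the column list is an example.
run_if_present sacct -S today -o JobID,JobName,Elapsed,MaxRSS,State

# One line per node, long format.
run_if_present sinfo -N -l
```

Comparing Elapsed and MaxRSS from sacct against what a job requested is often a quick first check for over-allocated time or memory.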

Screen-based system performance

top
atop
htop
nmon
iftop (root only)

Command line based system performance

iostat
netstat
ps
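
The three tools above can be combined into a quick one-shot snapshot of a node. The sketch below assumes the Linux versions of these tools (sysstat for iostat, net-tools for netstat, procps-ng for ps); the function names have and snapshot are hypothetical, and any tool that is not installed is simply skipped.

```shell
# Hypothetical helper: succeed only if the named command is installed.
have() { command -v "$1" >/dev/null 2>&1; }

# Hypothetical wrapper: print a snapshot using whichever tools are available.
snapshot() {
    # Extended per-device I/O statistics: 3 reports, 2 seconds apart.
    # The first report shows averages since boot.
    have iostat && iostat -x 2 3

    # Listening TCP and UDP sockets, with numeric addresses and ports.
    have netstat && netstat -tuln

    # Header plus the ten processes using the most CPU (procps-ng --sort syntax).
    have ps && ps -eo pid,user,%cpu,%mem,comm --sort=-%cpu | head -n 11

    return 0
}
```

Running snapshot on a compute node while a job is active gives a rough idea of whether the job is limited by disk I/O or CPU; the iostat portion takes a few seconds to complete.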

Process-based performance and debugging

perf
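
A minimal sketch of using perf on a single command follows. The wrapper perf_stat is hypothetical and falls back to running the command directly when perf is not installed; ./my_program in the comments is a placeholder for your own binary. Whether perf is permitted for ordinary users depends on the node's kernel settings, which this page does not specify.

```shell
# Hypothetical wrapper: run a command under "perf stat" when perf is installed,
# otherwise run the command unchanged.
perf_stat() {
    if command -v perf >/dev/null 2>&1; then
        # Event counts are printed to stderr after the command exits.
        perf stat -- "$@"
    else
        "$@"
    fi
}

# Example: perf_stat sleep 1
#
# For a sampled profile with call graphs (placeholder binary name):
#   perf record -g -- ./my_program
#   perf report
```

perf stat gives whole-run counters such as cycles and instructions, while perf record followed by perf report shows which functions the time was spent in.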

Retrieved from "http://newton.i2lab.ucf.edu/w/index.php?title=Help:Performance_monitoring&oldid=15594"
