Help:Performance monitoring
This is a list of performance monitoring tools to help you optimize your jobs and understand their bottlenecks.
Locally written tools
If you want to use any of these tools and they are not available, please ask for instructions on how to get them!
- nvidia-ps
- view of short-term GPU utilization; an alternative to nvidia-smi, which gives only an instantaneous view
- ganglia
- web view of per-system and cluster-wide performance metrics
- heatmap
- web view of GPU performance, job statistics, and Slurm status
- squeue-gpu
- command line view of GPU performance, job statistics, and Slurm status
- gpust
- stacked gpu graph for a cluster
- cgp
- web front end for viewing collectd statistics
- slimits
- view system wide QOS limits and usage (beta)
- scontrol show assoc_mgr
- gory details of the Slurm association manager (associations, QOS, and usage)
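Where nvidia-ps is not available, a short-term view of GPU utilization can be roughly approximated by sampling nvidia-smi several times in a row. The following is only a sketch under that assumption: `gpu_sample` is a hypothetical helper name, not one of the local tools, and the sample count and interval defaults are arbitrary.

```shell
# Print GPU utilization and memory use COUNT times, INTERVAL seconds apart.
# gpu_sample is a hypothetical helper, not an existing tool.
gpu_sample() {
    gs_count="${1:-5}"      # number of samples (default 5)
    gs_interval="${2:-1}"   # seconds between samples (default 1)
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found on this host" >&2
        return 1
    fi
    gs_i=0
    while [ "$gs_i" -lt "$gs_count" ]; do
        # One CSV line per GPU: timestamp, index, utilization, memory used
        nvidia-smi --query-gpu=timestamp,index,utilization.gpu,memory.used \
                   --format=csv,noheader
        gs_i=$((gs_i + 1))
        # Sleep between samples, but not after the last one
        if [ "$gs_i" -lt "$gs_count" ]; then
            sleep "$gs_interval"
        fi
    done
}
```

For example, `gpu_sample 10 2` would take ten samples over roughly twenty seconds on a GPU node.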
Slurm command line tools
- squeue
- view pending and current jobs
- scontrol
- view and modify all parameters for current and pending jobs
- sacct
- view collected statistics for past jobs
- sinfo
- view current status of cluster nodes
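As a rough illustration of how the four Slurm commands above fit together, the sketch below wraps them in one shell function. The name `slurm_job_report` is hypothetical, and the sacct field list is just one reasonable selection; adjust both to taste.

```shell
# Show queue, job, accounting, and node information for one job ID.
# slurm_job_report is a hypothetical helper, not part of Slurm.
slurm_job_report() {
    if [ "$#" -ne 1 ]; then
        echo "usage: slurm_job_report JOBID" >&2
        return 2
    fi
    if ! command -v squeue >/dev/null 2>&1; then
        echo "Slurm client tools not found on this host" >&2
        return 1
    fi
    # Pending and running jobs belonging to the current user
    squeue -u "$(id -un)"
    # All parameters of the job (only works while Slurm still tracks it)
    scontrol show job "$1"
    # Collected statistics from the accounting database, also for past jobs
    sacct -j "$1" --format=JobID,JobName,Elapsed,MaxRSS,State
    # Current state of every node, one line per node
    sinfo -N -l
}
```

It would be called as `slurm_job_report 12345`, where 12345 stands in for a real job ID.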
screen based system performance
- top
- atop
- htop
- nmon
- iftop (root only)
command line based system performance
- iostat
- netstat
- ps
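The commands above are typically run with a few options to get a quick snapshot of a node. The sketch below shows one possible set of invocations; the function names are hypothetical, the intervals are arbitrary, and `--sort` assumes the Linux procps version of ps.

```shell
# Print the N processes using the most CPU (default 5), plus the header line.
# --sort is a procps (Linux) option.
top_procs() {
    ps -eo pid,pcpu,pmem,comm --sort=-pcpu | head -n "$(( ${1:-5} + 1 ))"
}

# Extended per-device I/O statistics: 3 reports, 5 seconds apart.
# iostat is provided by the sysstat package.
io_snapshot() {
    iostat -x 5 3
}

# Per-interface packet and error counters.
net_snapshot() {
    netstat -i
}
```

Running `top_procs 10` on a compute node, for instance, lists the ten busiest processes, which is often enough to tell whether a job is CPU bound.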
Process based performance and debugging
- perf
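A common way to start with perf is to collect a counter summary and then a sampled profile of the same command. The sketch below assumes perf is installed and that you are permitted to use it on the node; `profile_cmd` is a hypothetical wrapper name, and the 40-line cutoff on the report is arbitrary.

```shell
# Profile a command with perf: counters first, then a sampled call graph.
# Note: the command is run twice, once per perf invocation.
# profile_cmd is a hypothetical helper, not part of perf.
profile_cmd() {
    if [ "$#" -eq 0 ]; then
        echo "usage: profile_cmd COMMAND [ARGS...]" >&2
        return 2
    fi
    if ! command -v perf >/dev/null 2>&1; then
        echo "perf not found on this host" >&2
        return 1
    fi
    # Summary of event counters for the whole run
    perf stat -- "$@"
    # Sample with call graphs into perf.data in the current directory
    perf record -g -o perf.data -- "$@"
    # Text report of the hottest symbols, truncated to 40 lines
    perf report -i perf.data --stdio | head -n 40
}
```

Usage would look like `profile_cmd ./my_program input.dat`, where the program and its argument are placeholders.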