On every node we need to install a few dependencies:

<code>
yum install openssl openssl-devel pam-devel numactl numactl-devel hwloc hwloc-devel lua lua-devel readline-devel rrdtool-devel ncurses-devel man2html libibmad libibumad perl-ExtUtils-MakeMaker gcc -y
</code>
  
  
  
=== Create Slurm rpm packages ===

On the server, download the latest version of Slurm. At the time of writing, the latest version is 19.05.5.

<code>
cd /tmp
wget https://download.schedmd.com/slurm/slurm-19.05.5.tar.bz2
yum install rpm-build
rpmbuild -ta slurm-19.05.5.tar.bz2
</code>

Copy the Slurm rpm files from the master to the other servers, or to a shared folder, for installation.

<code>
cd ~/rpmbuild/RPMS/x86_64
cp slurm*.rpm /fns/shared_folder
</code>
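
If the nodes do not share a folder, a minimal sketch for copying the packages to another node over SSH instead; the hostname compute01 is a placeholder:

<code>
cd ~/rpmbuild/RPMS/x86_64
# copy the built packages to a compute node (hostname is a placeholder)
scp slurm*.rpm root@compute01:/tmp/
</code>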

== Install Slurm on the master, compute, and login nodes ==

The slurm-torque package could perhaps be omitted, but it does contain a useful /usr/bin/mpiexec wrapper script.

Before installing Slurm, we need to disable SELinux:
<code>
nano /etc/selinux/config
</code>

Change SELINUX=enforcing to SELINUX=disabled.
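
As a non-interactive alternative, a minimal sketch assuming the stock /etc/selinux/config layout; setenforce 0 also switches the running system to permissive until the next reboot:

<code>
# change SELINUX=enforcing to SELINUX=disabled in place
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config
# stop enforcing immediately for the current boot
setenforce 0
</code>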

<code>
cd ~/rpmbuild/RPMS/x86_64
export VER=19.05.5-1
yum install slurm-$VER*rpm slurm-devel-$VER*rpm slurm-perlapi-$VER*rpm slurm-torque-$VER*rpm slurm-example-configs-$VER*rpm
</code>

Explicitly enable the service on the master:
<code>
systemctl enable slurmctld
</code>

Only if the database service will run on the master node, install the database service RPM:

<code>
cd ~/rpmbuild/RPMS/x86_64
export VER=19.05.5-1
yum install slurm-slurmdbd-$VER*rpm
</code>

If you have a dedicated database server, install on that server:
<code>
export VER=19.05.5-1
yum install slurm-$VER*rpm slurm-devel-$VER*rpm slurm-slurmdbd-$VER*rpm
</code>

Explicitly enable the service:

<code>
systemctl enable slurmdbd
</code>

We need to make sure that the server has all the right configurations and files.

<code>
mkdir /var/spool/slurmctld
chown slurm: /var/spool/slurmctld
chmod 755 /var/spool/slurmctld
touch /var/log/slurmctld.log
chown slurm: /var/log/slurmctld.log
touch /var/log/slurm_jobacct.log /var/log/slurm_jobcomp.log
chown slurm: /var/log/slurm_jobacct.log /var/log/slurm_jobcomp.log
</code>

== Compute nodes ==

On compute nodes, install the slurm-slurmd RPM; you may additionally install the slurm-pam_slurm package to prevent rogue users from logging in (see the PAM sketch after the install commands below):

<code>
export VER=19.05.5-1
yum install slurm-slurmd-$VER*rpm slurm-pam_slurm-$VER*rpm
systemctl enable slurmd
</code>
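
If slurm-pam_slurm is installed, logins can be restricted through PAM so that only users with a running job on the node may SSH in. A minimal sketch of the relevant line; the exact /etc/pam.d/sshd layout varies by system:

<code>
# excerpt of /etc/pam.d/sshd: deny SSH access to users without a job on this node
account    required     pam_slurm.so
</code>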

We need to make sure that all the compute nodes have the right configurations and files.

<code>
mkdir /var/spool/slurmd
chown slurm: /var/spool/slurmd
chmod 755 /var/spool/slurmd
touch /var/log/slurmd.log
chown slurm: /var/log/slurmd.log
</code>

==== Slurm configuration ====

Slurm provides an example file located at /etc/slurm/slurm.conf.example. You can copy this file to /etc/slurm/slurm.conf:

<code>
cp /etc/slurm/slurm.conf.example /etc/slurm/slurm.conf
</code>

Slurm also has a web-based [[https://slurm.schedmd.com/configurator.html|configuration tool]] which can be used to build a simple configuration file, which can then be manually edited for more complex configurations.

After that we need to edit /etc/slurm/slurm.conf and make some modifications:

<code>
vi /etc/slurm/slurm.conf
</code>
It is important to change the ClusterName and ControlMachine parameters.
<code>
ClusterName=vlir-test
ControlMachine=10.10.2.242
SlurmUser=slurm
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/var/spool/slurm/ctld
SlurmdSpoolDir=/var/spool/slurm/d
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
ProctrackType=proctrack/pgid
ReturnToService=0
</code>
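
Besides these parameters, the configuration also needs node and partition definitions that match your hardware. A minimal sketch, where the hostnames, CPU count, and partition name are placeholders for your own cluster:

<code>
NodeName=compute[01-02] CPUs=8 State=UNKNOWN
PartitionName=normal Nodes=compute[01-02] Default=YES MaxTime=INFINITE State=UP
</code>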

If the /var/spool/slurm directory does not exist, you need to create it.

<code>
mkdir /var/spool/slurm
chown slurm.slurm -R /var/spool/slurm
</code>

==== Slurm logging ====

The Slurm logfile directory is undefined in the RPMs since you have to define it in slurm.conf. See SlurmdLogFile and SlurmctldLogFile in the slurm.conf page, and LogFile in the slurmdbd.conf page.

Check your logging configuration with:
<code>
grep -i logfile /etc/slurm/slurm.conf
</code>
<code>
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdLogFile=/var/log/slurm/slurmd.log
</code>

<code>
scontrol show config | grep -i logfile
</code>
<code>
SlurmctldLogFile        = /var/log/slurm/slurmctld.log
SlurmdLogFile           = /var/log/slurm/slurmd.log
SlurmSchedLogFile       = /var/log/slurm/slurmsched.log
</code>

If log files are configured, you have to create the log file directory manually:

<code>
mkdir /var/log/slurm
chown slurm.slurm /var/log/slurm
</code>

Study the configuration information in the [[https://slurm.schedmd.com/quickstart_admin.html|Quick Start Administrator Guide]].

===== Home Users =====

For the users' home directories, you can use the server's local disk or mount remote storage. Either way, it is recommended to create a folder to hold the users' data. In this example we create a folder /home/CLUSTER, and inside it we create a folder for every user.

<code>
mkdir /home/CLUSTER
</code>
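
If the users' folder lives on remote storage, it has to be mounted on every node. A minimal sketch assuming an NFS export; the server address and export path are placeholders:

<code>
# mount a hypothetical NFS export on /home/CLUSTER (server and path are placeholders)
mount -t nfs 10.10.2.242:/export/CLUSTER /home/CLUSTER
</code>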

===== Creating users =====

You can create every user manually, or you can use an external user database such as Active Directory, OpenLDAP, or MySQL.
For this example we are going to create the users manually on every server.
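
A minimal sketch for creating one user by hand; the username and UID are placeholders, and the UID should be identical on every server so file ownership matches across the cluster:

<code>
# create the same user with the same UID on every server (name and UID are placeholders)
useradd -u 2001 -d /home/CLUSTER/testuser -m testuser
</code>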
  
  