This information is meant primarily for the Slurm developers.
System administrators should read the instructions at
http://slurm.schedmd.com/quickstart_admin.html
(also found in the file doc/html/quickstart_admin.shtml).
The "INSTALL" file contains generic Linux build instructions.

Simple build/install on Linux:
  ./configure --enable-debug \
              --prefix=<install-dir> --sysconfdir=<config-dir>
  make
  make install

To build the files in the contribs directory:
  make contrib
  make install-contrib
  (The RPMs are built by default)

If you make changes to any auxdir/* or Makefile.am file, then run the
following on _snowflake_ (where recent versions of autoconf, automake
and libtool are installed):
  ./autogen.sh
then check in the new Makefile.am and Makefile.in files
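
One way to spot directories where autogen.sh still needs to be rerun is to
compare timestamps. This is a minimal sketch, not part of Slurm: the function
name stale_makefiles is made up for illustration, and it assumes each
Makefile.in sits next to its Makefile.am.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): list every Makefile.am under a
# directory whose sibling Makefile.in is missing or older than it.
stale_makefiles() {
    find "${1:-.}" -name Makefile.am | sort | while IFS= read -r am; do
        generated="${am%.am}.in"
        # -nt ("newer than") is supported by the bash, dash and ksh test builtins
        if [ ! -e "$generated" ] || [ "$am" -nt "$generated" ]; then
            printf '%s\n' "$am"
        fi
    done
}
```

Running "stale_makefiles ." from the top of the source tree prints nothing
when every Makefile.in is at least as new as its Makefile.am.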

Here is a step-by-step HOWTO for creating a new release of Slurm on a
Linux cluster (see the BlueGene and AIX specific notes below for some differences).
0. Get current copies of Slurm and buildfarm
   > git clone https://<user_name>@github.com/chaos/slurm.git
   > svn co https://eris.llnl.gov/svn/chaos/private/buildfarm/trunk buildfarm
   Place the buildfarm directory in your search path
   > export PATH=~/buildfarm:$PATH
1. Update NEWS and META files for the new release. In the META file,
   the API, Major, Minor, Micro, Version, and Release fields must all
   be up-to-date. **** DON'T UPDATE META UNTIL RIGHT BEFORE THE TAG ****
   The Release field should always be 1 unless one of
   the following is true:
   - Changes were made to the spec file, documentation, or example
     files, but not to code.
   - This is a prerelease (Release = 0.preX)
2. Tag the repository with the appropriate name for the new version.
   Note the first three digits are the version number. For a proper release,
   the last digit is "1" (except for a rebuild without code changes which
   could be "2"). For pre-releases, the last digit should be "0" followed by
   "pre#" or "rc#".
   > git tag -a slurm-2-6-7-1 -m "create tag v2.6.7"        OR
   > git tag -a slurm-2-7-0-0pre5 -m "create tag v2.7.0-pre5"
   > git push --tags
3. Use the rpm make target to create the new RPMs. This requires a .rpmmacros
   (.rpmrc for newer versions of rpmbuild) file containing:
      %_slurm_sysconfdir      /etc/slurm
      %_with_debug            1
      %_with_sgijob           1
   NOTE: build will make a tar-ball based upon ALL of the files in your current
   local directory. If that includes scratch files, everyone will get those
   files in the tar-ball. For that reason, it is a good idea to clone a clean
   copy of the repository and build from that
   > git clone https://<user_name>@github.com/chaos/slurm.git <local_dir>
   Build using the following syntax:
   > build --snapshot -s <local_dir>        OR
   > build --nosnapshot -s <local_dir>
   --nosnapshot will name the tar-ball and RPMs based upon the META file
   --snapshot will name the tar-ball and RPMs based upon the META file plus a
   timestamp. Do this to make a tar-ball for a non-tagged release.
   NOTE: <local_dir> should be a fully-qualified pathname
4. scp the files to schedmd.com into ~/www/download/latest or
   ~/www/download/development. Move the older files to ~/www/download/archive,
   log in to schedmd.com, cd to ~/download, and execute "php process.php" to
   update the web pages.
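
The tag naming rule in step 2 can be sketched as a small shell function. The
function name slurm_tag_name is hypothetical. The rule it encodes is inferred
from the two example tags above: dots in the version become dashes, and the
dot in a META-style prerelease value (0.pre5) is dropped.

```shell
#!/bin/sh
# Hypothetical helper: build the tag name used in step 2 from a
# Version (Major.Minor.Micro) and a META-style Release value.
# Assumption: the tag is "slurm-", the version with dots turned into
# dashes, then the release with its dot dropped (0.pre5 -> 0pre5).
slurm_tag_name() {
    version=$(printf '%s' "$1" | tr '.' '-')
    release=$(printf '%s' "$2" | tr -d '.')
    printf 'slurm-%s-%s\n' "$version" "$release"
}

slurm_tag_name 2.6.7 1        # prints slurm-2-6-7-1
slurm_tag_name 2.7.0 0.pre5   # prints slurm-2-7-0-0pre5
```

The result could then be passed to "git tag -a" once META has been updated.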

BlueGene build notes:
0. If you are on a bgp system and want sview, export these variables:
   > export CFLAGS="-I/opt/gnome/lib/gtk-2.0/include -I/opt/gnome/lib/glib-2.0/include $CFLAGS"
   > export LIBS="-L/usr/X11R6/lib64 $LIBS"
   > export CMD_LDFLAGS='-L/usr/X11R6/lib64'
   > export PKG_CONFIG_PATH="/opt/gnome/lib64/pkgconfig/:$PKG_CONFIG_PATH"
1. Use the rpm make target to create the new RPMs. This requires a .rpmmacros
   (.rpmrc for newer versions of rpmbuild) file containing:
      %_prefix                /usr
      %_slurm_sysconfdir      /etc/slurm
      %_with_bluegene         1
      %_without_pam           1
      %_with_debug            1
   Build on the Service Node using the following syntax:
   > rpmbuild -ta slurm-...bz2
   The RPM files get written to the directory
   /usr/src/packages/RPMS/ppc64
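
The macro file from step 1 can be generated rather than typed by hand. This is
a minimal sketch: the function name write_bluegene_rpmmacros and the default
target path are illustrative assumptions, and the macro values are exactly the
ones listed above.

```shell
#!/bin/sh
# Write the BlueGene macro set listed above to a file.
# The function name and the default target are illustrative assumptions;
# point it at a scratch path first if you already have a ~/.rpmmacros,
# because the target file is overwritten.
write_bluegene_rpmmacros() {
    target=${1:-"$HOME/.rpmmacros"}
    cat > "$target" <<'EOF'
%_prefix                /usr
%_slurm_sysconfdir      /etc/slurm
%_with_bluegene         1
%_without_pam           1
%_with_debug            1
EOF
}
```

For example, "write_bluegene_rpmmacros /tmp/rpmmacros.bg" writes the five
macros to a scratch file for review before copying it into place.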

To build and run on AIX:
0. Get current copies of Slurm and buildfarm
   > git clone https://<user_name>@github.com/chaos/slurm.git
   > svn co https://eris.llnl.gov/svn/chaos/private/buildfarm/trunk buildfarm
   Put the buildfarm directory in your search path
   > export PATH=~/buildfarm:$PATH
   Also, you will need several commands to appear FIRST in your PATH:
      /usr/local/tools/gnu/aix_5_64_fed/bin/install
      /usr/local/gnu/bin/tar
      /usr/bin/gcc
   I do this by making symlinks to those commands in the buildfarm directory,
   then making the buildfarm directory the first one in my PATH.
   Also, make certain that the "proctrack" rpm is installed.
1. Export some environment variables
   > export OBJECT_MODE=32
   > export PKG_CONFIG="/usr/bin/pkg-config"
2. Build with:
   > ./configure --enable-debug --prefix=/opt/freeware \
        --sysconfdir=/opt/freeware/etc/slurm \
        --with-ssl=/opt/freeware --with-munge=/opt/freeware \
        --with-proctrack=/opt/freeware
   make
   make uninstall  # remove old shared libraries, AIX caches them
   make install
3. To build RPMs (NOTE: GNU tools early in PATH as described above in #0):
   Create a .rpmmacros file specifying system specific files:
      #
      # RPM Macros for use with Slurm on AIX
      # The system-wide macros for RPM are in /usr/lib/rpm/macros
      # and this overrides a few of them
      #
      %_prefix                /opt/freeware
      %_slurm_sysconfdir      %{_prefix}/etc/slurm
      %_defaultdocdir         %{_prefix}/doc
      %_with_debug            1
      %_with_aix              1
      %with_ssl               "--with-ssl=/opt/freeware"
      %with_munge             "--with-munge=/opt/freeware"
      %with_proctrack         "--with-proctrack=/opt/freeware"
   Log in to the machine "uP". uP is currently the lowest-common-denominator
   AIX machine.
   NOTE: build will make a tar-ball based upon ALL of the files in your current
   local directory. If that includes scratch files, everyone will get those
   files in the tar-ball. For that reason, it is a good idea to clone a clean
   copy of the repository and build from that
   > git clone https://<user_name>@github.com/chaos/slurm.git <local_dir>
   Build using the following syntax:
   > export CC=/usr/bin/gcc
   > build --snapshot -s <local_dir>        OR
   > build --nosnapshot -s <local_dir>
   --nosnapshot will name the tar-ball and RPMs based upon the META file
   --snapshot will name the tar-ball and RPMs based upon the META file plus a
   timestamp. Do this to make a tar-ball for a non-tagged release.
4. Test POE after telling POE where to find Slurm's LoadLeveler wrapper.
   > export MP_RMLIB=./slurm_ll_api.so
   > export CHECKPOINT=yes
5. > poe hostname -rmpool debug
6. To debug, set SLURM_LL_API_DEBUG=3 before running poe - will create a file
   /tmp/slurm.*
   It can also be helpful to use poe options "-ilevel 6 -pmdlog yes"
   There will be a log file created named /tmp/mplog.<jobid>.<taskid>
7. If you update proctrack, be sure to run "slibclean" to clear cached
   version.
8. Remove the RPMs that we don't want:
   rm -f slurm-perlapi*rpm slurm-torque*rpm
   and install the other RPMs into /usr/admin/inst.images/slurm/aix5.3 on an
   OCF AIX machine (pdev is a good choice).
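
The symlink approach described in step 0 can be sketched as follows. The
function name link_tools_first is hypothetical. It only automates what the
text describes: link each tool into one directory, then put that directory
first in PATH.

```shell
#!/bin/sh
# Hypothetical helper: symlink each given tool into a directory so that
# putting the directory first in PATH makes those tools win the lookup.
link_tools_first() {
    dir=$1
    shift
    mkdir -p "$dir" || return 1
    for tool in "$@"; do
        # -f replaces a stale link left over from an earlier build
        ln -sf "$tool" "$dir/$(basename "$tool")" || return 1
    done
    PATH="$dir:$PATH"
    export PATH
}

# Example with the paths from step 0:
# link_tools_first ~/buildfarm \
#     /usr/local/tools/gnu/aix_5_64_fed/bin/install \
#     /usr/local/gnu/bin/tar /usr/bin/gcc
```

Afterwards, "command -v tar" should report the link inside the buildfarm
directory, not a system copy.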

Debian build notes:
Since Debian doesn't have RPMs, the rpmbuild program cannot locate
dependencies, so build without them by patching the build program:
Index: build
===================================================================
--- build       (revision 173)
+++ build       (working copy)
@@ -798,6 +798,7 @@
     $cmd .= " --define \"_tmppath $rpmdir/TMP\"";
     $cmd .= " --define \"_topdir $rpmdir\"";
     $cmd .= " --define \"build_bin_rpm 1\"";
+    $cmd .= " --nodeps";
     if (defined $conf{rpm_dist}) {
         my $dist = length $conf{rpm_dist} ? $conf{rpm_dist} : "%{nil}";
         $cmd .= " --define \"dist $dist\"";

AIX/Federation switch window problems
To clean switch windows:        ntblclean =w 8 -a sni0
To get switch window status:    ntblstatus

BlueGene bglblock boot problem diagnosis
- Logon to the Service Node (bglsn, ubglsn)
- Execute /admin/bglscripts/fatalras
  This will produce a list of failures including Rack and Midplane number
  <date> R<rack> M<midplane> <failure details>
- Translate the Rack and Midplane to Slurm node id: smap -R r<rack><midplane>
- Drain only the bad Slurm node, return others to service using scontrol
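
The translation from failure lines to smap lookups can be scripted. This is a
sketch under stated assumptions: the function name fatalras_to_smap is made
up, and the date is assumed to be a single whitespace-free token so that the
rack and midplane land in fields 2 and 3. Check a line of real fatalras output
before relying on it.

```shell
#!/bin/sh
# Turn fatalras-style lines into the matching smap lookups.
# Assumptions: the date is a single whitespace-free token, so the rack
# (R..) and midplane (M..) are fields 2 and 3; input arrives on stdin.
fatalras_to_smap() {
    awk '$2 ~ /^R/ && $3 ~ /^M/ {
        print "smap -R r" substr($2, 2) substr($3, 2)
    }' | sort -u
}
```

Typical use would be "/admin/bglscripts/fatalras | fatalras_to_smap", which
prints one smap command per distinct rack and midplane pair.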

Configuration file update procedures:
- cd /usr/bgl/dist/slurm (on bgli)
- co -l <filename>
- vi <filename>
- ci -u <filename>
- make install
- then run "dist_local slurm" on SN and FENs to update /etc/slurm

Some RPM commands:
  rpm -qa | grep slurm                     (determine what is installed)
  rpm -qpl slurm-1.1.9-1.rpm               (check contents of an rpm)
  rpm -e slurm-1.1.8-1                     (erase an rpm)
  rpm --upgrade slurm-1.1.9-1.rpm          (replace existing rpm with new version)
  rpm -i --ignoresize slurm-1.1.9-1.rpm    (install a new rpm)
For main Slurm plugin installation on BGL service node:
  rpm -i --force --nodeps --ignoresize slurm-1.1.9-1.rpm
  rpm -U --force --nodeps --ignoresize slurm-1.1.9-1.rpm   (upgrade option)

To clear a wedged job:
  /bgl/startMMCSconsole
  > delete bgljob ####
  > free RMP###

Starting and stopping daemons on Linux:
  /etc/init.d/slurm stop
  /etc/init.d/slurm start

Patches:
- cd to the top level src directory
- Run the patch command with epilog_complete.patch as stdin:
  patch -p[path_level_to_filter] [--dry-run] < epilog_complete.patch
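
The --dry-run option makes it possible to check a patch before touching the
tree. A minimal sketch, assuming GNU patch (which provides --dry-run); the
function name apply_patch_checked is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: try the patch with --dry-run first and modify the
# tree only if every hunk applies. Assumes GNU patch (for --dry-run).
apply_patch_checked() {
    level=$1
    patch_file=$2
    if patch -p"$level" --dry-run < "$patch_file" >/dev/null 2>&1; then
        patch -p"$level" < "$patch_file"
    else
        echo "patch does not apply cleanly at -p$level: $patch_file" >&2
        return 1
    fi
}
```

From the top level src directory this would be called as, for example,
"apply_patch_checked 1 epilog_complete.patch", where the strip level 1 is only
an example value.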

To get the process and job IDs with proctrack/sgi_job:
- jstat -p

CVS and gnats:
Include "gnats:<id>", e.g. "(gnats:123)", as part of the cvs commit message
to automatically record that update in the gnats database. NOTE: This does
not change the gnats bug state, but records the source files associated
with the bug.

For memory leaks (for AIX use zerofault, zf; for Linux use valgrind)
- Run configure with the option "--enable-memory-leak-debug" to completely
  release allocated memory when the daemons exit
- valgrind --tool=memcheck --leak-check=yes --num-callers=8 --leak-resolution=med \
  ./slurmctld -Dc >valg.ctld.out 2>&1
- valgrind --tool=memcheck --leak-check=yes --num-callers=8 --leak-resolution=med \
  ./slurmd -Dc >valg.slurmd.out 2>&1
  (Probably only on one node of the cluster)
- valgrind --tool=memcheck --leak-check=yes --num-callers=8 --leak-resolution=med \
  ./slurmdbd -D >valg.dbd.out 2>&1
- Run the regression test. In the globals.local file include:
  "set enable_memory_leak_debug 1"
- Shut down the daemons using "scontrol shutdown"
- Examine the end of the log files for leaks. pthread_create() and dlopen()
  have small memory leaks on some systems, which do not grow over time
- Functions in the plugins will not have their symbols preserved when the
  plugin is unloaded and the function names will appear as "???" in the
  valgrind backtrace after shutdown. Rebuilding the daemons without the
  configure option of "--enable-memory-leak-debug" typically prevents the
  plugin from being unloaded so the symbols will be properly reported. However,
  many memory leaks will then be reported due to not unloading plugins. You
  will need to match the call sequence from the first log (with
  "--enable-memory-leak-debug") to the second log (without
  "--enable-memory-leak-debug"), ignoring reports in the second log that are
  not real leaks, to identify the full code path through plugins.
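
When comparing several valgrind logs it can help to pull out one number per
file. The sketch below sums the "definitely lost" byte counts from a log. The
function name definitely_lost_bytes is hypothetical, and it assumes the usual
memcheck LEAK SUMMARY line layout shown in the comment.

```shell
#!/bin/sh
# Sum the "definitely lost" byte counts found in a valgrind memcheck log.
# Assumes the usual LEAK SUMMARY line layout:
#   ==PID==    definitely lost: 1,024 bytes in 2 blocks
definitely_lost_bytes() {
    awk '/definitely lost:/ {
        gsub(/,/, "", $4)   # drop thousands separators
        total += $4
    } END { print total + 0 }' "$1"
}
```

For example, running it over valg.ctld.out, valg.slurmd.out and valg.dbd.out
in turn gives a quick figure to compare between builds; the full log is still
needed to find the call sequence.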

Job profiling:
- "export CFLAGS=-pg", then run "configure" and "make install" as usual.
- Run the Slurm daemons through a stress test and exit normally
- Run "gprof [executable-file] >outfile"

Before new major release:
- Test on ia64, i386, x86_64, BGQ, NRT/POE, Cray
- Test on Elan and IB switches
- Test fail-over of slurmctld
- Test for memory leaks in slurmctld, slurmd and slurmdbd with various plugins
- Change API version number
- Run "make check" (requires "dejagnu" package)
- Test that the prolog and epilog run
- Run the test suite with SlurmUser NOT being self
- Test for errors reported by CLANG tool:
  NOTE: Run "configure" with "--enable-developer" option so assert functions
  take effect.
  scan-build -k -v make >m.sb.out 2>&1
  # and look for output in /tmp/scan-build-<DATE>
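
After several scan-build runs, /tmp may hold more than one report directory.
A small sketch for picking the newest one; the function name
latest_scan_build_dir is illustrative, and the default base directory is the
/tmp location named above.

```shell
#!/bin/sh
# Print the most recently modified scan-build report directory.
# The base directory defaults to /tmp, as in the note above; the
# function name is illustrative.
latest_scan_build_dir() {
    base=${1:-/tmp}
    # ls -t sorts newest first; -d lists the directories themselves
    ls -dt "$base"/scan-build-* 2>/dev/null | head -n 1
}
```

It prints nothing when no report directory exists under the base directory.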