
RELEASE NOTES FOR SLURM VERSION 16.05
27 April 2016

IMPORTANT NOTES:
ANY JOBS WITH A JOB ID ABOVE 2,147,463,647 WILL BE PURGED WHEN SLURM IS
UPGRADED FROM AN OLDER VERSION! Reduce your configured MaxJobID value as needed
prior to upgrading in order to eliminate these jobs.

If using the slurmdbd (Slurm DataBase Daemon) you must update this first.
The 16.05 slurmdbd will work with Slurm daemons of version 14.11 and above.
You will not need to update all clusters at the same time, but it is very
important to update slurmdbd first and have it running before updating any
other clusters making use of it. No real harm will come from updating your
systems before the slurmdbd, but they will not talk to each other until you
do. Also, at least the first time you run the slurmdbd, make sure your my.cnf
file has innodb_buffer_pool_size set to at least 64M. You can accomplish this
by adding the line
innodb_buffer_pool_size=64M
under the [mysqld] reference in the my.cnf file and restarting the mysqld. The
buffer pool size must be smaller than the size of the MySQL tmpdir. This is
needed when converting large tables over to the new database schema.
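
For reference, a minimal my.cnf fragment along these lines (the file location
and any other [mysqld] settings depend on your MySQL/MariaDB installation):

   [mysqld]
   innodb_buffer_pool_size=64M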

Slurm can be upgraded from version 14.11 or 15.08 to version 16.05 without loss
of jobs or other state information. Upgrading directly from an earlier version
of Slurm will result in loss of state information.

If using SPANK plugins that use the Slurm APIs, they should be recompiled when
upgrading Slurm to a new major release.

HIGHLIGHTS
==========
 -- Implemented and documented PMIX protocol which is used to bootstrap an
    MPI job. PMIX is an alternative to PMI and PMI2.
 -- Change default CgroupMountpoint (in cgroup.conf) from "/cgroup" to
    "/sys/fs/cgroup" to match current standard.
 -- Add Multi-Category Security (MCS) infrastructure to permit nodes to be bound
    to specific users or groups.
 -- Added --deadline option to salloc, sbatch and srun. Jobs which cannot be
    completed by the user-specified deadline will be terminated with a state of
    "Deadline" or "DL". (See the usage sketch after this list.)
 -- Add an "scontrol top <jobid>" command to re-order the priorities of a user's
    pending jobs (also shown in the sketch after this list). May be disabled
    with the "disable_user_top" option in the SchedulerParameters configuration
    parameter.
 -- Added new job dependency type of "aftercorr" which will start a task of a
    job array after the corresponding task of another job array completes.
 -- Add --gres-flags=enforce-binding option to salloc, sbatch and srun commands.
    If set, the only CPUs available to the job will be those bound to the
    selected GRES (i.e. the CPUs identified in the gres.conf file will be
    strictly enforced rather than advisory).
 -- Added wrappers for LSF/OpenLava commands.
 -- Added Grid Engine options to qsub command wrapper.
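
As a usage sketch for the new --deadline option and "scontrol top" command
(the job script name, job ID and timestamp format here are illustrative
assumptions; see the man pages for the accepted syntax):

   # Terminate the job with state "Deadline"/"DL" if it cannot finish in time
   sbatch --deadline=2016-06-01T12:00:00 my_job.sh

   # Move job 1234 ahead of the submitting user's other pending jobs
   scontrol top 1234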

RPMBUILD CHANGES
================
 -- Remove all *.la files from RPMs.
 -- Implemented the --without=package option for configure.

CONFIGURATION FILE CHANGES (see the appropriate man page for details)
=====================================================================
 -- New configuration parameter NodeFeaturesPlugins added.
 -- Change default CgroupMountpoint (in cgroup.conf) from "/cgroup" to
    "/sys/fs/cgroup" to match current standard.
 -- Introduce a new parameter "requeue_setup_env_fail" in SchedulerParameters.
    If set, a job that fails to set up its environment will be requeued and the
    node drained.
 -- The partition-specific SelectTypeParameters parameter can now be used to
    change the memory allocation tracking specification in the global
    SelectTypeParameters configuration parameter. Supported partition-specific
    values are CR_Core, CR_Core_Memory, CR_Socket and CR_Socket_Memory. If the
    global SelectTypeParameters value includes memory allocation management and
    the partition-specific value does not, then memory allocation management
    for that partition will NOT be supported (i.e. memory can be
    over-allocated). Likewise the global SelectTypeParameters might not include
    memory management while the partition-specific value does.
 -- Split partition's "Priority" field into "PriorityTier" (used to order
    partitions for scheduling and preemption) plus "PriorityJobFactor" (used by
    the priority/multifactor plugin in calculating job priority, which is used
    to order jobs within a partition for scheduling). If only "Priority" is
    specified, that value will be used for both the "PriorityTier" and
    "PriorityJobFactor" values. See the example configuration after this list.
 -- New configuration file "knl.conf" added specifically for Intel Knights
    Landing processor support.
 -- Make it so jobs/steps track ':' named GRES/TRES. Previously gres/gpu:tesla
    would only be tracked as gres/gpu; now both gres/gpu and gres/gpu:tesla are
    tracked as separate GRES if configured like
    AccountingStorageTRES=gres/gpu,gres/gpu:tesla.
 -- Add TCPTimeout option to slurm[dbd].conf. Decouples MessageTimeout from TCP
    connections.
 -- Added SchedulerParameters option of "bf_min_prio_reserve". Jobs below
    the specified threshold will not have resources reserved for them.
 -- Add SchedulerParameters option of "no_env_cache". If set, no environment
    cache will be used when launching a job; instead the job will fail and
    drain the node if the environment cannot be loaded normally.
 -- Remove the SchedulerParameters option of "assoc_limit_continue", making it
    the default behavior. Add option of "assoc_limit_stop". If
    "assoc_limit_stop" is set and a job cannot start due to association limits,
    then do not attempt to initiate any lower priority jobs in that partition.
    Setting this can decrease system throughput and utilization, but it avoids
    potentially starving larger jobs by preventing lower priority jobs from
    continually launching ahead of them.
 -- Rename partition configuration from "Shared" to "OverSubscribe". Rename
    salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old
    options will continue to function. Output field names also changed in
    scontrol, sinfo, squeue and sview.
 -- Add TopologyParam option of "TopoOptional" to optimize network topology
    only for jobs requesting it.
 -- Configuration parameter "CpuFreqDef" is now used to set the default
    governor for job steps that do not specify --cpu-freq (previously the
    parameter was unused).
 -- Use TaskPluginParam for default task binding if the user specifies no CPU
    binding. The user --cpu_bind option takes precedence over the default. An
    error is no longer generated if the user --cpu_bind option does not match
    TaskPluginParam.
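
As an example of the partition "Priority" split described in this list, the
slurm.conf sketch below uses hypothetical partition and node names and
illustrative values only:

   # PriorityTier orders partitions for scheduling and preemption;
   # PriorityJobFactor feeds the priority/multifactor job priority calculation.
   PartitionName=debug Nodes=tux[001-016] PriorityTier=10 PriorityJobFactor=100
   PartitionName=batch Nodes=tux[001-128] PriorityTier=1  PriorityJobFactor=10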

COMMAND CHANGES (see man pages for details)
===========================================
 -- Modify sbatch to read OpenLava/LSF/#BSUB options from the batch script.
 -- Add sbatch "--wait" option that waits for job completion before exiting.
    Exit code will match that of the spawned job (see the usage illustration
    after this list).
 -- Job output and error files can now contain "%" character by specifying
    a file name with two consecutive "%" characters. For example,
    sbatch -o "slurm.%%.%j" for job ID 123 will generate an output file named
    "slurm.%.123".
 -- Increase default sbcast buffer size from 512KB to 8MB.
 -- Add "ValidateTimeout" and "OtherTimeout" to "scontrol show burst" output.
 -- Implemented configuration checking functionality using the new -C
    option of slurmctld. To check for configuration errors in slurm.conf
    run: 'slurmctld -C'.
 -- Burst buffer advanced reservation units treated as bytes (per
    documentation) rather than GB.
 -- Add "features_act" field (currently active features) to the node
    information. Output of scontrol, sinfo, and sview changed accordingly.
    The field previously displayed as "Features" is now "AvailableFeatures"
    while the new field is displayed as "ActiveFeatures".
 -- Enable sbcast data compression logic (compress option previously ignored).
 -- Add --compress option to srun command for use with --bcast option.
 -- Added "sacctmgr show lostjobs" to report any orphaned jobs in the database.
 -- Add reservation flag of "purge_comp" which will purge an advanced
    reservation once it has no more active (pending, suspended or running)
    jobs.
 -- Add ARRAY_TASKS mail option to send emails to each task in a job array.
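
As a usage illustration for the new sbatch --wait option (the job script name
is hypothetical):

   sbatch --wait my_job.sh   # blocks until the job completes
   echo $?                   # exit status matches that of the spawned job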

OTHER CHANGES
=============
 -- Add mail wrapper script "smail" that will include job statistics in email
    notification messages.
 -- Removed support for authd. authd has not been developed and supported for
    several years.
 -- Enable the hdf5 profiling of the batch step.
 -- Eliminate redundant environment and script files for job arrays. This
    greatly reduces the number of files involved in managing job arrays.
 -- Burst buffer/cray - Add support for multiple buffer pools including support
    for different resource granularity by pool.
 -- Stop searching sbatch scripts for #PBS directives after 100 lines of
    non-comments. Stop parsing #PBS or #SLURM directives after 1024 characters
    into a line. Required for decent performance with huge scripts.
 -- New node features plugin infrastructure added. Currently used for support
    of Intel Knights Landing processor.
 -- If NodeHealthCheckProgram is configured and HealthCheckInterval is
    non-zero, then modify slurmd to run it before registering with slurmctld.
 -- select/cray - Initiate step node health check at start of step termination
    rather than after application completely ends so that NHC can capture
    information about hung (non-killable) processes.
 -- Display thread name along with thread id and remove process name in stderr
    logging for "thread_id" LogTimeFormat.

API CHANGES
===========

Removed PARAMS macro from slurm.h.
Removed BEGIN_C_DECLS and END_C_DECLS macros from slurm.h.

Changed members of the following structs
========================================
In burst_buffer_info_t: Changed gres_cnt to pool_cnt and gres_ptr to pool_ptr
In burst_buffer_pool_t: Changed avail_cnt to total_space
In job_desc_msg_t: Changed nice from 16-bits to 32-bits
In partition_info_t: Split priority into priority_job_factor and priority_tier
                     (see the sketch after this list)
In slurm_job_info_t: Changed nice from 16-bits to 32-bits
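
To show how the split partition priority members surface through the API, here
is a minimal C sketch; it assumes a build against the 16.05 slurm.h linked with
-lslurm and uses the existing slurm_load_partitions() call to print the two new
fields:

    /* Minimal sketch: list the new priority_tier and priority_job_factor
     * members of partition_info_t (assumes 16.05 headers, link with -lslurm). */
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>
    #include <slurm/slurm.h>

    int main(void)
    {
        partition_info_msg_t *parts = NULL;
        uint32_t i;

        if (slurm_load_partitions((time_t) 0, &parts, SHOW_ALL) !=
            SLURM_SUCCESS) {
            slurm_perror("slurm_load_partitions");
            return 1;
        }
        for (i = 0; i < parts->record_count; i++) {
            partition_info_t *p = &parts->partition_array[i];
            printf("%s PriorityTier=%u PriorityJobFactor=%u\n", p->name,
                   (unsigned) p->priority_tier,
                   (unsigned) p->priority_job_factor);
        }
        slurm_free_partition_info_msg(parts);
        return 0;
    }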

Added members to the following struct definitions
=================================================
In burst_buffer_info_t: Added other_timeout and validate_timeout
In burst_buffer_resv_t: Added pool
In job_desc_msg_t: Added deadline, mcs_label
In node_info_t: Added features_act and mcs_label
In resource_allocation_response_msg_t: Added ntasks_per_board, ntasks_per_core,
                                       ntasks_per_socket
In slurm_job_info_t: Added deadline, num_tasks, mcs_label
In slurm_ctl_conf_t: Added mcs_plugin, mcs_plugin_params, tcp_timeout, and
                     node_features_plugins
In slurm_step_launch_params_t: Added ntasks_per_board, ntasks_per_core,
                               ntasks_per_socket
In update_node_msg_t: Added features_act
In slurm_job_info_t: Added start_protocol_ver
In slurm_step_layout_t: Added start_protocol_ver
In job_step_info_t: Added start_protocol_ver

Added the following struct definitions
======================================
Added top_job_msg_t for a user to reorder their jobs

Removed members from the following struct definitions
=====================================================
In burst_buffer_resv_t: Removed gres_cnt and gres_ptr

Changed the following enums and #defines
========================================
Added DEBUG_FLAG_TIME_CRAY and DEBUG_FLAG_NODE_FEATURES
Added RESERVE_FLAG_PURGE_COMP
Added JOB_DEADLINE (job state)
Added MAIL_ARRAY_TASK (mail flag)
Changed MAX_TASKS_PER_NODE from 128 to 512
Changed NICE_OFFSET from 10000 to 0x80000000
Added new job state/wait reasons: FAIL_DEADLINE, WAIT_QOS_MAX_BB_PER_ACCT,
    WAIT_QOS_MAX_CPU_PER_ACCT, WAIT_QOS_MAX_ENERGY_PER_ACCT,
    WAIT_QOS_MAX_GRES_PER_ACCT, WAIT_QOS_MAX_NODE_PER_ACCT,
    WAIT_QOS_MAX_LIC_PER_ACCT, WAIT_QOS_MAX_MEM_PER_ACCT,
    WAIT_QOS_MAX_UNK_PER_ACCT, WAIT_QOS_MAX_JOB_PER_ACCT,
    WAIT_QOS_MAX_SUB_JOB_PER_ACCT
Added new partition limit enforcement flags: PARTITION_ENFORCE_NONE,
    PARTITION_ENFORCE_ALL, PARTITION_ENFORCE_ANY
Added select plugin IDs: SELECT_PLUGIN_BLUEGENE, SELECT_PLUGIN_CONS_RES,
    SELECT_PLUGIN_LINEAR, SELECT_PLUGIN_ALPS, SELECT_PLUGIN_SERIAL,
    SELECT_PLUGIN_CRAY_LINEAR, SELECT_PLUGIN_CRAY_CONS_RES
Added job flags: GRES_ENFORCE_BIND and TEST_NOW_ONLY
Added job resource sharing flags: JOB_SHARED_NONE, JOB_SHARED_OK,
    JOB_SHARED_USER, JOB_SHARED_MCS

Added the following APIs
========================
Added slurm_top_job() function to reorder a user's jobs (a hedged usage sketch
follows below)
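
The exact prototype of slurm_top_job() is not shown here, so the C sketch below
is an assumption based on the "scontrol top <jobid>" command it backs (it
guesses that the function takes the job ID as a string); verify the real
prototype in slurm.h before use:

    /* Hypothetical sketch: move job "1234" to the top of the calling user's
     * own pending jobs. The assumed prototype is
     *     extern int slurm_top_job(char *job_id_str);
     * check slurm.h for the actual signature. */
    #include <stdio.h>
    #include <slurm/slurm.h>

    int main(void)
    {
        if (slurm_top_job("1234") != SLURM_SUCCESS) {
            slurm_perror("slurm_top_job");
            return 1;
        }
        return 0;
    }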

Changed the following APIs
==========================
Added use_protocol_ver parameter to slurm_job_step_stat().