load_machines

A module that provides machine list construction for distributed parallel environments, including queueing systems.

Currently supports UGE, LSF, PBS and SLURM by parsing the contents of the PE_HOSTFILE, LSB_MCPU_HOSTS, PBS_NODEFILE and SLURM_JOB_NODELIST variables, respectively.
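The variable-to-scheduler mapping above can be sketched as a lookup over the environment. This is a minimal illustration, not the module's code: `SCHEDULER_VARIABLES` and `detect_scheduler` are hypothetical names, and checking the schedulers in this fixed order is an assumption.

```python
import os
from typing import Mapping, Optional

# Variable that each supported scheduler sets, as listed in the text above.
SCHEDULER_VARIABLES = {
    "UGE": "PE_HOSTFILE",
    "LSF": "LSB_MCPU_HOSTS",
    "PBS": "PBS_NODEFILE",
    "SLURM": "SLURM_JOB_NODELIST",
}


def detect_scheduler(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first scheduler whose variable is set and non-empty, else None.

    Hypothetical helper; the checking order is an assumption.
    """
    env = os.environ if env is None else env
    for scheduler, variable in SCHEDULER_VARIABLES.items():
        if env.get(variable):
            return scheduler
    return None
```

For example, `detect_scheduler({"PBS_NODEFILE": "/tmp/nodes"})` returns `"PBS"`, and an environment with none of the variables returns `None`, which corresponds to the case where the caller must supply machines explicitly.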

Functions:

load_machines([machine_info, host_info, ncores])

Construct a machine list from allocated machines.

ansys.fluent.core.scheduler.load_machines.load_machines(machine_info=None, host_info=None, ncores=None)

Construct a machine list from allocated machines.

Parameters:
machine_info : list[dict[str, int]], optional
List of machines provided by the caller. Must be of the form:
[{'machine-name' : <m-name-1>, 'core-count' : <int>},
 {'machine-name' : <m-name-2>, 'core-count' : <int>}, …]
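A concrete `machine_info` value of the documented form might look like the following. The machine names `M0` and `M1` are placeholders, and `total_cores` is a hypothetical helper, not part of the module. The sketch annotates dictionary values as `str` or `int` because the machine name is a string.

```python
from typing import Dict, List, Union

# One dict per machine: its name and the number of cores to use on it.
MachineInfo = List[Dict[str, Union[str, int]]]

machine_info: MachineInfo = [
    {"machine-name": "M0", "core-count": 3},
    {"machine-name": "M1", "core-count": 2},
]


def total_cores(info: MachineInfo) -> int:
    """Sum the core counts across all machines (hypothetical helper)."""
    return sum(int(m["core-count"]) for m in info)


print(total_cores(machine_info))  # 5
```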

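host_info : str, optional

Host file name, or a string listing machines and their core counts. Entries are separated by commas; a colon after a machine name gives its core count, and a name repeated without a colon adds one core per occurrence:
Example 1: 'M0:3,M1:2'
Example 2: 'M0,M0,M0,M1,M1'
Both examples describe three cores on M0 and two cores on M1.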
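The two `host_info` string forms can be turned into the `machine_info` shape with a small parser. This is a sketch under stated assumptions: `parse_host_info` is a hypothetical helper that handles only the string form (not a host file name), and merging repeated names and mixing both forms in one string are assumptions rather than documented behavior.

```python
from typing import Dict, List, Union


def parse_host_info(host_info: str) -> List[Dict[str, Union[str, int]]]:
    """Convert 'M0:3,M1:2' or 'M0,M0,M0,M1,M1' into machine_info-style dicts.

    Hypothetical helper: string form only, host file names are not handled.
    """
    counts: Dict[str, int] = {}  # dicts keep first-seen machine order
    for entry in host_info.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, sep, cores = entry.partition(":")
        # 'name:n' contributes n cores; a bare 'name' contributes one core.
        counts[name] = counts.get(name, 0) + (int(cores) if sep else 1)
    return [{"machine-name": n, "core-count": c} for n, c in counts.items()]
```

Both documented examples yield the same list, `[{"machine-name": "M0", "core-count": 3}, {"machine-name": "M1", "core-count": 2}]`.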

ncores : int, optional

Total core count. If provided without machine_info, sets the core count for a local parallel run. If both machine_info and ncores are provided, the machine list determined by machine_info is limited to ncores cores.
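One plausible way to limit a machine list by `ncores` is to take machines in order and trim the last one so the total matches. This is an assumption for illustration; the text above does not say which machines are kept or trimmed, and `limit_to_ncores` is a hypothetical name.

```python
from typing import Dict, List, Union

MachineInfo = List[Dict[str, Union[str, int]]]


def limit_to_ncores(info: MachineInfo, ncores: int) -> MachineInfo:
    """Keep machines in order until ncores cores are allocated.

    Assumption: the last kept machine is trimmed so the total equals ncores,
    unless fewer cores are available in total.
    """
    limited: MachineInfo = []
    remaining = ncores
    for machine in info:
        if remaining <= 0:
            break
        take = min(int(machine["core-count"]), remaining)
        limited.append({"machine-name": machine["machine-name"], "core-count": take})
        remaining -= take
    return limited
```

With three cores on M0 and two on M1, `ncores=4` keeps all of M0 and one core of M1 under this policy; a different policy (for example, spreading cores evenly) would be equally consistent with the description.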

Returns:
MachineList

A list of machines.

Notes

On UGE, the PE_HOSTFILE variable is used to find machines; on LSF, the LSB_MCPU_HOSTS list; on PBS, PBS_NODEFILE; and on SLURM, SLURM_JOB_NODELIST. Unsupported job schedulers may provide the machine list in other ways; in that case, the list must be pre-parsed and passed through the machine_info or host_info parameter.
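To illustrate what pre-parsing into `machine_info` involves, the sketch below converts two scheduler formats into that shape. It relies on the schedulers' usual formats (LSF's LSB_MCPU_HOSTS alternates host names and CPU counts, such as `hostA 4 hostB 2`; each line of UGE's PE_HOSTFILE begins with a host name and a slot count). These functions are hypothetical and are not the module's own parsers. PE_HOSTFILE holds a file path, so the second function takes the file's contents; reading the file is left out.

```python
from typing import Dict, List, Union

MachineInfo = List[Dict[str, Union[str, int]]]


def parse_lsb_mcpu_hosts(value: str) -> MachineInfo:
    """Parse LSF's 'hostA 4 hostB 2' (alternating host name, CPU count)."""
    tokens = value.split()
    if len(tokens) % 2:
        raise ValueError("expected alternating host names and core counts")
    return [
        {"machine-name": host, "core-count": int(count)}
        for host, count in zip(tokens[0::2], tokens[1::2])
    ]


def parse_pe_hostfile(contents: str) -> MachineInfo:
    """Parse PE_HOSTFILE contents: 'host slots queue processor-range' per line."""
    info: MachineInfo = []
    for line in contents.splitlines():
        fields = line.split()
        if len(fields) >= 2:  # only the host name and slot count are needed
            info.append({"machine-name": fields[0], "core-count": int(fields[1])})
    return info
```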

In some SLURM environments, the hostnames in the SLURM_JOB_NODELIST variable are not valid ssh targets and therefore cannot be passed to the solver. For this reason, the SLURM code path checks whether ssh to the first host works and, if it does not, obtains the actual machine names using scontrol.
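The ssh check and fallback described above can be sketched as follows. The names `ssh_works` and `resolve_slurm_hosts` are hypothetical, and the ssh options used (`BatchMode=yes`, running `true` remotely) are an assumption about how such a test might be done. The scontrol query is passed in as a callable because the exact scontrol invocation is not specified here.

```python
import subprocess
from typing import Callable, List


def ssh_works(host: str, timeout: float = 10.0) -> bool:
    """Return True if a non-interactive ssh to host runs 'true' successfully."""
    try:
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", host, "true"],
            capture_output=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # ssh missing, not runnable, or too slow
    return result.returncode == 0


def resolve_slurm_hosts(
    hosts: List[str],
    lookup_actual_names: Callable[[List[str]], List[str]],
    can_ssh: Callable[[str], bool] = ssh_works,
) -> List[str]:
    """Keep the SLURM hostnames if ssh to the first works; otherwise look them up."""
    if not hosts or can_ssh(hosts[0]):
        return hosts
    # Fall back to the caller-supplied lookup, e.g. a wrapper around scontrol.
    return lookup_actual_names(hosts)
```

Injecting `can_ssh` and `lookup_actual_names` keeps the decision logic testable without a cluster.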