load_machines
A module that provides machine list construction for distributed parallel environments, including queueing systems.
Currently supports UGE, LSF, PBS and SLURM by parsing the contents of the PE_HOSTFILE, LSB_MCPU_HOSTS, PBS_NODEFILE and SLURM_JOB_NODELIST variables, respectively.
Functions:
- load_machines: Construct a machine list from allocated machines.
- ansys.fluent.core.scheduler.load_machines.load_machines(machine_info=None, host_info=None, ncores=None)
Construct a machine list from allocated machines.
- Parameters:
- machine_info : list[dict[str, int]], optional
List of machines provided by the caller. Must be of the form:
[{'machine-name': <m-name-1>, 'core-count': <int>},
{'machine-name': <m-name-2>, 'core-count': <int>}, … ]
- host_info : str, optional
Host file name, or a string listing machines and cores, separated by commas and colons as follows:
Example 1: 'M0:3,M1:2'
Example 2: 'M0,M0,M0,M1,M1'
- ncores : int, optional
Total core count. If provided without machine_info, it sets the core count for a local parallel run. If both machine_info and ncores are provided, the machine list determined by machine_info is limited by the ncores value.
- Returns:
MachineList
A list of machines.
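The two string forms accepted by host_info and the ncores limit can be illustrated with a minimal, self-contained sketch. The helpers `parse_host_string` and `limit_cores` are hypothetical names introduced here for illustration; they are not part of the library. The documentation does not say how cores are trimmed when ncores is smaller than the total, so trimming in list order is purely an assumption of this sketch.

```python
from collections import Counter


def parse_host_string(host_info):
    """Hypothetical helper: convert 'M0:3,M1:2' or 'M0,M0,M0,M1,M1'
    into the machine_info form described above."""
    counts = Counter()  # keeps first-seen order of machine names
    for entry in host_info.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, sep, cores = entry.partition(":")
        # 'name:count' adds count cores; a bare name adds one core.
        counts[name.strip()] += int(cores) if sep else 1
    return [{"machine-name": n, "core-count": c} for n, c in counts.items()]


def limit_cores(machines, ncores):
    """Hypothetical helper: cap the total core count at ncores.

    Assumption: machines are trimmed in list order. The library may
    distribute the limit differently.
    """
    limited = []
    remaining = ncores
    for machine in machines:
        if remaining <= 0:
            break
        take = min(machine["core-count"], remaining)
        limited.append({"machine-name": machine["machine-name"], "core-count": take})
        remaining -= take
    return limited
```

Under these assumptions, both example strings produce the same list, which could then be passed as machine_info.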
Notes
On UGE, the PE_HOSTFILE variable is used to find machines; on LSF, LSB_MCPU_HOSTS; on PBS, PBS_NODEFILE; and on SLURM, SLURM_JOB_NODELIST. Unsupported job schedulers may offer other ways of obtaining a list of machines. In that case, the list must be pre-parsed and provided via the machine_info or host_info parameter.
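The mapping from scheduler to environment variable can be sketched as a simple lookup. This is an illustration only: `detect_scheduler` and `parse_lsb_mcpu_hosts` are hypothetical helpers, and the order in which the variables are checked is an assumption, not the library's documented behavior. The LSF parser relies on the standard LSB_MCPU_HOSTS layout of alternating host names and core counts, such as `hostA 4 hostB 2`.

```python
import os

# Scheduler name -> environment variable, as listed in the note above.
SCHEDULER_VARIABLES = {
    "UGE": "PE_HOSTFILE",
    "LSF": "LSB_MCPU_HOSTS",
    "PBS": "PBS_NODEFILE",
    "SLURM": "SLURM_JOB_NODELIST",
}


def detect_scheduler(environ=None):
    """Hypothetical helper: return (scheduler, value) for the first
    scheduler variable that is set and non-empty, or None."""
    env = os.environ if environ is None else environ
    for scheduler, variable in SCHEDULER_VARIABLES.items():
        value = env.get(variable)
        if value:
            return scheduler, value
    return None


def parse_lsb_mcpu_hosts(value):
    """Hypothetical helper: parse 'hostA 4 hostB 2' into machine_info form."""
    tokens = value.split()
    return [
        {"machine-name": tokens[i], "core-count": int(tokens[i + 1])}
        for i in range(0, len(tokens) - 1, 2)
    ]
```

For an unsupported scheduler, a caller would build the same list-of-dicts structure by other means and pass it as machine_info.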
In some SLURM environments, the hostnames in SLURM_JOB_NODELIST may not be valid ssh targets, so those names cannot be passed to the solver. The SLURM branch therefore tests whether ssh to the first host works and, if it does not, obtains the 'actual' machine names using scontrol.
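The fallback described above might look roughly like the following sketch. It is not the library's implementation: the helper names are hypothetical, and both the ssh options and the use of `scontrol show node` with its `NodeHostName=` field are assumptions about one way to perform the check and the lookup.

```python
import re
import subprocess


def ssh_works(host, timeout=10):
    """Hypothetical check: True if a non-interactive ssh to host succeeds."""
    command = ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]
    try:
        result = subprocess.run(command, capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def scontrol_hostname(node):
    """Hypothetical lookup: read NodeHostName from 'scontrol show node'."""
    output = subprocess.run(
        ["scontrol", "show", "node", node],
        capture_output=True, text=True, check=True,
    ).stdout
    match = re.search(r"NodeHostName=(\S+)", output)
    # Fall back to the original name if the field is not present.
    return match.group(1) if match else node


def resolve_slurm_hosts(hosts, ssh_ok=ssh_works, lookup=scontrol_hostname):
    """Keep hosts as-is if ssh to the first one works; otherwise look up
    each name through scontrol."""
    if not hosts or ssh_ok(hosts[0]):
        return list(hosts)
    return [lookup(host) for host in hosts]
```

The `ssh_ok` and `lookup` arguments are injectable so the logic can be exercised without a cluster, for example by passing simple lambdas.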