MATLAB
This page describes how to configure MATLAB to submit jobs to the Elysium HPC cluster, retrieve results, and debug errors.
Initial Configuration
Running MATLAB on the HPC Cluster
This setup is intended for job submission when you are logged directly into the cluster via the command line. This process only needs to be completed once per cluster.
After logging into the cluster, start MATLAB and run:
```matlab
configCluster
```

Jobs will run across multiple nodes on the cluster rather than on the host machine.
Running MATLAB on the Desktop
This setup is intended for job submission when MATLAB is installed on your machine and jobs are run remotely on the cluster. This setup needs to be done once per cluster, per version of MATLAB installed on your machine.
Start MATLAB and run:
```matlab
userpath
```

Download the Integration Scripts (RUB.Desktop.zip) directly from this page. Extract the ZIP contents into the folder returned by userpath.
Create a new cluster profile:
```matlab
configCluster
```

Submission to the cluster requires SSH credentials. You will be prompted for your cluster username (LoginID).
Jobs will now run on the cluster rather than on the local machine. Before submitting jobs, log in to login002 via SSH on the command line so the two-factor authentication login is cached.
Note: To run jobs on the local machine instead of the cluster, use the Processes profile.

```matlab
% Get a handle to the local resources
c = parcluster('Processes');
```

Configuring Jobs
Prior to submitting a job, you can set scheduler options such as the queue, email notifications, wall time, and more. The following properties are mandatory and must be set before you can submit a job:

- AccountName
- Nodes
- Partition
- WallTime
```matlab
% Get a handle to the cluster
c = parcluster;

% REQUIRED
% Specify an account
c.AdditionalProperties.AccountName = 'account-name';

% Specify number of nodes
c.AdditionalProperties.Nodes = 1;

% Specify the partition
c.AdditionalProperties.Partition = 'partition-name';

% Specify the wall time (e.g. 1 day, 5 hours, 30 minutes)
c.AdditionalProperties.WallTime = '1-05:30';

% OPTIONAL
% Specify a constraint
c.AdditionalProperties.Constraint = 'feature-name';

% Request email notification of job status
c.AdditionalProperties.EmailAddress = 'firstname.familyname@ruhr-uni-bochum.de';

% Specify number of GPUs (default: 0)
c.AdditionalProperties.GPUsPerNode = 1;

% Specify the number of CPUs per GPU
c.AdditionalProperties.CPUsPerGPU = 1;

% Specify memory to use, per core (default: 4GB)
c.AdditionalProperties.MemPerCPU = '6GB';

% Specify cores per node (default: 0)
c.AdditionalProperties.ProcsPerNode = 4;

% Set node exclusivity (default: false)
% Note that this will automatically be set to true if using more
% than one node.
c.AdditionalProperties.RequireExclusiveNode = true;

% Specify a reservation
c.AdditionalProperties.Reservation = 'reservation-name';
```

To persist changes made to AdditionalProperties between MATLAB sessions, save the profile:
```matlab
c.saveProfile
```

To see the values of the current configuration options, display AdditionalProperties:
```matlab
c.AdditionalProperties
```

Unset a value when it is no longer needed:
```matlab
% Turn off email notifications
c.AdditionalProperties.EmailAddress = '';

% Do not request an entire node
c.AdditionalProperties.RequireExclusiveNode = false;
```

Note: The instructions above cover the basics of configuring and running jobs on the cluster. For a more in-depth walkthrough of the job submission workflow, see the demo script ScalingToTheClusterDemoRemote.mlx.
Interactive Jobs - Running MATLAB on the HPC Cluster
To run an interactive pool job on the cluster, continue to use parpool as before:
```matlab
% Get a handle to the cluster
c = parcluster;

% Open a pool of 64 workers on the cluster
pool = c.parpool(64);
```

Rather than running a local pool on the host machine, the pool can now run across multiple nodes on the cluster.
```matlab
% Run a parfor over 1000 iterations
parfor idx = 1:1000
    a(idx) = rand;
end
```

Delete the pool when it is no longer needed:

```matlab
% Delete the pool
pool.delete
```

Independent Batch Job - MATLAB on the HPC Cluster or Desktop
Use the batch command to submit asynchronous jobs to the cluster. The batch command returns a job object, which is used to access the output of the submitted job. See the MATLAB documentation for batch for more details.
```matlab
% Get a handle to the cluster
c = parcluster;

% Submit job to query where MATLAB is running on the cluster
job = c.batch(@pwd, 1, {}, 'CurrentFolder', '.', 'AutoAddClientPath', false);

% Query job for state
job.State

% If job is finished, fetch the results
job.fetchOutputs{1}

% Delete the job after results are no longer needed
job.delete
```

To retrieve a list of running or completed jobs, call parcluster to return the cluster object. The cluster object stores an array of jobs that are listed as queued, running, finished, or failed.
```matlab
c = parcluster;
jobs = c.Jobs

% Get a handle to the second job in the list
job2 = c.Jobs(2);
```

Once the job has been selected, fetch the results as previously shown.
fetchOutputs is used to retrieve function output arguments. If you call batch with a script, use load instead. Data written to disk on the cluster must be retrieved directly from the file system, for example via SFTP.
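For files a job writes to disk, one client-side option in newer MATLAB releases (R2021b and later) is the built-in sftp client. The sketch below is illustrative only: the hostname, username, and remote path are assumptions to be replaced with your own.

```matlab
% Sketch: copy a result file written by a job on the cluster to the
% current folder. Hostname, username, and path are placeholders;
% requires MATLAB R2021b or newer for the built-in sftp client.
s = sftp('login002', 'loginID');
mget(s, '/home/loginID/RESULTS.mat');
close(s);
```

A command-line SFTP or SCP client works equally well if your MATLAB release predates the built-in client.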
```matlab
% Fetch all results from the second job in the list
job2.fetchOutputs{:}

% Alternate: load results if the job was a script instead of a function
job2.load
```

Parallel Batch Job - MATLAB on the HPC Cluster or Desktop
The batch command also supports parallel workflows. Save the following example as parallel_example.m.
```matlab
function [sim_t, A] = parallel_example(iter)
if nargin == 0
    iter = 8;
end

disp('Start sim')

A = nan(iter, 1);
t0 = tic;
parfor idx = 1:iter
    A(idx) = idx;
    pause(2)
    idx
end
sim_t = toc(t0);

disp('Sim completed')

save RESULTS A
end
```

When using the batch command, specify a Pool argument:
```matlab
% Get a handle to the cluster
c = parcluster;

% Submit a batch pool job using 4 workers for 16 simulations
job = c.batch(@parallel_example, 1, {16}, 'CurrentFolder', '.', 'Pool', 4, 'AutoAddClientPath', false);

% View current job status
job.State

% Fetch the results once the job has finished
job.fetchOutputs{1}
```

Example output:
```matlab
ans =

    8.1678
```

The job ran in 8.17 seconds using four workers. Note that these jobs always request N + 1 CPU cores, since one worker is required to manage the batch job and pool of workers. For example, a job that needs eight workers will require nine CPU cores.
Run the same simulation again but increase the pool size. This time, to retrieve the results later, keep track of the job ID.
Note: For some applications there will be diminishing returns when allocating too many workers, as the parallel overhead may exceed the computation time.
```matlab
% Get a handle to the cluster
c = parcluster;

% Submit a batch pool job using 8 workers for 16 simulations
job = c.batch(@parallel_example, 1, {16}, 'CurrentFolder', '.', 'Pool', 8, 'AutoAddClientPath', false);

% Get the job ID
id = job.ID
```

Example output:

```matlab
id =

    4
```

```matlab
% Clear job from workspace (as though MATLAB exited)
clear job
```

With a handle to the cluster, the findJob method searches for the job with the specified job ID:
```matlab
% Get a handle to the cluster
c = parcluster;

% Find the old job
job = c.findJob('ID', 4);

% Retrieve the state of the job
job.State
```

Example output:

```matlab
ans =

    finished
```

```matlab
% Fetch the results
job.fetchOutputs{1}
```

Example output:

```matlab
ans =

    4.1503
```

The job now runs in 4.15 seconds using eight workers. Run the code with different numbers of workers to determine the ideal number for your use case. Alternatively, to retrieve job results via a graphical user interface, use the Job Monitor (Parallel > Monitor Jobs). It can take some time for the job list to appear.
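Such a scaling study can be automated with a simple loop. This is a sketch, not part of the integration scripts; it assumes the cluster profile and mandatory AdditionalProperties are already configured and that parallel_example.m from above is on the path.

```matlab
% Sketch: submit the same simulation with increasing pool sizes and
% compare runtimes. Assumes a configured cluster profile and that
% parallel_example.m is on the MATLAB path.
c = parcluster;
for workers = [2 4 8]
    job = c.batch(@parallel_example, 1, {16}, 'CurrentFolder', '.', ...
        'Pool', workers, 'AutoAddClientPath', false);
    job.wait;
    sim_t = job.fetchOutputs{1};
    fprintf('%2d workers: %6.2f s\n', workers, sim_t);
    job.delete;
end
```

Remember that each submission occupies workers + 1 cores on the cluster, so keep the largest pool size within your partition's limits.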
Debugging
If a serial job produces an error, call the getDebugLog method to view the error log file.
When submitting an independent job, specify the task:
```matlab
c.getDebugLog(job.Tasks)
```

For pool jobs, specify only the job object:

```matlab
c.getDebugLog(job)
```

When troubleshooting a job, the cluster administrators may request the scheduler ID of the job. You can obtain it by calling getTaskSchedulerIDs:

```matlab
job.getTaskSchedulerIDs()
```

Example output:

```matlab
ans =

    4911680
```

Helper Functions
| Function | Description | Notes |
|---|---|---|
| `clusterFeatures` | Lists cluster features / constraints | |
| `clusterGpuCards` | Lists cluster GPU cards | |
| `clusterPartitionNames` | Lists cluster partition / queue names | |
| `disableArchiving` | Modifies file archiving to resolve file mirroring issues | Applicable only to Desktop |
| `fixConnection` | Reestablishes cluster connection (for example after reconnecting VPN) | Applicable only to Desktop |
| `seff` | Displays Slurm statistics on the job's resource-usage efficiency | |
| `willRun` | Explains why a job is queued | |
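Once the integration scripts are installed, these helpers can be called from the MATLAB prompt. The parameterless calls shown below are an assumption based on the names above; check each function's help text, and note that the output is cluster-specific.

```matlab
% List partitions, features, and GPU cards before filling in
% AdditionalProperties (calling convention is an assumption;
% see e.g. `help clusterPartitionNames` for details)
clusterPartitionNames
clusterFeatures
clusterGpuCards
```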
To Learn More
To learn more about the MATLAB Parallel Computing Toolbox, see: