Submitting jobs to PBS on glacier.westgrid.ca
---------------------------------------------
Glacier is an IBM eServer BladeCenter HS20 with 60 chassis, 14 blades
per chassis, for a total of 840 nodes. Each node consists of dual
3.0GHz processors with 2 to 4GB RAM, running RedHat 9.0.
Machines are identified as 'iceM_N', where M is the chassis number
from 1 to 60 and N is the blade number, from 1 to 14.
Within each chassis, blades are connected through a Foundry FastIron
1500 plus a Foundry FastIron 800, linked by an 8 GigE trunk. Traffic
between chassis goes over 4 GigE uplinks.
Since the connections between processors in separate chassis are slow
compared to the connections between processors in a single chassis,
one might as well abort jobs whose nodes do not end up concentrated in
a single chassis.
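As a rough sketch (assuming the iceM_N naming described above), a job
script could inspect $PBS_NODEFILE and bail out early when its nodes
span more than one chassis:
----
#!/bin/sh
# Count the distinct chassis numbers (the M in iceM_N) among our nodes.
chassis=`sed 's/^ice\([0-9]*\)_.*/\1/' $PBS_NODEFILE | sort -u | wc -l`
if [ $chassis -gt 1 ]; then
    echo "Job spans $chassis chassis; aborting." 1>&2
    exit 1
fi
----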
One can request specific blades using a line like:
#PBS -l nodes=ice10_1:ppn=2+ice10_2:ppn=2
- Environment variables are not automatically propagated to the MPI
hosts, so it is necessary to set variables like OMP_NUM_THREADS in
~/.login or ~/.bashrc instead.
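For example (using 2 threads to match the two processors per node),
one might add to ~/.bashrc:
export OMP_NUM_THREADS=2
or the csh equivalent in ~/.login:
setenv OMP_NUM_THREADS 2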
- mpirun must be told which nodes to use. Use the -machinefile switch
and pass it the name of the node file, which is made available by
PBS in the environment variable $PBS_NODEFILE. For example:
mpirun -np 2 -machinefile $PBS_NODEFILE executable [arguments]
- Resources are specified using #PBS -l xxx entries. In particular,
the nodes=N:ppn=M entry must be chosen carefully. Generally the best
performance is obtained when one MPI process per node is used, and
OpenMP is used on each node to utilize the available processors.
One would assume that the processors should be requested like this:
#PBS -l nodes=2:ppn=2
Two nodes are requested, and both processors on each node should
be used. The node file generated by PBS is incorrect though, and if
the processors are requested in this manner, MPI will start both
processes on one node and the second will not be used at all.
It is possible to avoid this problem by re-writing the nodes file to
the correct format. The following script reads the existing node
file from standard input and writes a corrected one to standard
output:
----
#!/usr/bin/env python
import sys

# The PBS node file (read on standard input) lists each node once per
# requested processor; count the occurrences of each node ...
nodes = {}
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    nodes[line] = nodes.get(line, 0) + 1

# ... and write one "node:count" line per node, the format that
# mpirun's -machinefile switch expects.
for n in nodes:
    sys.stdout.write(n + ":" + str(nodes[n]) + "\n")
----
Save this somewhere like ~/bin/rejigger_nodes.py.
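For example, assuming the usual PBS layout where each node appears
once per requested processor, nodes=2:ppn=2 might produce a node file
like the left-hand column, which the script rewrites as shown on the
right:
----
ice10_1            ice10_1:2
ice10_1     -->    ice10_2:2
ice10_2
ice10_2
----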
Do not simply use a line like #PBS -l nodes=2:ppn=1 when in fact you
intend to use both processors on each node. If the resource manager
is not informed that your job is using both processors, then it may
schedule another job on the apparently unused processor. Both
processes will have to fight for CPU time, and both will be slower
as a result.
- Sample PBS.sh file:
----
#!/bin/bash
#PBS -S /bin/bash
#PBS -l nodes=2:ppn=2
#PBS -l mem=2gb
#PBS -l walltime=2:00:00
#PBS -M mhughe@uvic.ca
#PBS -m bea
#PBS -N pg_ref_30
#PBS -W x="QOS:parallel"
cd /global/scratch/username/dir/
cat $PBS_NODEFILE | ~/bin/rejigger_nodes.py > nodes.txt
mpirun -np 2 -machinefile nodes.txt /path/to/program args > output.txt 2>&1
----
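The script would then be submitted with qsub and the job monitored
with qstat:
qsub PBS.sh
qstat -u username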
It may be necessary to use the -B flag to Phred. The MPICH
implementation used on Glacier has a bug that seems to manifest itself
when using lots of non-blocking communication (as Phred does).
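In the sample script above, that would mean adding -B to the program's
own arguments, something like:
mpirun -np 2 -machinefile nodes.txt /path/to/program -B args > output.txt 2>&1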
** IMPORTANT **
Clean up output files before re-running jobs. None of the files Phred
writes to should exist before it is run. It may also be a good idea to
get rid of the output of rejigger_nodes.py, any PI* files, and any
*.{e,o}* files that exist before restarting things.
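As a sketch (the exact names depend on the job and its output files),
the cleanup might look like:
----
rm -f nodes.txt output.txt    # rejigger_nodes.py output and program output
rm -f PI*                     # stale MPICH scratch files
rm -f *.e[0-9]* *.o[0-9]*     # PBS stderr/stdout from earlier runs
----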