Jötunn - Cluster
Jötunn is an IBM BladeCenter cluster at the University of Iceland, consisting of a master node and 42 compute nodes running Red Hat Linux. Each node contains a dual-core Intel Xeon 2.8 GHz processor with 4 GB of memory, and each core is hyper-threaded. The compute nodes are located in three racks of 14 nodes each (3 × 14 = 42) and are named accordingly.
2. Access to the cluster:
All users from the University of Iceland have access to the Jötunn cluster. External users should send an email to grid-support [at] hi.is.
To use the cluster, connect to the master node, jotunn.rhi.hi.is, via SSH. Suitable clients are:
- For Linux users, ssh can be used.
- For Mac OS X users, Fugu can be used.
- For Windows users, PuTTY can be used.
PS: Your username and password are the same as for your HI/Ugla account, and your home directory is the same as your HI home directory.
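For Linux and Mac OS X terminals, the login step can be sketched as a small shell helper. The function name jotunn_login and the username your_username are placeholders introduced for illustration; only the host name comes from this page.

```shell
# Open an SSH session on the Jötunn master node.
# $1 is your HI/Ugla username (the same one you use elsewhere at HI).
jotunn_login() {
    ssh "$1@jotunn.rhi.hi.is"
}

# Example (placeholder username):
# jotunn_login your_username
```

Placing the function in your shell start-up file saves retyping the host name; typing the ssh command directly works just as well.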
Once you have logged in successfully, certain environment settings need to be set up before you submit jobs. To do so, please run the following command in your shell.
This sets up the necessary environment for the user on all compute nodes. It is strongly recommended that users never run jobs directly on the front node (the master node).
4. LRMS on Jotunn:
The Local Resource Management System (LRMS) on Jötunn is Torque, from Cluster Resources. It is based on the older Portable Batch System (PBS), which is widely used to manage resources on UNIX/Linux clusters.
5. Scheduler:
The Torque resource manager has its own job scheduler; however, we use the Maui job scheduler to manage the workload on the compute nodes. There are many Torque commands, but the ones you will use most frequently are:
- qsub – to submit jobs for execution - e.g. qsub <jobname>, where <jobname> is the job script file
- qstat – to see the status of your job in the system - e.g. qstat <jobid> or qstat -a, where <jobid> is the identifier printed by qsub
- qdel – to delete your job - e.g. qdel <jobid>
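The three commands are typically used together: qsub prints a job identifier, which qstat and qdel then take as their argument. A minimal sketch follows, assuming the Torque client tools are on the PATH (as on the master node); the function name submit_and_watch and the script name myjob.sh are hypothetical.

```shell
# Submit a job script, show its status, and print how to cancel it.
submit_and_watch() {
    script="$1"    # path to a job script
    if ! command -v qsub >/dev/null 2>&1; then
        echo "qsub not found: run this on the master node" >&2
        return 1
    fi
    # On success qsub prints the identifier of the new job.
    job_id=$(qsub "$script") || return 1
    # Show the status of this job only.
    qstat "$job_id"
    echo "To cancel: qdel $job_id"
}

# Example (hypothetical script name):
# submit_and_watch myjob.sh
```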
There are four queues on Jötunn, namely short, medium, long and batch. Each queue has different properties and a different priority. The default queue is "batch". The properties of each individual queue can be displayed with the following command (p s is short for print server):
qmgr -c 'p s'
The queues are described below:
- short: highest priority (100); maximum total CPU time is 8 hours; two compute nodes, j205 and j303, belong only to the short queue.
- medium: high priority (80); maximum total CPU time is 96 hours; 40 possible compute nodes.
- long: lower priority (40); maximum running time (walltime) is 1 week; 40 possible compute nodes.
- batch: the default queue in the cluster, with the lowest priority (30); maximum total CPU time is 28800 (presumably seconds).
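The limits above mix hours, weeks and a bare number. Torque reads a bare integer time limit as seconds and otherwise accepts the HH:MM:SS form, so a little shell arithmetic puts them on the same footing. This is a sketch: the helper names are hypothetical, and reading 28800 as seconds is an assumption.

```shell
# Convert a number of seconds to whole hours (integer division).
seconds_to_hours() {
    echo $(( $1 / 3600 ))
}

# Convert a number of whole hours to the HH:MM:SS form used in resource requests.
hours_to_hms() {
    printf '%02d:00:00\n' "$1"
}

seconds_to_hours 28800   # batch limit, if given in seconds: prints 8
hours_to_hms 96          # medium queue limit: prints 96:00:00
hours_to_hms $(( 7 * 24 ))   # one week of walltime for the long queue: prints 168:00:00
```

Under that assumption the batch queue has the same 8-hour CPU limit as the short queue, only at a much lower priority.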
7. Examples of submitting jobs to different queues:
For short/medium/long/batch:
qsub -q short <jobname>
qsub -q medium <jobname>
qsub -q long <jobname>
qsub -q batch <jobname>
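Choosing between the queues mostly comes down to how much time the job needs. The helper below is an illustrative sketch, not a site policy. It uses the 8-hour and 96-hour CPU limits listed for short and medium and falls back to long otherwise; the function name pick_queue and the script name myjob.sh are hypothetical. Note that the long queue is limited by walltime rather than CPU time, and that short has only two nodes, so waiting times may differ.

```shell
# Suggest a queue for a job that needs the given number of CPU hours.
pick_queue() {
    hours="$1"
    if [ "$hours" -le 8 ]; then
        echo short
    elif [ "$hours" -le 96 ]; then
        echo medium
    else
        echo long
    fi
}

# Print the submit command for a 24-hour job (hypothetical script name).
echo "qsub -q $(pick_queue 24) myjob.sh"
```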
8. Sample Torque script
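A minimal job script might look like the sketch below. It is an illustration, not the cluster's own template: the job name hello_jotunn, the resource values and the echo line are assumptions, while the #PBS options used (-N, -q, -l, -j) are standard Torque directives.

```shell
#!/bin/sh
# Job name shown by qstat (hypothetical).
#PBS -N hello_jotunn
# Target queue: short, medium, long or batch.
#PBS -q short
# One core on one compute node.
#PBS -l nodes=1:ppn=1
# Requested CPU time, well below the short queue's 8-hour limit.
#PBS -l cput=00:10:00
# Merge standard error into standard output.
#PBS -j oe

# Torque starts the job in the home directory, so move to the directory
# qsub was called from (fall back to the current one outside Torque).
JOB_DIR="${PBS_O_WORKDIR:-$PWD}"
cd "$JOB_DIR"

echo "Job ${PBS_JOBID:-none} running on $(uname -n)"
```

Saved as hello.sh (a hypothetical file name), it would be submitted with qsub hello.sh. Because the queue is set by the -q directive inside the script, no -q option is needed on the command line.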
Please also have a look in /etc/lrms for more sample scripts.
Numerous applications are installed on the cluster, the most frequently used being: