Tue Aug 27 14:00:00 PDT 2013
Free Compute Cycles During Beta Test of New Head Node
We are in the process of upgrading and updating the software configuration of the EECS compute cluster. As part of the transition, we have set up a new head node so that the old and new queues can run side by side for some time.
The new head node is called “zen.millennium.berkeley.edu”. It and its compute nodes are running the latest Ubuntu long-term server release, Ubuntu 12.04 LTS. The queue-manager software has also been updated, to Torque 2.5.12 and Maui 3.3.1, so some minor differences may be encountered. The new head node will support both a “zen” queue with the old 3GB/2-core Dell 1850s and a “psi” queue with the 48GB/8-core HP DL1000s and 256GB/24-core Dell R810s.
We are opening up the new cluster for unbilled beta-test usage for the next few days, until September 1, 2013, so that users have a chance to check things out and shake out any remaining bugs before billing starts. After September 1, usage on both head nodes will be billed, and more nodes will gradually be migrated from the old “psi” head node to the new “zen” head node.
If there are no significant problems with migrating to the new setup, I would like to switch over most of the compute nodes to the new “zen” head node queues before the end of this billing quarter, September 30. (The few remaining 16GB/8-core Dell 1950 nodes are likely to remain on the old “psi” head node and “zen” queue for the time being, and may soon be retired.)
Please report any concerns or problems (or gratitude) to email@example.com.
Thanks in advance for your help in checking out the new setup.
Cluster Support <mailto:firstname.lastname@example.org>
Note: The cluster is currently being reconfigured. The following describes the target configuration.
The PSI Batch Cluster currently has 112 compute nodes and 1 frontend with the following configuration:
The nodes are available for batch jobs and interactive sessions through the Torque/Maui queuing system; the batch head node is psi.millennium.berkeley.edu. There are two batch queues: psi and zen. The default batch queue is “psi”, although a routing queue named “batch” is maintained for compatibility with existing scripts. A secondary queue named “zen” provides access to the nodes formerly available via zen.millennium.berkeley.edu.
Job submission to the Torque/Maui batch system is through “qsub”, as before. For interactive reservations, “qsub -I” grants access to a node. Users who run simple, self-contained batch jobs may not notice many differences. This acceptance-testing period is for finding the potential bumps in the road for users who use mpirun or gexec, or who rely on version-specific software and specialized packages.
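As an illustration, an interactive reservation on a node in the “zen” queue might be requested as follows (the walltime value is an example, not a site default):

```shell
# Request an interactive shell on one core of a zen-queue node.
# The walltime below is illustrative -- adjust it to your job's needs.
qsub -I -q zen -l nodes=1:ppn=1,walltime=00:30:00
```

When the scheduler grants the reservation, you are placed in a shell on the allocated compute node; exiting that shell releases the reservation.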
Most of the software installed on the old batch system accessible via “zen” is also available on the new cluster through “psi”. Many of the packages are newer versions than those running on the old cluster, which may lead to subtle differences for some jobs. Some of the lesser-used or more esoteric packages have been dropped, but most of these can be brought back by request as long as they are supported under Ubuntu 12.04.
There are many aspects of the cluster that may be tuned for better performance. We would like to work with you, the end-user community, to determine how this cluster can best support your needs. For example, the new nodes have 48GB of memory, allowing for larger jobs, but at a slightly slower memory bus clock rate. The new nodes also run with hyper-threading enabled, allowing up to 16 simultaneous tasks per node. If either of these is deemed problematic, we can turn off hyper-threading on some or all of the nodes to run just the 8 physical cores per node, or reduce the memory size on some nodes for faster response on smaller jobs.
Concurrently, we are working on making a new, faster /work space available to the PSI cluster. The Dell Fast Storage Cluster provides high-speed shared file-systems to this compute cluster.
Access to the EECS Department's PSI Compute Cluster is available to all EECS research account holders who have a sponsor that is willing to cover the recharge costs for the resource time used.
If you do not already have access to the PSI Cluster, please see the information on account requests found here.
While it is physically possible to execute jobs from your EECS department home directory, we ask that you launch all jobs from a /work/$user directory.
Use SSH to log in to the frontend node:
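For example, assuming your EECS username is “myuser” (a placeholder):

```shell
# Log in to the PSI cluster frontend node.
ssh myuser@psi.millennium.berkeley.edu
```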
However, never run programs directly on the frontend node; doing so can crash the whole cluster. Jobs on the PSI Batch Cluster are arbitrated by the Torque PBS batch scheduler. Submit a batch script using:
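A minimal batch script might look like the following sketch, where the job name, resource values, and program name are illustrative placeholders:

```shell
#!/bin/sh
#PBS -N myjob               # job name (placeholder)
#PBS -q psi                 # default queue; use "zen" for the older nodes
#PBS -l nodes=1:ppn=1       # one core on one node
#PBS -l walltime=01:00:00   # illustrative time limit; adjust as needed
cd /work/$USER              # launch from /work, not your home directory
./my_program                # placeholder for your executable
```

Saved as, say, myjob.pbs, it would be submitted with “qsub myjob.pbs”.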
Please end your interactive session as soon as you no longer need it, since it reserves a core on one of the cluster's nodes for as long as it runs.
By default, MATLAB tries to make use of *all* of the multi-threading capabilities of the computer on which it is running. This causes problems for the batch scheduler unless you tell the scheduler that you will be using more than one processor per job, and/or tell MATLAB to use fewer than all of the cores.
To avoid this, you can pass the -singleCompThread option when starting MATLAB to limit it to a single computational thread.
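A typical non-interactive invocation might look like this (the script name is hypothetical):

```shell
# Run MATLAB in batch mode with a single computational thread.
matlab -nodisplay -singleCompThread -r "my_analysis; exit"
```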
Alternatively, you can request exclusive use of a node by adding one of the following lines to your batch file:
#PBS -l nodes=1:ppn=8          # for an exclusive reservation of an 8-core node
#PBS -l nodes=1:ppn=2:cpu3000  # for an exclusive reservation of a 2-core node
#PBS -l nodes=1:ppn=24         # for an exclusive reservation of a 24-core node
You could also use the (deprecated) maxNumCompThreads(N) function in your MATLAB code, as documented here, to set the number of threads to use, and then request a matching number of cores:
#PBS -l nodes=1:ppn=8  # for a reservation of 8 cores on one node
...
maxNumCompThreads(8)

or

#PBS -l nodes=1:ppn=2
...
maxNumCompThreads(2)
For questions regarding this cluster and other technical support, please contact Cluster Support at email@example.com.