NEWS FLASH

Tue Aug 27 14:00:00 PDT 2013

Free Compute Cycles During Beta Test of New Head Node

We are in the process of upgrading and updating the software configuration of the EECS compute cluster. As part of the transition, we have set up a new head node so that the old and new queues can run side by side for some time.

The new head node is called “zen.millennium.berkeley.edu”. It and its compute nodes are running the latest Ubuntu long-term support server release, Ubuntu 12.04 LTS. The queue manager software has also been updated to Torque 2.5.12 and Maui 3.3.1, and some minor differences may be encountered. The new head node will support both a “zen” queue with the old 3GB/2core Dell 1850s and a “psi” queue with the 48GB/8core HP DL1000s and 256GB/24core Dell R810s.

We are opening up the new cluster for unbilled beta-test usage for the next few days, until September 1, 2013, so that users have a chance to check things out and shake out any remaining bugs before billing starts. After September 1, usage on both head nodes will be billed, and more nodes will gradually be migrated from the old “psi” head node to the new “zen” head node.

If there are no significant problems with migrating to the new setup, I would like to switch most of the compute nodes over to the new “zen” head node queues before the end of this billing quarter (September 30). (The few remaining 16GB/8core Dell 1950 nodes are likely to stay on the old “psi” head node's “zen” queue for the time being, and may soon be retired.)

Please report any concerns or problems (or gratitude) to support@millennium.berkeley.edu.

Thanks in advance for your help in checking out the new setup.

Cluster Support <support@millennium.berkeley.edu>

PSI 64-bit Batch Cluster Hardware

Note: The cluster is currently being reconfigured. The following describes the target configuration.

The PSI Batch Cluster currently has 112 compute nodes and 1 frontend with the following configuration:

  • Four (4) Dell R810 (psi queue)
    • Quad Hex-Core/HT Intel(R) Xeon(R) CPU E7540 @ 2.00GHz
    • 18MB L3 cache
    • 256GB of DDR3-1066 RAM
    • 1.6TB raid5 ext4 root partition including a local /scratch
    • Ubuntu 10.04.1 LTS (Long Term Support) Server (Debian compatible, 64-bit em64t)
    • Linux 2.6.34.x kernel
  • Twenty-Four (24) HP DL1000 nodes (psi queue)
    • Dual Quad-Core/HT Intel(R) Xeon(R) CPU E5550 (Gainestown) @ 2.67GHz
    • 24MB L3 cache
    • 48GB of DDR3/1066 RAM
    • 2 1TB 7.2K rpm SATA2 disks (one available for special needs)
    • Ubuntu Server 9.10 Linux (Debian compatible, 64-bit em64t)
  • 20 Dell PowerEdge 1950 (zen queue)
    • 2 Quad-Core Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
    • 4MB L2 cache
    • 16GB of RAM
    • 1 300GB 10K rpm SAS disk
    • Ubuntu Server 10.04 LTS 64-bit Linux
  • 64 Dell PowerEdge 1850 (zen queue)
    • 2 Intel(R) Xeon(TM) CPU 3.00GHz
    • 1MB L2 cache
    • 3GB of RAM
    • 2 147GB 10K rpm SCSI disks
    • Myricom Myrinet 2000 M3S-PCI64B
    • Ubuntu Server 10.04 LTS 64-bit Linux

The nodes are available for batch jobs and interactive sessions via the Torque/Maui queuing system; the batch head node is psi.millennium.berkeley.edu. There are two batch queues: psi and zen. The default batch queue is “psi”, although a routing queue named “batch” is maintained for compatibility with existing scripts. A secondary queue named “zen” provides access to the nodes formerly reached via zen.millennium.berkeley.edu.

Job submission to the Torque/Maui batch system is through “qsub”, as before. For interactive reservations, “qsub -I” grants access to a node. Users who run simple, self-contained batch jobs may not notice many differences. This acceptance-testing period is primarily for finding the potential bumps in the road for users who rely on mpirun or gexec, or who use version-specific software and specialized packages.
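For example, a job can be directed at a particular queue with qsub's -q flag (a sketch; example.sh is a placeholder for your own script, and “psi” is used when no queue is given):

   qsub example.sh            # submits to the default “psi” queue
   qsub -q zen example.sh     # submits to the “zen” queue (older 1850/1950 nodes)
   qsub -q batch example.sh   # routing queue kept for compatibility with older scripts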

Most of the software installed on the old batch system that is accessible via “zen” is also available on the new cluster through “psi”. Many of the packages will be newer versions than those running on the old cluster, which may lead to subtle differences for some jobs. Some of the lesser-used or more esoteric packages have been dropped, but most of these can be brought back by request as long as they are supported under Ubuntu 9.10.

There are many aspects of the cluster that may be tuned for better performance. We would like to work with you, the end-user community, to determine how this cluster can best support your needs. For example, the new nodes have 48GB of memory, allowing for larger jobs, but at a slightly slower memory bus clock rate. The new nodes are also running with hyper-threading enabled, allowing up to 16 simultaneous tasks per node. If either of these is deemed problematic, we can turn off hyper-threading on some or all of the nodes to run just 8 physical cores per node, or reduce the memory size on some nodes for faster response on smaller jobs.

Concurrently, we are working on making a new, faster /work space available to the PSI cluster. The Dell Fast Storage Cluster provides high-speed shared file-systems to this compute cluster.

Cluster Accounts

Access to the EECS Department's PSI Compute Cluster is available to all EECS research account holders who have a sponsor that is willing to cover the recharge costs for the resource time used.

If you do not already have access to the PSI Cluster, please see the information on account requests found here.

Filesystems

While it is physically possible to execute jobs from your EECS department home directory, we ask that you launch all jobs from a /work/$user directory.
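For example, a typical submission workflow might look like this (a sketch; it assumes /work/$USER either already exists or may be created by you, and example.sh is a placeholder for your own batch script):

   mkdir -p /work/$USER    # only needed if the directory does not already exist
   cd /work/$USER
   qsub example.sh         # job output then lands under /work rather than your home directory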

  • “/work”, “/work2”, and “/scratch” Filesystems

Using the Cluster

Use SSH to log in to the frontend node:

   psi.Millennium.Berkeley.EDU
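
For example (replace <your-login> with your EECS user name):

   ssh <your-login>@psi.millennium.berkeley.edu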

However, never run programs directly on the frontend node; doing so can crash the whole cluster. Jobs on the PSI Batch Cluster are arbitrated by the Torque PBS batch scheduler. Submit a batch script using:

   qsub example.sh
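
As a rough sketch, such a script might look like the following (the script name, job name, resource requests, and program are placeholders to adjust for your own job):

   #!/bin/bash
   #PBS -N myjob                 # job name (placeholder)
   #PBS -l nodes=1:ppn=1         # one core on one node
   #PBS -l walltime=1:00:00      # one-hour limit
   cd $PBS_O_WORKDIR             # start in the directory the job was submitted from
   ./my_program                  # placeholder for your own executable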

An example script can be found here and here. You can also start an interactive process via:

   qsub -I

Please end your interactive session as soon as you no longer need it, since it reserves a core on one of the cluster nodes for as long as it runs.
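If an interactive session needs more than the default single core, the same -l resource syntax applies, e.g. (a sketch using the node sizes listed above):

   qsub -I -l nodes=1:ppn=8,walltime=1:00:00   # a whole 8-core node for up to one hour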

Software

Matlab

By default, MATLAB tries to make use of *all* of the multi-threading capabilities of the computer on which it is running. This causes problems for the batch scheduler unless you tell the scheduler that you will be using more than one processor per job, and/or tell MATLAB to use fewer than *all* of the cores.

To avoid this, you can set the -singleCompThread option when starting MATLAB to limit MATLAB to a single computational thread.
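For example, a single-core batch job might start MATLAB like this (a sketch; my_script is a placeholder for your own MATLAB script or function name):

   #PBS -l nodes=1:ppn=1
   cd $PBS_O_WORKDIR
   matlab -nodisplay -nosplash -singleCompThread -r "my_script; exit"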

Alternatively, you can request exclusive use of a node by adding one of the following lines to your batch file:

#PBS -l nodes=1:ppn=8           # for an exclusive reservation of an 8-core node 
#PBS -l nodes=1:ppn=2:cpu3000   # for an exclusive reservation of a 2-core node
#PBS -l nodes=1:ppn=24          # for an exclusive reservation of a 24-core node 

You could also use the (deprecated) maxNumCompThreads(N) function in your MATLAB code, as documented here, to set the number of threads to use, and then request a matching number of cores:

#PBS -l nodes=1:ppn=8           # for a reservation of 8-cores on one node 
...
maxNumCompThreads(8)

or

#PBS -l nodes=1:ppn=2
...
maxNumCompThreads(2)

MPI

  • If you are unfamiliar with MPI, please read our MPI Tutorial. MPI jobs can be run over gigabit Ethernet using the P4 version, or over Myrinet/GM on the older 1850 nodes; a minimal submission sketch follows below.
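
A minimal sketch of an MPI batch submission under Torque (./my_mpi_program is a placeholder, and the exact mpirun flags depend on which MPI installation you use):

   #PBS -l nodes=2:ppn=8               # two 8-core nodes
   cd $PBS_O_WORKDIR
   NP=$(wc -l < $PBS_NODEFILE)         # one MPI rank per allocated core
   mpirun -machinefile $PBS_NODEFILE -np $NP ./my_mpi_program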

Cluster Usage Metrics

  • Cluster usage is measured as the maximum of wall-clock time and CPU time per user, as documented here.

Contact

For questions regarding this cluster and other technical support, please contact Cluster Support at support@millennium.berkeley.edu.

 