OceanStore/ROC X Cluster 

Frequently Asked Questions¹

  • Who do I contact if I have problems?
    • Send email to support@millennium.berkeley.edu.  We're happy to respond to your comments, complaints, feature requests, etc.
  • Who can use the cluster?
    • X Cluster access is currently limited to active members of the OceanStore and ROC projects.  Exceptions will be made with the consent of the projects' PIs.
  • Where do I login?
    • Reservable hosts are x1.millennium.berkeley.edu through x39.millennium.berkeley.edu.  x40 runs FreeBSD and is reserved as a Modelnet router.  x42 is not reservable; use it to test and debug your software before you reserve other nodes.
    • Older nodes ibm1.cs.berkeley.edu through ibm8.cs.berkeley.edu are also available.
  • How do I find a free node?
    • Please use the Cluster Reservation System.  Email support for the login and password.  Please do not reserve nodes more than 36 hours in advance.
    • Use the Ganglia graphs to check system load.
  • I need the machine/network specs for this research paper.
  • Wait, the specs say these are SMP servers.  How come top only lists 1 CPU?
    • You are used to the nonstandard behavior of RedHat top.
    • Here top reports utilization for the machine as a whole.  On a 2-CPU server, one CPU running at full capacity shows up as a load of 1.0 and 50% CPU utilization.
  • What OS is this?  It's funny.
  • I can't find program foobar, where is it?
    • Make sure you have /usr/roc/bin (and optionally /usr/roc/sbin) in your path.
    • Various Java distributions are located in /usr/local.
    • We don't mount much of IRIS's Linux SWW because it is a) RedHat centric and b) horribly out of date.
    • Additional software can be installed as needed. Just ask.
  • Some key libraries are missing, even though the programs are installed.  Why?
    • The Debian package system likes to separate out all the useful libraries into "-dev" packages.  Ask and we shall install.
  • Why doesn't rexec work?
    • rexec is no longer supported.  Please use gexec instead.
  • Where do I put my temporary files?
    • Each node has a large (10-50GB) file system mounted as /scratch.  This file system is striped across both disks for speed. Note: The nodes are considered stateless, and while we will try to preserve data in /scratch, we do not guarantee its safety; /scratch is never backed up.
    • We also have a local NFS fileserver for scratch data.  It is mounted as /work on [x1-x42].  This storage is RAID 5, but it is not backed up.  The usage policy is similar to that of /work on the Millennium cluster, except that stale data is not automatically deleted.
  • Where can I keep my important data safe?
    • Any data you're concerned about should be stored on the IRIS-provided fileservers: either your home directory or your group's project space.
  • Why is access to my home directory slow/broken?
    • The X Cluster is connected to the Millennium Network, not the EECS network.  Even though it's fully gigabit connected, traffic from the cluster to the CS file servers travels all the way to Evans Hall and back.  Congestion on the campus core routers or at the EECS firewall can cause problems.
    • If your home directory is in some other research group's project space, it may not be exported to the cluster, even if you appear in the roc-l yp netgroup.  Mail support, and we'll request permission to have that project space exported.
  • Gigabit feed, you say?  It certainly doesn't seem to go that fast.
    • Each machine can only push about 400 Mbit/s, though.
    • You can check the utilization graphs for the two switches, ocean-gw1 and ocean-gw2.  The network is nowhere near capacity.
  • What is this Myrinet thing you keep talking about?
    • We inherited a bunch of second-generation (LANai 7.2) Myrinet equipment from the Millennium project.  The PCI64A provides 1.28Gb/sec bandwidth with very low (<10 µsec) latency.  More information is available from Myricom.
    • Myrinet provides an Ethernet emulation layer.  Ours uses private IP addresses in the range 192.168.10.[201-241] and has associated private DNS entries [x1-x41].myri.
  • Some of the hosts seem to be missing from the Myrinet; what gives?  
    • The following hosts have PCI Advanced System Management cards instead of Myrinet: x10, x21, x31, x42.
  • User JoeBob is hogging all the CPU time and I can't run my foomulator.  Make him stop!
  • I found some other bug or need to contact the other users of the cluster...
    • There is a mailing list for the cluster users: xcluster-users@millennium.berkeley.edu.
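
As a worked example for the top question above, here is a minimal Python sketch of the arithmetic.  It assumes utilization is reported for the machine as a whole, so one saturated CPU out of two comes out as 50%.  The function name cpu_utilization_percent is made up for illustration and is not part of any cluster tool.

```python
def cpu_utilization_percent(busy_cpus: float, total_cpus: int) -> float:
    """Whole-machine CPU utilization, as a percentage.

    busy_cpus  -- number of CPUs' worth of work currently running
                  (1.0 means one CPU is fully busy)
    total_cpus -- number of CPUs in the server
    """
    if total_cpus <= 0:
        raise ValueError("total_cpus must be positive")
    return 100.0 * busy_cpus / total_cpus


# One fully busy CPU on a 2-CPU node.
print(cpu_utilization_percent(1.0, 2))
```

So a single-threaded job that shows 50% on a 2-CPU node is already using all of the one CPU it can run on.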
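
For the Myrinet entries above, the following Python sketch builds the private name and address for a node.  It assumes node xN maps to 192.168.10.(200 + N).  That matches the two ends of the published range (x1 at .201, x41 at .241) but is not stated anywhere in this FAQ, so check DNS before relying on it.  The function name myrinet_address is hypothetical.

```python
# Hosts the FAQ lists as having PCI Advanced System Management cards
# instead of Myrinet.
NO_MYRINET = {10, 21, 31, 42}


def myrinet_address(node: int):
    """Return (private DNS name, private IP) for node xN.

    Returns None for nodes that have no Myrinet card.
    ASSUMPTION: xN maps to 192.168.10.(200 + N); the FAQ gives only
    the ranges, not the per-host mapping.
    """
    if not 1 <= node <= 42:
        raise ValueError("node must be between 1 and 42")
    if node in NO_MYRINET:
        return None
    return (f"x{node}.myri", f"192.168.10.{200 + node}")


# List the nodes that should be reachable over Myrinet.
for n in range(1, 43):
    entry = myrinet_address(n)
    if entry is not None:
        print(entry[0], entry[1])
```

If the assumed mapping turns out to be wrong, only the last line of myrinet_address needs to change.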

¹ You're right, nobody ever asked these questions.  I just don't want to answer them more than once.


Contact: support at millennium.berkeley.edu.  Last modified on 15-Mar-2006 01:22:45 -0800