News Archive
- Fri Dec 21 2007: We will be performing maintenance on PSI, CITRIS, and the /work
fileservers, Thu Dec 27 09:00 - Fri Dec 28 23:59. The clusters will be down during this time.
- Mon Dec 17 2007 23:48: Zen was rebooted. Running jobs should not have been disrupted.
- Mon Dec 17 2007 22:12: The PSI cluster is unreachable due to
someone overloading the frontend (zen). Unfortunately remote
hardware reset has been unsuccessful thus far and it may have to be dealt with
in the AM.
- Mon Dec 3 2007 11:00: The EECS firewall firmware has been
updated; hopefully this will resolve some of the problems.
- Thu Nov 29 2007 08:47 - 09:30: Connectivity to the Millennium
network was disrupted due to cut fiber optic cables in Evans Hall. We
were able to route around the problem to restore connectivity.
- Mon Nov 19 2007 09:45: The SAN was reset and the /work servers
restarted. PSI & CITRIS /work are back online. Unfortunately, some jobs
failed due to stale NFS filehandles.
- Mon Nov 19 2007 08:30: There are problems with the SAN that
provides /work to PSI and CITRIS. Cause and ETA to repair are not
yet known.
- Mon Nov 12 2007: There appears to be a problem with the
network switch which connects the 1st 32 CITRIS nodes. New jobs will
not be directed to those nodes. The problem will be investigated
further on Tuesday.
- Fri Oct 19 2007: The EECS network is experiencing
connectivity problems; access to EECS resources, including home
directories and license servers, may be impaired.
- Fri Oct 12 2007: Mail to mailing lists hosted by Millennium
was delayed from Wednesday until today because we inadvertently failed
to restart the list software following security patching.
- Tue Oct 9 2007 09:30: The EECS license server is back
online.
- Mon Oct 8 2007 14:10: There are ongoing problems with the
EECS license server for Matlab and Mathematica.
- Mon Oct 8 2007: We were able to successfully reproduce the
NFS failures while engineers from the firewall company were debugging
the system. One feature which may be related to the problem has
been disabled, but it doesn't seem to have fixed the issues.
- Mon Oct 1 2007: Unfortunately the EECS firewall problems are still
ongoing. NFS-related symptoms include rejected or delayed SSH connections
when a node cannot reach your EECS home directory. The rpc.mountd requests
succeed, but the new NFS connections are blocked by the firewall, sometimes
causing autofs to become stuck, occasionally even spiraling out of control.
Unfortunately, we are completely at the mercy of the EECS department IT group
and the firewall vendor to have this fixed.
- Fri Oct 5 2007: On Monday morning, we will be working on-site with EECS
networking staff and a high-level engineer from the firewall company to try
to reproduce the problems so they can get debugging information.
- Fri Sep 21 2007: Ongoing problems with the newly replaced EECS
firewall are causing sporadic problems with NFS automounting and traffic to
the EECS department fileservers (/home/{eecs,cs} and /project).
- Tue Sep 18 2007: zen became wedged due to runaway processes and has been rebooted.
- Mon Sep 17 2007: Due to a failure involving the DHCP server, a
number of machines lost DHCP leases and became unreachable starting around 4pm
Sunday. DHCP service has been restored and hosts should come back online
as they are granted new leases.
- Tue Sep 4 2007: yin and yang were taken offline at 5pm for OS
upgrades and integration into the batch system. All PSI
nodes are now batch scheduled.
- Tue Sep 4 2007: Our mail server and website were offline for a couple of
hours on Saturday.
- Thu Aug 30 2007: The support@millennium RT queue was broken overnight, but
we believe we recovered all help requests. If you don't receive a response,
please send your request again.
- Wed Aug 29 2007: Now installed:
  - Intel C & Fortran Compilers on x64 and ia64 upgraded to 10.0.026.
  - papi-3.5.0 installed.
  - jrockit updated to R27.3.1-jdk1.5.0_11. jrockit R27.3.1-jdk1.6.0_01
    available on x86 & x64.
- Tue Aug 28 17:00:00 PDT 2007: zen was rebooted to fix NFS problems which were preventing people from logging in.
- Fri Aug 24 18:28:16 PDT 2007: The core router which serves the PSI cluster and numerous administrative machines just crashed unexpectedly and required a hard reboot. This may be related to recent firmware upgrades; the firmware has been reverted.
- Mon Aug 20 2007: The fileservers which host /work and /home/citris are
undergoing maintenance and will experience brief transient outages throughout
the day.
- Wed Aug 8 2007: Now installed: Intel Open Source Computer Vision Library
(OpenCV-1.0.0).
- Tue Aug 7 2007: The
CITRIS Cluster has been switched to the Torque Resource Manager and Maui
Scheduler to provide a consistent environment with the PSI Cluster. Batch
scripts should once again use the nodes=N format as shown in the
example script.
- Fri Aug 3 2007: The Oceanstore/ROC
X Cluster will be reinstalled on or after August 15, 2007:
Notice.
- Fri Jul 27 12:46:05 PDT 2007: PSI /work is back online.
- Fri Jul 27 12:00:56 PDT 2007: PSI /work is currently down. We're working on it.
- Tue Jul 24 2007: BY THE END OF THE SUMMER, ALL
PSI NODES WILL BE CONVERTED TO THE 64-BIT OPERATING SYSTEM AND MERGED INTO THE
BATCH SCHEDULED POOL.
- Tue Jun 12 2007: CITRIS /work was temporarily offline due to an
unknown failure. Service has been restored.
- Fri Jun 1 2007: New
Matlab for x86 and x64/em64t is installed in
/usr/mill/pkg/matlab-r2007a.
- Mon May 14 2007: Unscheduled Air Conditioning work necessitated the emergency shutdown of many CITRIS cluster nodes. Unfortunately, some batch jobs were terminated. Several PSI nodes also shut down due to the excessive temperatures. All PSI nodes are back online.
- Mon Apr 23 2007: The i3 cluster will be
in use for undergraduate instruction until the end of the semester.
- Thu Apr 5 2007: Dell High Performance Computing Workshop for Researchers
and Scientists: Wednesday April 25th.
- Tue Mar 20 18:00:00 PDT 2007: EECS network connectivity was
restored. Many thanks to the EECS staff and all their
hard work.
- Tue Mar 20 14:45:32 PDT 2007: The EECS network is experiencing problems. /home/(cs|eecs) is unavailable.
- Wed Mar 14 10:00 2007: A user's processes exhausted the memory on
the majority of the PSI cluster nodes at around 2am, causing them to become
unresponsive. The affected nodes have been rebooted.
- Wed Mar 7 11:53:03 PST 2007: CITRIS
/work is back online.
- Wed Mar 7 11:35:49 PST 2007: CITRIS
/work is temporarily unavailable. We're working on it.
- Fri Mar 2 15:45 2007: lemon crashed and was rebooted.
- Thu Mar 1 2007: New Matlab for x86 and now also x64/em64t is installed in
/usr/mill/pkg/matlab-r2006b.
- Fri Feb 23 2007: New Java versions in /usr/mill/pkg:
jrockit-R27.1.0-jdk1.5.0_08, jdk1.5.0_11, jdk1.6.0.
- Mon Feb 26 2007: New Cluster Online! Announcing the new PSI BATCH Cluster:
20 dual-socket quad-core Xeons with 16GB RAM each!
- Tue Feb 20 2007: The CITRIS
Cluster was upgraded to Debian Etch on Linux 2.6.18 and all CITRIS
nodes are now under control of the PBS batch scheduler. This change
is the result of feedback collected by the EECS computing survey. The
new version of PBS requires minor changes to PBS scripts to ensure you are
allocated the types of nodes you want. Additionally, 29 CITRIS nodes have had
Myrinet PCI64B cards added.
- Thu Feb 8 00:00 PST 2007: Following service disruptions
resulting from the home directory outage, the remainder of the CITRIS
nodes were upgraded to Debian Etch on Linux 2.6.18.
- Wed Feb 7 12:06 PST 2007:
/home/citris was temporarily offline due to server problems.
- Tue Jan 30 08:00 2007: Nodes s42-s62 will be
reserved for a timing experiment and will be unavailable for general use this
weekend. Existing processes will be terminated. The nodes will be
reserved 16:30 Friday, Feb 2 through 10:00 Monday, Feb 5. Please plan
accordingly.
- Thu Jan 11 2007: User accounts belonging to students not registered
for Spring 2007 have been disabled. If you think your account was disabled in
error, please email support at Millennium.Berkeley.EDU.
- Tue Dec 12 05:50 2006: There was a loss of connectivity between the Millennium network and the rest of campus from 23:30 to 04:00. All clusters were unavailable during this time. The cause of the outage is still being investigated.
- Fri Aug 4 11:50 2006: There are connectivity
problems to the EECS network. Home directory access will be impaired.
- Tue Jul 25 23:50: PSI and CITRIS nodes are back online.
- Tue Jul 25 2006: All IDLE systems are being shut down to reduce electrical
load due to the California stage 3 power alert.
- Tue Jul 25 02:30 2006: PSI /work is back up for now. Further maintenance
will be scheduled soon.
- Mon Jul 24 12:00 2006: PSI /work is offline due to filesystem problems.
Work in progress. ETA unknown.
- Mon Jul 17 11:20:22 2006: s55 through s62 are reserved for a timing
experiment Thursday July 20 17:00 - Friday Jul 21 10:00. All other user
processes will be suspended or terminated during this period.
- Thu Jun 29 09:00 2006: The
Millennium cluster was offline overnight due to an electrical problem.
Service has been restored.
- Sun Jun 25 00:45 2006: EECS User processes on nearly every cluster had to be killed to clear stale NFS filehandles following unexpected problems during IRIS planned coeus (/home/eecs) downtime. IRIS Announcement
- Fri Jun 9 2006: New docs on Matlab & licenses.
- Tue Jun 6 9:00:00 2006: All CITRIS & PSI
nodes are back.
- Fri Jun 2 19:35:25 2006: Due to Air Conditioning problems, idle CITRIS & PSI cluster nodes have been shut down to reduce the heat load.
- Fri May 17 2006: The southern half of the sMote testbed was removed for RAD Lab
construction.
- Tue Apr 4 2006: BEA JRockit offers a native Java 1.5 JVM for ia64. It is
installed in /usr/mill/pkg/jrockit on the CITRIS cluster.
- Tue Apr 4 16:30 2006: PSI /work and the other filesystems are back online.
- Tue Apr 4 13:00 2006: PSI
/work is currently unavailable. Following
the 3rd crash in a week, we're performing extended diagnostics of the filesystem. ETA unknown.
- Fri Mar 31 2006: The Intel C & Fortran compilers have been upgraded to 9.0.032 & 9.0.033, respectively, on both x86 & ia64.
- Sun Mar 26 2006 20:00: The PSI cluster /work fileserver crashed on Friday morning around 10am. Unfortunately our staff were all out of town and the fileserver remained offline until Sunday evening.
- Tue Mar 28 2006 12:34: Most of the IDSG-supported services will be shut down on Thursday, March 30th, between 7:00 and 8:30 AM, coinciding with the network upgrade in Soda Hall. In particular, this will affect fileservers coeus and project, as well as the IMAP server. https://iris.eecs.berkeley.edu/news/
- Tue Feb 28 2006 15:00: All clusters are online.
- Tue Feb 28 2006 08:00: Campus suffered a power outage. Power has been restored, but the clusters are all still offline.
- Tue Feb 28 2006 13:00: Nano is back online.
- Tue Feb 28 2006 12:00: Service has been restored to: PSI, CITRIS, CITRIS Batch, NLP, i3, PlanetLab, & sMote. Still offline are: Millennium, OceanStore, RAD Lab, IBM, Nano.
- Tue Feb 7 2006: All Nano nodes are online for testing now.
- Mon Jan 30 2006: The private Nano Cluster is now partially online for testing.
- Fri Feb 3 2006 14:30: s32 crashed & was rebooted.
- Fri Feb 3 2006 13:30: The Millennium network was disconnected from EECS 12:30-13:30 today as a preventative measure following unrelated service disruptions beginning at 08:30. Millennium staff worked with IRIS systems and network engineers to isolate and block DoS attacks against the EECS network. Connectivity to EECS has been restored. Official IRIS announcement.
- Fri Feb 3 2006 08:30: s28 crashed & was rebooted.
- Fri Dec 30 2005 11:45: This morning, unplanned downtime on sww.eecs.berkeley.edu resulted in various problems, particularly with the OceanStore & IBM clusters.
- Tue Dec 20 2005: PSI nodes s1-s16 are reserved for timing experiments Monday, January 9 12pm-6pm.
- Thu Nov 3 2005: AC work is complete.
- Mon Oct 31 2005 10:25: Air Conditioner work is in progress
in the main machine room. CITRIS and/or PSI may be shut down without
notice if servers overheat.
- Wed Oct 12 2005: Millennium Cluster Changes: The remaining 16 Quad
700MHz 2GB Millennium nodes were replaced by 8 Dual 1.4GHz 4GB nodes to
reduce heat load. 3 racks of servers were condensed to 8U.
sonoma.Millennium.Berkeley.EDU no longer exists. Use napa. Millennium has
Myrinet again. Y2K technology NOW!
- Thu Sep 22 2005: Now hiring part-time student Linux administrator:
Assistant III - Senior Engineering Aid
- Wed Sep 21 2005: NEW CITRIS NODES! 30
new HP rx1620 dual 1.3GHz Itanium2 servers running Linux 2.6.11 have been
added to the interactive
CITRIS cluster.
- Mon Sep 12 2005 16:45: The
/work fileservers crashed again due to a faulty
UPS. The fileservers are back online on a different circuit. All
clusters seem to have recovered cleanly. Send email to support ASAP if
you are experiencing problems.
- Thu Sep 8 2005: Reservation systems are now available for the
i3 and NLP clusters.
- Mon Aug 14 2005: The /work fileservers rebooted unexpectedly, causing
stale NFS filehandles on all cluster nodes. Millennium was rebooted.
Filesystems were remounted on CITRIS. PSI recovered by itself.
Send email to support ASAP if you are still experiencing problems.
- Fri Aug 12 2005: All PSI nodes are operational.
- Thu Jun 23 2005: Problems with the Millennium/PSI router caused service
interruptions to both clusters this morning. The older 550MHz Millennium
cluster nodes have been taken offline. We recommend using the
Fast Storage Cluster instead.
- Thu Jun 30 09:00 2005: The PSI Cluster is again available for general
use.
- Tue Jul 21 2005: s18,s26,s29,s49,s53,s56,s57,s62 in the PSI Cluster are
reserved for 4 hours from noon till 4pm
- Tue Jul 20 2005: s17 in the PSI Cluster is reserved for another 2 weeks
for the PSI group.
- Fri May 20 2005: All 32 CITRIS Itanium 2 nodes
with Myrinet (c17-c48) are now in the Batch Cluster "workq".
- Mon May 2 2005: We have disabled the SSH version 1 protocol to
comply with the UCB Minimum Security Standards.
- Mon April 4 2005: We are happy to announce our new
Fast Storage Cluster is online and available for
general use. Part of the Petabyte
Storage Infrastructure project, this cluster consists of 64 Dell PowerEdge
1850 dual 3.0GHz Xeon servers, each w/ 3.0GB of RAM and 250GB local storage.
- Thu Mar 10 2005: 2 new machines w/ dual 1.5GHz Itanium-2 CPUs and 8GB of RAM were added to the CITRIS Itanium II PBS Batch Cluster. They have been assigned to a new PBS queue named "bigmem". The CITRIS HOWTO has been updated to include a basic PBS howto.
- Wed Jan 26 2005: Ongoing problems with the Air Conditioning have forced us to deactivate many Millennium nodes.
- Mon Jan 3 2005: SSH host keys on the Millennium Cluster changed. Millennium has been reinstalled with Debian Linux with a 2.6.9 kernel. Functionality may differ from the previous RedHat 9 installation. You may need to recompile your binaries. The new configuration and installed software are very similar to the CITRIS Itanium II cluster.
- Fri Nov 5 2004: Exclusive Batch partition on the CITRIS
cluster was increased to 16 machines, c33-c48. The frontend for launching
PBS jobs is
grapefruit.Millennium.Berkeley.EDU
- As part of PSI, the /work filesystems for the CITRIS and Millennium clusters have been moved to
new servers. Bigger, better, faster.
- Fri Jan 21 2004 23:30:00: Another Air Conditioning failure this evening.
Much of Millennium is offline again.
- Tue Jan 18 2004: Millennium is now fully available.
- Mon Jan 17 2004: Millennium nodes mm1 through mm18 have been
reactivated.
- Fri Jan 14 2004: A power surge on campus disrupted Air Conditioning in
the Millennium Machine room. The Millennium Cluster is offline to reduce heat
load until further notice.
- Ongoing: If you are having trouble accessing your
home directory, it may be due to ongoing
problems with the EECS department infrastructure.
- Mon Nov 8 2004 23:00: The oldest Millennium nodes, mm1-mm30, have been
reinstalled with Debian Linux and removed from the main Millennium Cluster.
They're running Linux kernel 2.6.9 and GM 2.0. We're still testing, but
please take some time to log in and see how things work before we roll it
out to the faster nodes.
- Mon Apr 12 2004 14:32: CAMPUS POWER OUTAGE.
- Wed Mar 24 2004: Myricom cards replaced in c17, c18, c21, c22,
c23, c31, c36.
- Mon Mar 1 2004 15:15: Napa hung and was rebooted.
- Mon Mar 1 2004: mmnt, ups, & bellevue are permanently offline.
- Wed Feb 25 2004: Campus briefly lost electricity at 10:45pm. All Clusters are
back ONLINE.
- Tue Feb 24 2004: New website.
- Mon Feb 23 2004: 10 new CITRIS nodes online
UC Berkeley Clustered Computing
Last modified on 24-Jan-2008 16:47:25 -0800