U.C. Berkeley Millennium: High Performance Computing

Progress Reports

1998 Q1   Our share of the equipment donation has been used to support several projects. Our part of the Titanium project concerns the optimization of loops over dynamically constructed array objects in Titanium, a dialect of Java. We have recently obtained performance comparable to that of C or FORTRAN code on at least some scientific applications. Further details are available on the Titanium Web page (http://HTTP.CS.Berkeley.EDU/projects/titanium).

One of the donated machines went to a student organization, the Experimental Computing Facility (XCF), where it has supported the development of the Project Revision Control System (PRCS), an increasingly popular source-code control system with a particularly simple underlying model. Details are available at http://www.xcf.berkeley.edu/~jmacd/prcs.html.

1998 Q2   "Frankpack", the code Frank McKenna has written, is running. PETSc has been ported. Osni Marques has written a class interfacing Frankpack and PETSc; it works on a uniprocessor but has not yet been tested on multiprocessors. SuperLU, Xiaoye Li's shared-memory direct solver, is running and integrated with Frankpack. Prometheus, Mark Adams's solver, is being ported now; a file I/O problem remains to be debugged (the program currently hangs). Melody Ivory has ported the STREAM benchmark, so far only on SMPs.

Mark gets poor speedup on sparse matrix-vector multiply, his inner loop:

     P   Mflops
    ---  ------
     1     30
     2     42
     4     37

By contrast, he gets a 3x speedup with P=4 on ASCI Blue and 3.5x on the T3E. Eun-Jin will try her mat-vec code on Mark's matrices.
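As a quick check on the Mflops table, the minimal Python sketch below computes speedup and parallel efficiency from the reported rates. It assumes the problem size is the same at every P, so that the ratio of Mflop rates equals the inverse ratio of run times; the names MFLOPS, speedup and efficiency are illustrative only.

```python
# Mflop rates for Mark's sparse matrix-vector multiply, from the table above.
MFLOPS = {1: 30.0, 2: 42.0, 4: 37.0}

def speedup(rate_p: float, rate_1: float) -> float:
    """Speedup over one processor, as a ratio of Mflop rates.

    Assumes a fixed problem size, so the rate ratio equals the
    inverse ratio of run times.
    """
    return rate_p / rate_1

def efficiency(s: float, p: int) -> float:
    """Parallel efficiency: speedup divided by processor count."""
    return s / p

for p, rate in sorted(MFLOPS.items()):
    s = speedup(rate, MFLOPS[1])
    print(f"P={p}: speedup {s:.2f}x, efficiency {efficiency(s, p):.1%}")

# Speedups reported at P=4 on the other machines, for comparison.
for machine, s in (("ASCI Blue", 3.0), ("T3E", 3.5)):
    print(f"{machine}, P=4: efficiency {efficiency(s, 4):.1%}")
```

At P=4 the measured rates give a speedup of about 1.23, or roughly 31% efficiency, against 75% on ASCI Blue and 87.5% on the T3E.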

We are getting close to needing realistic finite-element (FE) examples to test Frankpack with the solvers.

PC Benchmarking   We are currently conducting a detailed performance analysis of the processing, memory and communication subsystems of several PCs at the workstation, cluster, SMP and CLUMP levels. Our study focuses on understanding how to achieve optimal performance in high-performance computing applications. For the study we have configured a collection of 10 PCs with varying hardware and software parameters, along with 3 Sun systems (a workstation, a cluster and an SMP). We have completed benchmarking the memory subsystems of the workstations and SMPs using a modified STREAM benchmark, and have collected the corresponding hardware performance counter measurements to aid the analysis. We are now benchmarking the processing and communication subsystems with a suite of numerical benchmarks, including the Level 2 and 3 BLAS, ScaLAPACK, and the NAS Parallel Benchmarks. We also maintain a web site at http://www.cs.berkeley.edu/~ivory/pcbench that makes benchmark data and analysis, high-resolution timers and benchmark software widely available.
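STREAM reports memory bandwidth by counting the bytes each kernel must move rather than measuring traffic directly. For 8-byte doubles, Copy and Scale count 16 bytes per element and Add and Triad count 24. The minimal Python sketch below illustrates that accounting for the Triad kernel, keeping the best of several timed trials since STREAM reports its best rate. It is only an illustration of the method and is not the modified benchmark used in this study. It uses pure-Python lists in place of C arrays, so its timings measure the interpreter rather than the memory subsystem. The function names and array size are hypothetical.

```python
import time

# STREAM byte accounting: arrays touched per element, 8 bytes per double.
WORD_BYTES = 8
WORDS_PER_ELEMENT = {"copy": 2, "scale": 2, "add": 3, "triad": 3}

def bandwidth_mb_s(kernel: str, n: int, seconds: float) -> float:
    """Bandwidth in MB/s (1 MB = 10**6 bytes) for n elements in `seconds`."""
    return WORDS_PER_ELEMENT[kernel] * WORD_BYTES * n / seconds / 1e6

def triad(a: list, b: list, c: list, q: float) -> None:
    """STREAM Triad kernel: a[i] = b[i] + q * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]

def best_triad_time(n: int, trials: int = 5) -> float:
    """Minimum wall-clock time of the Triad kernel over several trials."""
    a, b, c = [0.0] * n, [1.0] * n, [2.0] * n
    best = float("inf")
    for _ in range(trials):
        start = time.perf_counter()
        triad(a, b, c, 3.0)
        best = min(best, time.perf_counter() - start)
    return best

n = 200_000  # hypothetical size; real STREAM runs need arrays far larger than cache
t = best_triad_time(n)
print(f"Triad: {bandwidth_mb_s('triad', n, t):.1f} MB/s (interpreter-bound)")
```

A real measurement would use contiguous arrays in C or FORTRAN, sized well beyond the caches so that the kernel streams from main memory.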


February 1999