

CSC 557/449: Special Topic:

High Availability and Performance Computing Course


Prerequisites: CSC 437, 445, or instructor's permission

Times: Tues and Thurs 12:00pm-1:50pm

Place: CEnIT Board Room or Innovation Lab, Live view

Instructor: Dr. Box Leangsuksun

Office: room 237 Nethken Hall, 318-257-3291

Office Hours: M-W 2-3:50pm or by appointment


Guest Lecturers: (in conjunction with XCR High Performance and Availability Computing Colloquium series)


Guest Speaker: Dr. Hong Ong, Oak Ridge National Lab
Tentative Schedule: Dec 7, 2005 at Innovation Lab, 11am
Topic: Performance Measurement and Evaluation Tools for Large-scale Systems

Guest Speaker: Charles Grassl, IBM
Tentative Schedule: TBA
Topic: POWER5 Programming and Optimization

More external speakers will be announced soon.



The course will expose students to state-of-the-art research and development in High Availability and Performance Computing (HAPC) and related fields. The class is oriented toward reading, research, and hands-on work. Activities include studies of HAPC systems and techniques and selected research topics of current interest. Topics include, but are not limited to:

  • computer architectures, interconnectivity, and programming paradigms; design and analysis techniques for HAPC applications and systems,
  • cluster computing,
  • data and computational grid computing,
  • parallel programming in MPI and Fault Tolerant MPI,
  • reliability and performance modeling,
  • a complete life cycle (design, analysis, development, operation, maintenance) for HAP computing,
  • performance evaluation, reliability analysis,
  • parallel and network storage,
  • distributed security.


Class Materials:


2)    Parallel Programming with MPI by Peter Pacheco, Morgan Kaufmann; 1st edition (October 1996), ISBN: 1558603395 (optional).

Other class activities: research, experiments, and term projects. The activities will take place on an HA-OSCAR Linux cluster[]



Grading Policies:

    Since this class is research (reading) oriented, I think it is more appropriate to evaluate your learning and mastery of our class objectives in the following categories:

1) Hands-on term project (40%)

2) Paper (15%) (due right after the Christmas break)

3) Exams (25%) and Homework (15%)

4) Attendance (5%)

    Grading scheme:


91 and up A
81-90 B
71-80 C
below 70 F
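
To illustrate how the category weights above combine into a final score, here is a minimal Python sketch. The function names, the example scores, and the rounding to whole points are assumptions made for illustration only and are not part of the official policy. The published scale does not say how a score of exactly 70 is graded, so the sketch reports that case as unspecified rather than guessing.

```python
# Category weights in percent, taken from the grading policy above.
WEIGHTS = {
    "project": 40,
    "paper": 15,
    "exams": 25,
    "homework": 15,
    "attendance": 5,
}


def weighted_total(scores):
    """Combine per-category scores (each on a 0-100 scale) into one total."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(total):
    """Map a total to a letter using the published cutoffs.

    Assumption: the total is rounded to a whole number of points first.
    """
    points = round(total)
    if points >= 91:
        return "A"
    if points >= 81:
        return "B"
    if points >= 71:
        return "C"
    if points < 70:
        return "F"
    return "unspecified"  # exactly 70 is not covered by the published scale


# Hypothetical student scores, for illustration only.
example = {
    "project": 90,
    "paper": 80,
    "exams": 85,
    "homework": 100,
    "attendance": 100,
}
total = weighted_total(example)
print(total, letter_grade(total))
```

With these hypothetical scores the term project contributes 36 points, the paper 12, the exams 21.25, homework 15, and attendance 5.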



Nov 30, 2005  HAPC introduction
Dec 1, 2005  Progress in Supercomputing by Dr. Horst Simon - Video

homework 1 (due date Dec 5)

   Chapter 1 Intro to High Performance Cluster Computing
   Dr. Hong Ong's presentation
   Intro to Grid Computing (powerpoint by Prof. Ed Siedle from
   The Development of Computational Grid Techniques for the D0 Experiment
  Intro to Globus (by ANL, USC Information Sciences Institute,
  A discussion on the Condor-G architecture and its fault-tolerance aspects. Materials were excerpted from the Condor tutorial,
  Continued discussions on Grid computing and its 4 primary services; MPI programming
  Load Balancing over Network and Homework #1 (MPI program)
Job and Resource Management System

An HPC file system case study: Lustre (by Peter Braam, presented at IEEE Cluster 2003)

  Lustre discussion (continued) and A case study on a cluster management system: OSCAR and ROCKS
  A case study on a cluster management system: summary (continued).

Quantifying Non-functional Requirements (Availability and Performance)


Suggested term projects:

        HA-cluster with Windows

        Workload Characterization, Performance Modeling and Evaluation for HPC systems/applications

        Applying HPC/HA to solve a specific problem (e.g. sensor networks, bioscience, bioinformatics etc.)

        HA-OSCAR cluster with Windows

        HA and DR-enabled storage system

        Drug discovery cluster

        IPMI-based cluster management.

        HA-cluster and load balancer to support e-commerce/internet services

        HA-cluster and Fault tolerant HPC job schedulers

        Hot-swap Cluster OS

        HA-OSCAR and grid computing

        Performance benefits analysis from HA-OSCAR.

        Beneficial factors from Standards for HAPC environments

        FT LAM/MPI in HA-Cluster

        Performance Benchmark, micro-benchmark & macro-benchmark


[] powered by nine Intel dual Xeon servers and supported in part by an Intel HPC equipment loan