Fall 2004, a series of lectures
Scope
This lecture series is given at a time when High Performance Computing is moving to the foreground of people’s attention. At UF the particular event is the installation this summer of a large Dell cluster that is phase 1 of a Grid for Research at University of Florida (GRUF). The intended audience is interested faculty, researchers, post-doctoral associates, graduate students and undergraduates.
To accommodate the time constraints of such an audience as much as possible, the course is structured as a series of stand-alone lectures. Making each lecture independent of the others allows more people to attend the parts that benefit them most. Thus faculty can attend some or all lectures and learn the terminology, basic concepts and principles needed to plan the use of High Performance Computing in their research and teaching projects. The second component of the course is a set of homework assignments. These will be substantial, and all post-docs, graduate and undergraduate students are expected to work through them in detail. Participants can see the instructor individually or in small groups to ask questions, discuss problems, or explore certain topics as far as they wish.
Synopsis
This lecture series teaches the practical details needed to
- understand architecture and design of modern high performance computers, clusters and grids;
- manage computations on them;
- create reliable and maintainable software that effectively uses them, exploiting
- multiple processors on shared memory systems with OpenMP and POSIX Threads,
- multiple nodes in clusters and grids using the MPI message passing standard.
Syllabus
- Lecture 1 (Aug 30): Hardware Architecture of HPC systems
Audience: everyone
Hardware architecture of high performance computing:
- nodes: processors, cache, RAM, disks, RAID
- networks: Ethernet, Myrinet, Infiniband, FibreChannel, iSCSI, SAN
- Lecture 2 (Sep 14): Software Architecture of HPC systems
Audience: everyone
Software architecture of high performance computing:
- nodes: operating systems (Linux, AIX, Solaris, Windows), interpreters (Python, Perl, Java), compilers (C/C++/C#, Fortran, Ada), libraries (BLAS, LAPACK), OpenMP, POSIX Threads, interprocess communication (IPC)
- networks: sockets, remote procedure call (RPC), network file system, CORBA
- clusters: nodes, communication, storage, message passing interface (MPI), parallel file system, workload management systems
- grids: clusters, middleware (Globus)
User model:
- nodes: user IDs (authentication, authorization), interactive use, batch use (PBS, LoadLeveler)
- clusters: head node, worker nodes, interactive use, batch use, data access (input, scratch, output)
- grids: authentication, scheduling, data access
Work model:
Access the system, start a job, monitor job evolution, manage job data.
homework 1: Configure a cluster purchase for your chosen problem.
- Lecture 3 (Sep 21): Classification of HPC Work
Audience: everyone
Finding the best way to use the system for your HPC work:
- long running or large memory serial computation: need a single, powerful node
- parameter space parallelism: many independent serial jobs
- shared memory parallelism: multiple processors in one OS image access shared data
- loosely coupled distributed memory parallelism: multiple processors in different OS images work on distributed data and share data with low intensity communication
- intense communication distributed memory parallelism: multiple processors in different OS images work on distributed data and share data with frequent and high bandwidth communication
- massive parallelism: a very large number of processors cannot work on shared memory and must work on distributed data; even a little communication can cause problems because of the number of tasks involved
- need to read data: the problem of data mining
- need to write data: the problem of output for visualization
- need to write temporary data: the problem of large scratch space requirements, as in electronic structure calculations
homework 2: Classify a list of HPC jobs
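The distributed memory classes above all assume that the data is divided among the processes. As a minimal sketch in C of one common way to do this, the helper below computes a contiguous block decomposition. The name `block_range` and its parameters are illustrative assumptions and not part of the course material.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: compute the half-open range [*lo, *hi) of items
 * owned by process `rank` when `n` items are split into `nprocs`
 * contiguous blocks whose sizes differ by at most one. */
void block_range(size_t n, size_t nprocs, size_t rank,
                 size_t *lo, size_t *hi)
{
    assert(nprocs > 0 && rank < nprocs);

    size_t base  = n / nprocs;  /* every process gets at least this many */
    size_t extra = n % nprocs;  /* the first `extra` processes get one more */

    *lo = rank * base + (rank < extra ? rank : extra);
    *hi = *lo + base + (rank < extra ? 1 : 0);
}
```

With 10 items on 3 processes, the ranges are [0, 4), [4, 7) and [7, 10). For parameter space parallelism the same split can be used to assign independent serial jobs to workers.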
- Lecture 4 (Sep 28): Programming is software engineering
Audience: everyone
The life of a program (not a programmer):
- problem analysis and solution: the program implements an algorithm that provides a solution to some formulated problem
- program design and prototyping: every program has a prototype; if you write the complete program in one session, you have just created a completely satisfactory prototype
- writing and maintaining source code: use source code control software and editors to make this task systematic and consistent
- managing complexity: use software engineering techniques such as data hiding, modules, software components, object oriented design and programming to keep control over the complexity of the project.
- testing and validation: build your software from the start around verifiable tasks and tests that must be completed, and run the test suite continuously.
- Lecture 5 (Oct 5): Programmer tools
Audience: programmers
- source: where to store (size, permissions, backup), revision control (CVS software)
- compiling and linking: where to store (size, permissions, backup), version control (Makefile), compilers (GNU, vendor, flags, standard compliance, finding libraries)
- running and debugging: where to store test data (size, permissions, backup), automation for consistency (scripts), interactive debuggers
homework 3: Work through all the steps with a provided program.
- Lecture 6 (Oct 12): Algorithms
Audience: everyone
- definition: what is an algorithm
- types of algorithms: quality of an algorithm, an algorithm that is too clever, complexity theory
- implementation of algorithms:
- Lecture 7 (Oct 19): MPI programming part 1
Audience: programmers
Basic message passing:
- setup: MPI_Init, MPI_Finalize
- messages: MPI_Send, MPI_Recv
- synchronization: MPI_Barrier
homework 4: Debug and run a given example program that implements a matrix multiply with minimal MPI.
- Lecture 8 (Oct 26): MPI programming part 2
Audience: programmers
Advanced message passing:
- communicators: groups and collective operations
- asynchronous communication: MPI_Isend, MPI_Irecv, MPI_Wait
homework 5: Change the program of homework 4 to measure computation and communication times.
- Lecture 9 (Nov 2): OpenMP programming
Audience: programmers
Concepts and details for using OpenMP directives:
- directives:
- scope of variables:
- locks:
homework 6: Compile and run an OpenMP Fortran 90 program.
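To illustrate how a directive and the scope of variables fit together, here is a minimal sketch in C of a dot product parallelized with one OpenMP directive. The homework uses Fortran 90, where the same directive is written after a `!$omp` sentinel; the function name `dot` is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: dot product of two vectors of length n.
 * Scope of variables:
 *   x, y, n  shared by all threads (read only)
 *   i        private, because it is the parallel loop variable
 *   sum      reduction variable: each thread accumulates a private
 *            copy, and the copies are added together at the end */
double dot(const double *x, const double *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));

    double sum = 0.0;
    #pragma omp parallel for default(none) shared(x, y, n) reduction(+:sum)
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

With GCC the directive is enabled by the `-fopenmp` flag; without it the pragma is ignored and the loop runs serially with the same result, up to floating point rounding order.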
- Lecture 10 (Nov 9): Threads programming
Audience: programmers
Concepts and details for using POSIX threads:
- threads: creation and termination
- synchronization: by initialization, with locks and mutexes
homework 7: Compile and run a C program using POSIX threads.
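Thread creation, termination and mutex synchronization can be sketched in C as follows. This is a minimal illustration under assumed names (`run_counter_demo`, `worker`, `NTHREADS`, `INCREMENTS`), not the homework program: several threads increment one shared counter, and a mutex keeps the increments from interfering with each other.

```c
#include <assert.h>
#include <pthread.h>

#define NTHREADS   4
#define INCREMENTS 100000

static long counter = 0;
/* Synchronization by initialization: the mutex is ready before any
 * thread exists. */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: add INCREMENTS to the shared counter, one step at a time. */
static void *worker(void *arg)
{
    (void)arg;                                /* unused */
    for (int i = 0; i < INCREMENTS; i++) {
        int rc = pthread_mutex_lock(&counter_lock);
        assert(rc == 0);
        (void)rc;
        counter++;                            /* protected update */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;                              /* thread terminates here */
}

/* Create the threads, wait for all of them, and return the final
 * counter value, or -1 if a thread could not be created. */
long run_counter_demo(void)
{
    pthread_t threads[NTHREADS];
    int created = 0;

    counter = 0;
    for (; created < NTHREADS; created++)
        if (pthread_create(&threads[created], NULL, worker, NULL) != 0)
            break;
    for (int t = 0; t < created; t++)
        pthread_join(threads[t], NULL);       /* wait for termination */

    return created == NTHREADS ? counter : -1;
}
```

Without the lock and unlock calls the final count would usually fall short of `NTHREADS * INCREMENTS`, because concurrent increments can overwrite each other.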
- Lecture 11 (Nov 16): The GRUF Dell cluster
Audience: everyone
By this time the cluster should be partially available for restricted use by some users. The QTP clusters will also be discussed as examples.
- hardware: nodes, network, storage
- system software: operating system, compilers, libraries, grid middleware
- user software: applications and access methods
- policies and practices: rules for access and use
- Lecture 12 (Nov 23): Managing clusters and grids
Audience: system managers
- functionality: operating system tools, reliability, available software
- security: allow, monitor and control access
- performance: monitoring to find and eliminate bottlenecks
- Lecture 13 (Nov 30): Software engineering
Audience: programmers
Case study of a simple program for Monte Carlo calculations, comparing “writing a program” to “software engineering”.
- Lecture 14 (Dec 7): The future of HPC
Audience: everyone
Analysis of the evolution of computing and trends in technology:
- hardware: electronics, processors and quantum computing
- systems: clusters, grids, autonomous computing
- software: user interfaces and computing services, languages, tools
Reference material
General High Performance Computing: A good place to start.
Clusters and grids: Recently numerous books on grids and clusters have appeared. Any of these provides good background or reference material.
General programmer references: Basic reference material for software engineers.
Detailed programmer references: For serious work on complex software.
Object oriented design: Advanced references for object programming.
Software engineering: Advanced references for construction of complex software with teams of developers.