Parallel Programming Short Course
Summer 2001, an informal lecture series
- instructor: Erik Deumens, deumens at qtp.ufl.edu
- schedule: Two lectures per week, Tuesday and Thursday, 5th period (11:45 am – 12:35 pm). Homework exercises are an essential part of the course.
- location: NPB 1213
- requirements: familiarity with at least one of the following programming languages: Fortran 77/90/95, C, or C++.
- notes: lecture notes provided in the form of PowerPoint notes pages.
Synopsis
This course teaches the practical programming details needed to write programs that exploit multiple processors within a compute node, using the OpenMP and POSIX Threads standards, and multiple compute nodes within a cluster, using the MPI standard.
Syllabus
- MPI basics: message passing; standard; 260 calls defined, only 10 calls really needed; simple example program
- Threads basics: threads and processes; POSIX standard; 60 calls, only 8 calls really needed; simple example program
- Parallel computing: fine grained, coarse grained; start from global view; choice of parallel objects; distribute actions and data; analysis of matrix multiply
- MPI programming 1: different forms of matrix multiply
- MPI programming 2: advanced MPI; communication worlds, groups, collective operations
- OpenMP programming 1: concepts; directives; scope of variables
- OpenMP programming 2: internals; scalable matrix multiply; locks
- Threads programming 1: matrix multiply; client/server threads
- Threads programming 2: advanced pthreads; synchronization by initialization, locks and mutexes
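As a flavor of the matrix multiply exercise that recurs in the syllabus, here is a minimal sketch in C of an OpenMP version. It is an illustration only, not taken from the lecture notes: the function name matmul and the row-major storage layout are assumptions made for this example. It touches on two of the listed OpenMP topics, directives and the scope of variables.

```c
/* Hypothetical helper, not from the course notes.
 * Computes C = A * B for n x n matrices stored in row-major order. */
void matmul(int n, const double *a, const double *b, double *c)
{
    int i;

    /* Each iteration of the outer loop writes only row i of C, so the
     * rows can be distributed over threads without any locks.
     * The loop index i is made private by the directive; j, k and sum
     * are declared inside the loop body and are therefore private too.
     * The pointers a, b, c and the size n are shared. */
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}
```

With a compiler that supports OpenMP (for example gcc with the -fopenmp option), the outer loop is shared among the available threads. Without that option the directive is ignored and the same code runs serially, which makes it easy to check the parallel result against the serial one.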
Advanced Programming
Spring 2000 PHY 6905 Section 3615
- instructor: Erik Deumens, deumens at qtp.ufl.edu
- schedule: Two lectures per week plus homework exercises in NPB
- Last taught Spring 2000 as PHY 6905 Section 3615
- requirements: familiarity with at least one of the following programming languages: Fortran 77/90/95, C, or C++.
- grading: students will be graded on the homework assignments which are in the form of programs to be written and submitted.
- notes: lecture notes provided in the form of PowerPoint notes pages.
- e-mail list: adv-prog at qtp.ufl.edu
Synopsis
This course teaches the high-level design principles and practical programming details needed to create high-performance, scalable parallel programs. The programs should run well on distributed memory machines such as the Cray T3E and IBM RS/6000 SP, on shared memory machines such as the Sun Enterprise, SGI Origin and IBM RS/6000 SMP series, and on clusters of computers connected by a high-performance switching network.
The concepts explained and used in the course are object-oriented design, software engineering, multi-threading and message passing. Most examples will use Fortran 95, with C and C++ used for comparison. The POSIX thread library and the Message Passing Interface standard will be used.
Syllabus
Each lecture of about 1 hour (45 minutes plus questions) treats one topic. Problems will be assigned for work to be completed individually. Each lecture can stand on its own, but some topics provide background essential for understanding other topics.
- Modern Processors: computer architectures; CISC, RISC, EPIC; vector, superscalar; memory, RAM, cache; virtual memory, disk; switches, busses; networks
- Modern Programming: Objects. Object-oriented analysis (OOA); object-oriented design (OOD); object-oriented programming (OOP); characteristics: encapsulation, information hiding, message passing, late binding, delegation, class/instance/object, generalization/realization without polymorphism and with polymorphism, relationships; Unified Modeling Language (UML); object-based language (OBL): Fortran 95; object-oriented language (OOL): C++
- Fortran 95 features: module; type; interface; pointers; array operations
- Professional tools: editor: emacs; builder: make; version control: cvs; debugger; performance monitor
- Parallel computers: SMP, MPP; SIMD, MIMD; data parallel; clusters and switched networks; transputers; NUMA, COMA, messages; threads
- RS/6000 SP and O2000: MPP + SMP + switch; hardware; software; user environment; example program
- MPI basics: message passing; standard; 260 calls defined, only 10 calls really needed; simple example program
- Threads basics: threads and processes; POSIX standard; 60 calls, only 8 calls really needed; simple example program
- Parallel computing: fine grained, coarse grained; start from global view; choice of parallel objects; distribute actions and data; analysis of matrix multiply
- Debugging: interactive debugging; dbx, xldb, idebug, TotalView, pedb
- MPI programming 1: different forms of matrix multiply
- MPI programming 2: advanced MPI; communication worlds, groups, collective operations
- OpenMP programming 1: concepts; directives; scope of variables
- OpenMP programming 2: internals; scalable matrix multiply; locks
- Threads programming 1: matrix multiply; client/server threads
- Threads programming 2: advanced pthreads; synchronization by initialization, locks and mutexes
- Monitoring and tuning: prof, gprof, xprofiler; vt; program marker array
- Production Runs: resource sharing, scheduling; LoadLeveler; job command keywords; job classes, priorities; nodes and CPUs; time, data and stack limits; example job
- Languages and libraries: overview of languages and libraries supporting parallel programming
- Case Study: Crystal 98. A case study of parallel programming with MPI in a production program used in Computational Chemistry.
- Case Study: Design with F95. A case study of designing a library with the new features of Fortran 95, based on QTIP, a library for computing integrals for Computational Quantum Chemistry.
- Script interface. An introduction to extending a scripting language with your software and embedding the script interpreter into your software. Both Tcl and Python are discussed.
Textbook
No single book covers the material; however, the following works together are a good start:
- High Performance Computing (2nd Edition), Kevin Dowd and Charles Severance, O’Reilly and Associates, 1998.
- Using MPI: Portable parallel programming with the message-passing interface, William Gropp, Ewing Lusk, Anthony Skjellum, MIT Press, 1997.
- Parallel Programming in OpenMP, Rohit Chandra, Leonardo Dagum, Dave Kohr, Dror Maydan, Jeff McDonald, Ramesh Menon, Academic Press, 2001.
Other important references are:
- Efficient C++: Performance Programming Techniques, Dov Bulka, David Mayhew, Addison-Wesley, 2000.
- UML and C++: A practical guide to object-oriented development, Richard C. Lee and William M. Tepfenhart, Prentice Hall, 2001.
- MPI – The complete reference, Volume 1, The MPI core (2nd edition), Marc Snir, Steve Otto, Steven Huss-Lederman, David Walker, Jack Dongarra, MIT Press, 1998.
- MPI – The complete reference, Volume 2, The MPI-2 extensions, William Gropp, Steven Huss-Lederman, Andrew Lumsdaine, Ewing Lusk, Bill Nitzberg, William Saphir, Marc Snir, MIT Press, 1998.
- Multithreaded Programming with Pthreads, Bill Lewis and Daniel J. Berg, Sun Microsystems Press (Prentice Hall), 1998.
- Pthreads Programming, Bradford Nichols, Dick Buttlar and Jacqueline Proulx Farrell, O’Reilly and Associates, 1996.
Useful references are:
- Computing for Scientists: Principles of Programming with Fortran 90 and C++, R. J. Barlow and A. R. Barnett, John Wiley and Sons, 1998.
- Fortran 95 Handbook: complete ISO/ANSI reference, J. C. Adams, W. S. Brainerd, J. T. Martin, B. T. Smith, J. L. Wagener, MIT Press, 1997.
- The C++ Programming Language (3rd edition), Bjarne Stroustrup, Addison-Wesley, 1997.
- Modern C++ Design: Generic Programming and Design Patterns Applied, Andrei Alexandrescu, Addison-Wesley, 2001.
- Using the STL: The C++ Standard Template Library, Robert Robson, Springer, 1997.
- Parallel Programming using C++, Gregory V. Wilson and Paul Lu (editors), MIT Press, 1996.
- Multithreaded Programming with Windows NT, Thuan Q. Pham and Pankaj K. Garg, Prentice Hall, 1995.
- Practical Parallel Programming, Gregory V. Wilson, MIT Press, 1995.
- Designing and building parallel programs: Concepts and tools for parallel software engineering, Ian Foster, Addison-Wesley, 1995.
Format
The course can be taught as a regular class or as a condensed seminar. Either way it consists of a sequence of 19 lectures (numbers as above) and 11 practice or homework exercise sessions, for a total of 30 hours:
- One semester of 15 weeks at 2 hours per week, or
- One week of 5 days at 6 hours per day.