I really enjoy taking courses with useful, practical content. It doesn’t happen all the time, but I’ve had a good run of luck recently. Tuesday’s lecture in the ECEC622 Parallel Computer Architecture course I’m taking was about OpenMP, a very easy-to-use, openly specified API for C/C++ and Fortran.
OpenMP is what an API should be — both useful and very easy to incorporate into an application, even for programmers encountering it for the first time. Within a few minutes of learning the fundamentals, I was able to write a multithreaded version of the ubiquitous “Hello, World!” program.
Of course, whenever I want to really start to learn a language, architecture, or API, I use it to write a Mandelbrot Set program. Mandelbrot Set calculation lends itself extremely well to parallelization APIs like OpenMP: it’s what programmers refer to as “embarrassingly parallel,” because every point in the image can be computed independently of every other point.
Here is the Mandelbrot Set calculation code. The OpenMP additions are the #include <omp.h> line and the two #pragma omp lines. Omit them, and the code does exactly the same thing, but without the parallelization.
```c
#include <stdio.h>
#include <omp.h>

//Mandelbrot Set calculation routine, to test
//speedup obtained from using OpenMP
//M. Eric Carr
//mec(eighty-two) .at. drexel (dot...) edu

int main(void){
    const double rmin = -2.2;
    const double rmax = 1.4;
    const double imin = -1.8;
    const double imax = 1.8;
    const unsigned long long maxiter = 20000;
    const unsigned long long xres = 2000;
    const unsigned long long yres = 2000;

    double a, b, r, i, h; //Private variables for threads
    unsigned long long count = 0;
    unsigned long long x, y;
    unsigned long long iter;
    double dx, dy, area;

    dx = (rmax-rmin)/xres;
    dy = (imax-imin)/yres;

    //Split the rows among threads; each thread keeps a private partial
    //count, and the partial counts are summed when the loop finishes
    #pragma omp parallel for private(a,b,r,i,h,x,y,iter) reduction(+:count)
    for(y=0;y<yres;y++){
        b = imax-y*dy;
        for(x=0;x<xres;x++){
            r=0; i=0;
            a = rmin + x*dx;
            iter=0;
            while(iter<maxiter && r*r+i*i<=4.0){
                h=(r+i)*(r-i)+a;
                i=2*r*i+b;
                r=h;
                iter++;
            }
            if(iter>=maxiter){ count++; } //Never escaped: inside the set
        } //for x
    } //for y
    #pragma omp barrier //Redundant: "parallel for" already ends with an implicit barrier

    printf("dx is: %F\n",dx);
    printf("dy is: %F\n",dy);
    printf("Total count is: %llu\n",count);
    area=count*dx*dy; //Each counted point stands for a dx-by-dy cell
    printf("Total area is: %F\n",area);
    return(0);
} //main
```
Three extra lines of code are all it takes to share the workload among however many CPU cores your system has (in my case, eight virtual cores on a Core i7 CPU). Talk about a good return on your coding time!