Parallel Processing


One of a series of tutorials about Scheme in general
and the Wraith Scheme interpreter in particular.

Copyright © 2011 Jay Reynolds Freeman, all rights reserved.
Personal Web Site: http://JayReynoldsFreeman.com
EMail: Jay_Reynolds_Freeman@mac.com.


Parallel processing is about cooperation. What a wonderful idea, to have two computers, or three computers, or even more, all working on a single problem! That way maybe we can get it solved faster. Or how about having two or more different computer programs working simultaneously on different aspects of a complex task -- the kinds of things human beings do when one is folding form letters and putting them in envelopes while another writes out the addresses and sticks on the stamps. A group of Wraith Scheme processes -- a MomCat and her kittens -- running in parallel can do some of those kinds of things, though I must confess in advance that I have never had much luck getting a cat to lick stamps. I hear that dogs do a much better job at it.

Cooperation is wonderful, but all too often, it is hard to do it well. Everyone's human experience surely includes many instances of cooperative ventures gone awry because of poor communication, people getting in each others' way, not enough of something to go around, or folks getting possessive about their equipment. All of these things have analogies -- I almost said parallels -- in parallel processing done with computers. The situation is worse, too, because whereas at least some people have a little common sense, computers have none at all, and sometimes I think the same goes for computer programmers -- except, of course, for you and me.

There is a good deal more to parallel processing than I can tell you in one tutorial, or even in several of them. If you want to get seriously into the heavy side of the subject, you will have to seek out books and other sources. Yet there are a lot of kinds of parallel processing that are not particularly heavy -- those you can do fairly straightforwardly. So this tutorial has several purposes: to explain what parallel processing is and how it works, to describe the kinds of parallel processing that Wraith Scheme does well -- and the kinds it does not -- to suggest some simple and useful things you can do with parallel Wraith Scheme, and to warn you about the classic pitfalls of cooperation gone wrong, such as race conditions and deadlock.


One thing about parallel processing on a small computer may well confuse you when you think of it, so I had better mention it now. If you are like me, you probably have several applications running on your Macintosh at the same time -- all in parallel. You may well have more applications running than your Macintosh has processor cores to run them. How can that happen? How is it possible for a computer to run more applications than it has processors?

The answer has been around for a long time, since at least the 1960s, but if you haven't heard of it, it may take a bit of explaining. What is going on is called multitasking or concurrent processing. The idea is that the different programs take turns at using the processing hardware. There are large numbers of turns per second, perhaps thousands or more. When each program gets its turn, it gets to do a little work, then it sort of goes to sleep until its next turn. The turns come often enough that you have the illusion that every program is right there, ready to respond, all the time. If you push a button in some program, that program will get its next turn before you know it, so there will be no perceptible delay between the time you press the button and the time something happens.

If your computer has more than one processor, then more than one program can get its turn at once. When two programs are really doing something at the same time -- each taking its turn on a different processor -- that is real parallel processing. When many programs are giving the illusion of doing something at the same time by taking alternate turns so fast that you don't notice, that is concurrent processing. That is, when it comes to parallel computing, "concurrent" is a word that means "faking it". Yet computers fake parallel processing very well by this means, so usually we use the term "parallel processing" for any combination of concurrent processing and real parallel processing.

One part of your computer software handles the "taking turns" business. It puts to sleep programs that do not presently have active turns, carefully preserving their state so that it will be the same the next time they wake up. It is the part of the operating system that is informally called the switcher. Most parallel programs rely on it to do its job. Wraith Scheme certainly does: Nothing in Wraith Scheme affects when or how often a particular Wraith Scheme process gets a turn to use the hardware. That is up to the switcher.

Sometimes it is kind of fun to look at multitasking from the point of view of the processor itself. What it is doing is kind of like what happens in some of those campy old plays, when one character is always climbing up the dumb-waiter and sliding down the laundry chute, while changing costume and putting on a wig, so as to give the illusion that there are really two characters doing two different things in two different places at the same time. Computer operating systems are really good with dumb-waiters and laundry chutes.


When it comes to what you can do with Wraith Scheme parallel processing, let's not forget something absurdly simple: One of the easiest ways to exploit Wraith Scheme's parallelism is simply to have two or more separate, independent Scheme programs running at the same time. That is not strictly parallel processing as I have defined it, because there is no intertask communication -- no cooperation -- but if that is what you need to do, Wraith Scheme can do it. What is more, not cooperating is a fairly promising mechanism for avoiding the pitfalls of inept cooperation.


I should specifically mention a kind of parallel processing that Wraith Scheme is not well suited to do; namely, the kind of processing that is variously called vector processing, array processing, data-parallel processing, SIMD processing ("SIMD" stands for "Single Instruction, Multiple Data"), or -- this next is a marketing term -- massively parallel processing. Whatever you call it, in that kind of processing the idea is that you have a vast number of the same kind of data elements, and you need to do the same thing to each one of them. In that case, if you also happen to have a vast number of computer processing units at hand, it might make good sense to parcel out the data elements so that each processing unit had just one, or maybe a small number of them. Then you could let all the processing units do the same calculation at the same time. It would be kind of like a military drill team, in which the sergeant says, "Okay, by the numbers! Everybody take the number in your shirt pocket, multiply it by three, add it to the number in your pants pocket, and pass the result to the soldier on your left."

That actually makes a lot of sense, and there have been vector processors and SIMD machines built and used for this sort of work for many decades -- I have helped to develop several of them during my professional career. That is also how computer graphics cards can display all the blood and gore of combat games in such overwhelming detail with such amazing speed.

It doesn't make sense for parallel Wraith Scheme, though, because Macintoshes are not vector processors. As I write these words, the most powerful Macintosh you can buy has twelve processor cores, and those cores are not arranged for easy synchronous operation -- in drill-team terms, the soldiers tend to get out of step. Furthermore, I have thus far managed not to succumb to the temptation to try to give Wraith Scheme access to all those parallel processor cores in your Macintosh's graphics hardware. That is probably just as well, because they would likely drag me away kicking and screaming if I tried: Remember, parallel processing is hard. Anyway, Wraith Scheme doesn't do vector processing.


What Wraith Scheme does do well is what is usually called task parallelism. The idea here is that the big problem, whatever it is, is composed of a number of small problems -- like the stuff-envelopes and lick-stamps situation. In that case, it might make sense to let each one of a small number of different computers run its own specialized program, to handle just one of the small problems. It is often easy to take a big problem apart into a small number of subproblems, so this kind of parallelism is reasonable for computers with a small number of processor cores.

In order to cooperate in task parallelism, the different programs handling the subtasks will have to communicate, at least implicitly. In the envelopes-and-stamps example, the person who is stuffing envelopes might speak to the person who is writing addresses and licking stamps, and say "Here's another envelope for you." Alternatively, if the workers were silent types, the envelope stuffer might just toss the stuffed envelopes into a big open box, so that the addresser-and-stamper could simply reach into the box for new envelopes to address and stamp. The addresser-and-stamper would get to take a short break any time the box happened to be empty.

Now you can see why it is important that parallel Wraith Scheme processes both share Scheme main memory and have an explicit way to send each other messages. Those features facilitate cooperation between task-parallel programs. Shared memory is where you set up the box for stuffed envelopes, and the messaging system provides a way for the workers to have a conversation. (Wraith Scheme also has an interrupt system whereby one worker can bang another on the head to get the other worker's attention, but head-banging is an advanced enhancement of Wraith Scheme, so I am not going to discuss it here. You can look it up in the Wraith Scheme Help File if you are interested.)


One of the simplest and most useful kinds of task-parallel cooperation that you can do with Wraith Scheme is debugging. If you don't know what one Wraith Scheme program is doing, perhaps you can examine some aspects of its state, in the shared Wraith Scheme main memory, from another Wraith Scheme process. For this method of debugging to work, the program being debugged must periodically write out enough of its state at top-level so that you can learn something by examining it -- you can only see top-level state from another Wraith Scheme process; there is no way to get at the local variables of procedure calls that are executing, or to see what instructions are being executed or are about to be executed. Many programs write out state of some sort periodically, or you can modify the program being debugged to do so.
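
For example, here is a minimal sketch of a long-running worker that publishes its progress in a top-level variable. The names "*items-processed*", "next-item" and "process-one-item" are hypothetical placeholders, not part of Wraith Scheme.

(define *items-processed* 0)   ; top level, hence visible to other processes

(define (worker-loop)
  (process-one-item (next-item))                    ; hypothetical real work
  (set! *items-processed* (+ *items-processed* 1))  ; publish progress
  (worker-loop))

While this loop runs in one Wraith Scheme process, you can type "*items-processed*" at another process's prompt and see how far the worker has gotten.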

This debugging technique complements the rather more common one of scattering "display" or "write" statements throughout your code, to print out a long log of what is going on. Logging output is useful when you have no idea when a problem occurs and have to keep track of things for a long time. Being able to inspect things at will is useful when there is some external symptom that a program is doing something weird, and when that condition lasts long enough for you to investigate.

You might also set up your program to change its behavior on the basis of some top-level setting, one that you can change from another Wraith Scheme process while the first program is running. Perhaps you can arrange that changing that setting makes your program pause for a while, or makes it start printing out a lot of information for you to look at.
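
As a minimal sketch -- again with hypothetical placeholder names -- a worker might check a top-level flag at the top of its loop:

(define *paused* #f)   ; another process can (set! *paused* #t) at any time

(define (worker-loop)
  (if *paused*
      (worker-loop)    ; spin here until someone sets *paused* back to #f
      (begin
        (process-one-item (next-item))   ; hypothetical real work, as before
        (worker-loop))))

Evaluating "(set! *paused* #t)" in another Wraith Scheme process makes the worker idle; "(set! *paused* #f)" lets it continue.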


There is a common style of writing programs in many computer languages that is easy to implement using parallel processing, in a way similar to the kind of debugging that I described. That style is what is called the model-view-controller design pattern. A program written this way has three large pieces: The view is the part that the user sees -- where the user tells the program what to do and sees the results. In Wraith Scheme itself, the view is the Wraith Scheme Window plus the menu bar at the top of the screen. The model is the part of the program that does the work: The term "model" does not always fit what the program is actually doing, but the idea is that many programs are doing something that models something in the real world outside the computer, such as managing a database or helping to run a business. The controller is a layer in between the view and the model that passes information back and forth, and perhaps changes the format of the information on the fly, into a form that the destination part is expecting.

Model-view-controller program development is well described in many places, both in books and on the Internet, so I won't go into it in more detail here. The point is that with communicating parallel processes it is natural and straightforward to separate some of these pieces and have them run in different instances of Wraith Scheme. These days, most views use graphical user interfaces (GUIs), and Wraith Scheme doesn't do graphics, but a Wraith Scheme program that was acting as "view" might print out a list of options and wait for user input. When the user finally did get around to typing something, the view program could act temporarily as controller, and pass the information to a different Wraith Scheme program that was the model. That way the model could run continuously, without having to stop and wait indefinitely for the user to type something. You might use a third Wraith Scheme process just for printing output.
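
Here is a minimal sketch of that arrangement, using a shared top-level variable rather than Wraith Scheme's messaging system; "handle-command" and "do-a-bit-of-model-work" are hypothetical placeholders for the model's real code.

(define *pending-command* #f)   ; a shared, top-level "mailbox"

;; View (acting as controller) process: prompt, read, and post a command.
(define (view-loop)
  (display "Command: ")
  (set! *pending-command* (read))
  (view-loop))

;; Model process: glance at the mailbox between bits of real work.
(define (model-loop)
  (if *pending-command*
      (let ((command *pending-command*))
        (set! *pending-command* #f)
        (handle-command command)))   ; hypothetical
  (do-a-bit-of-model-work)           ; hypothetical
  (model-loop))

Note that even this simple scheme has the kind of timing pitfalls we will meet later in this tutorial: a second command typed quickly enough could overwrite the first before the model noticed it.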

Incidentally, the reason why Wraith Scheme doesn't do graphics is that it would take an unbelievable amount of work to create and maintain a programming interface between Wraith Scheme and Apple's big library of special stuff for writing Macintosh applications. Even just reading the documentation for that library is pretty much an impossible task -- Apple publishes changes to the documentation for its library faster than I could read them, even if I read eight hours a day, five days a week. Hey, at least there is documentation. Yet if you would like to write Wraith Scheme programs with fancy graphical interfaces, there is a way. Look in the section of the Wraith Scheme Help File called "Foreign-Function Interface", and you will find a way for Wraith Scheme programs to interact with programs written in other languages, such as C++ or Objective C. You can easily call Apple's library routines from those other languages. I may have a better way for Wraith Scheme to cooperate with programs in other languages in some future release of Wraith Scheme.


The use of multiple Wraith Scheme processes in model-view-controller programming generalizes: A parallel Wraith Scheme program can have multiple windows for input and output, just as in the graphical user interfaces that are so widely used in computer applications today. You will probably find that possibility very liberating when you are designing large Scheme programs. Furthermore, if you look at the sections of the Wraith Scheme Help File called Sensory Input Devices and Sensory Output Devices, you will find that every Wraith Scheme process has a small handful of pushbuttons, sliders, toggle switches and level indicators, that your programs can use for any purpose you like.


To illustrate the kinds of problems you can have with more intricate parallel programs, let's go back to the example of human beings -- or perhaps cats -- stuffing and stamping envelopes. Let's make that problem a bit fancier by imagining that we have more than one person stuffing envelopes and sealing them, and more than one person writing addresses on the sealed envelopes and putting on the stamps.

Remember that these people are supposed to reflect the behavior of actual programs; that is, they are blind, stupid, and have no common sense whatsoever. You can imagine a variety of real-world problems that might occur in such a situation. For example, two people might get hold of the same envelope at once -- two of the addresser-stampers might happen to grab the same one -- in which case the envelope might get torn apart.

Or perhaps the box that holds the stuffed and sealed envelopes is out of convenient reach of everyone. Perhaps the addresser/stampers have to pick up the box and put it in their laps before trying to get at its contents. While that was happening, a stuffer/sealer might blindly toss a new sealed envelope at the place where the box was supposed to be, and have it fall on the floor where it would never get addressed and stamped.

The first two examples are instances of a race condition: A race condition is a situation in which the combined behavior of a group of parallel processes depends on the detailed timing of events, and there is no guarantee that the timing will always be the same. They are called "race" conditions because, after all, the outcome of a real race depends on the detailed timing of events: The winner is whoever gets to the finish line first. There is another common kind of problem, though. We will consider it in the next example.

Let's add to our task the notion of resources being required to perform a task. Suppose that two resources are required for addressing an envelope; namely, a lap desk (so that there will be a flat surface to write on) and a pen. Suppose furthermore that there is only one lap desk and one pen in the room: The addresser/stampers are supposed to take turns using the resources, and put them back where anyone can get them when they are done with them. The problem now is that perhaps one addresser/stamper has an envelope to address, and has grabbed the lap desk, but hasn't gotten the pen yet, while a different addresser/stamper has the pen but needs the lap desk. Without common sense -- and remember, these people are supposed to be just as idiotic as computer programs -- the two addresser/stampers might very well get obstinate and each wait forever for the resource the other has. In that case, no envelopes are ever going to get addressed again. The system is said to be in deadlock, or just deadlocked. A deadlock is a situation in which a group of parallel processes can no longer make progress because some processes are holding resources that other processes need, and there is a circle of dependencies such that none of the processes will ever give their resources up.

What's that business about "circle of dependencies"? It has to do with there being what is called a cycle in a directed graph, both of which terms are a bit beyond the intent of these tutorials, but here is a simple example: Process A needs something that process B has, and process B needs something that process C has, but process C needs something that process A has. Here we have a circle of three processes, each holding a resource and waiting impatiently for the process ahead of it to give up some other resource that it needs. If you draw that out on a piece of paper with arrows connecting the letters "A", "B" and "C", you may get an idea of what a cycle in a directed graph is all about.


Race conditions are sometimes pretty subtle, in the sense of being rare or hard to detect, or both. You could be processing envelopes for a long time before some addresser/stamper happened to pull the box away at the very instant when a stuffer/sealer threw a new envelope at it, and it might be a long time after that before someone noticed an unprocessed envelope lying on the floor. It might not be immediately obvious exactly what had gone wrong, and when.

Deadlocks may or may not be rare, but at least they are usually obvious when they do happen. The system stops working -- no more envelopes ready for the mail are being prepared. It may take some investigation to discover the details of the problem, but at least you know that something is wrong.


Now let's show how these problems can happen in computer programs instead of in rooms full of administrative assistants. Take that box of envelopes, for instance. It provides a way for two kinds of processes to pass data from one to another, and in a Scheme program we might use a list that was bound to a variable at top level for that purpose. If we carry over the language of the envelope-stuffing problem into Scheme procedures, then we might have a parallel system of programs with a top-level variable called "envelope-box", that started out empty.

(define envelope-box '())

When a stuff-and-seal process had a new envelope to put in the box, it might execute code like this.

(set! envelope-box (cons new-envelope envelope-box))

To get a new envelope to work with, an address-and-stamp process might do something like this.

(if (not (null? envelope-box))
    (let* ((current-box-contents envelope-box)
           (my-envelope (car current-box-contents))
           (new-box-contents (cdr current-box-contents)))
      (set! envelope-box new-box-contents)
      my-envelope))   ;; Returned from the procedure where this code occurs

Note that there are a lot of steps in both of these code fragments. Even as simple a statement as "(set! envelope-box (cons new-envelope envelope-box))" involves looking up many variable names, consing a new element onto the present value of "envelope-box", and changing the value bound to "envelope-box" so that it is the new list.

Now for a race condition. Suppose that an address-and-stamp process starts to get a new envelope, and makes it part way into the second code fragment. Perhaps it has just gotten past the part of the "let*" where it has bound the present value of "envelope-box" to local variable "current-box-contents". Suppose for purposes of discussion that "envelope-box" had a value of (envelope-3 envelope-2 envelope-1), so that the address-and-stamp process now has that same value for "current-box-contents".

Suppose that at that exact instant, a stuff-and-seal process runs its entire code fragment, and adds envelope-4 to "envelope-box". That is, momentarily, "envelope-box" has as its value (envelope-4 envelope-3 envelope-2 envelope-1).

Then the address-and-stamp process continues its code fragment. It has "current-box-contents" set to (envelope-3 envelope-2 envelope-1), so it peels "envelope-3" off the front of the list, as the next envelope to be addressed and stamped. A few lines later it set!s the value of "envelope-box" to (envelope-2 envelope-1). That instance of "set!" overwrites the new value that the stuff-and-seal process had just put there, and in consequence, "envelope-4" has disappeared from the list without ever getting processed! It has "fallen on the floor", all because of the conflicting simultaneous use of the box by two processes. We have a program bug caused by a race condition.
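
To recap the disaster as a timeline, with "A&S" for the address-and-stamp process and "S&S" for the stuff-and-seal process, the interleaved steps were:

A&S:  current-box-contents  <-  (envelope-3 envelope-2 envelope-1)
S&S:  envelope-box          <-  (envelope-4 envelope-3 envelope-2 envelope-1)
A&S:  my-envelope           <-  envelope-3
A&S:  envelope-box          <-  (envelope-2 envelope-1)   ; envelope-4 is lost!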

One way to avoid this kind of race condition is to have a lock to restrict access to the box to one process at a time. The idea here is as if the physical envelope box were in a cabinet that had a key lock for the door. It is a property of key locks that only one person can have a key in one at a time: If I have my key in the lock and you try to put yours in, it won't fit. So everyone who wants to use the box has to wait till they can succeed in putting their key into the lock. When they have done so they unlock the cabinet, do whatever they want to the box inside, then close and lock the cabinet, and finally remove the key. Getting access to the box is a critical section, in that -- as we have just seen -- it is critical to restrict access to only one user at a time.

Computer programs running in parallel may have to deal with critical sections as well. You can see that we might need to consider many kinds of access to the "envelope-box" list a critical section. Computer programs also use locks of various kinds, though most of them have no analogy to a physical key; instead, they are special computer operations whose nature is such that only one process can get at them at any one time. Wraith Scheme has as extensions several operations that Wraith Scheme programs can use to establish such locks, but it will take more than a simple tutorial to teach you how to use them correctly. You can find out how the operations work by looking them up in the Wraith Scheme dictionary; their names are c::block, c::release-block, e::block-any-binding, e::block-symbol-binding, and c::set-blocked-binding.
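
To give the flavor of what such code might look like, here is a sketch of the address-and-stamp fragment wrapped in a critical section. The procedures "acquire-envelope-box-lock!" and "release-envelope-box-lock!" are hypothetical stand-ins: you would build them out of the blocking operations just listed, after looking up their actual behavior in the Wraith Scheme Dictionary.

(define (take-envelope-from-box)
  (acquire-envelope-box-lock!)          ; wait here until we own the box
  (let ((my-envelope
         (if (null? envelope-box)
             #f                         ; box was empty; caller must check
             (let ((envelope (car envelope-box)))
               (set! envelope-box (cdr envelope-box))
               envelope))))
    (release-envelope-box-lock!)        ; let other processes at the box
    my-envelope))

Because no other process can touch "envelope-box" between the acquire and the release, the interleaving that lost "envelope-4" can no longer happen.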

Note in passing that although I have not said so, "envelope-box" is a resource, and any kind of locking mechanism for it establishes a way for one process to grab hold of it. Therefore, in principle, "envelope-box" might somehow end up as one of a group of grabbed resources that is causing a deadlock. Whether or not that can happen depends on precisely how we write the code for our parallel processing. There is a general principle here, though: Race conditions and deadlocks are often opposite sides of the same coin. We create locking systems to prevent certain kinds of race conditions when processes are competing to get resources, and once we have introduced lockable resources that only one process can use at a time, we have also introduced the possibility of deadlock.

In the example of deadlock involving envelope-processing by human beings, I did not explicitly mention that locking mechanisms and critical sections were involved in getting access to the lap desk and to the pen, but you can see that they are in fact necessary: They were set up because we didn't want two office workers getting into a tug of war over the pen or the lap desk, and maybe pulling it apart. Thus we concluded that both use of the pen and use of the lap desk were critical sections, and set up some kind of mechanism to make sure that only one worker at a time could have access to each of these resources.

Sometimes deadlocks can easily be avoided, if you think to do so. The simplest way is to write code in which no process ever needs more than one resource at a time. What makes this idea more than trivial is the possibility of bundling resources: In the pen and lap desk example, imagine that the pen were tied to the lap desk; we could then just tell the office workers that when they needed to address an envelope all they had to get hold of was just one thing -- the combination of pen and lap desk. A worker either has the whole thing or not; there is no longer any possibility of one person having the pen and another having the lap desk.

The second easy way to avoid deadlock is to ensure that all workers who need the same multiple resources ask for them in the same order. We could tell our envelope-addressers, "First, get the lap desk. Then, after you have the lap desk in hand, get the pen. Then write the address and put both resources back where anyone can get them (that is, release the locks)." Do you see why this works? Think about it.
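
In code, the rule might look like the sketch below. Here "acquire-lock!", "release-lock!" and "write-address-on" are hypothetical procedures; the point is only that every addresser/stamper asks for the desk strictly before the pen.

(define (address-one-envelope envelope)
  (acquire-lock! 'desk)          ; everyone asks for the desk first...
  (acquire-lock! 'pen)           ; ...and asks for the pen only afterward
  (write-address-on envelope)    ; the real work, done holding both locks
  (release-lock! 'pen)
  (release-lock! 'desk))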


This has been a long tutorial, and I suspect you can see that we have only just begun to discuss the complexities of parallel programming. What I suggest you take away from the discussion here is the following: Parallel processing is about cooperation, and cooperation is hard to do well. Wraith Scheme is suited to task parallelism -- separate processes communicating through shared memory and messages -- not to vector or SIMD processing. Even very simple uses of parallelism, such as independent programs, debugging from another process, and model-view-controller designs, can be quite useful. And whenever parallel processes share resources, you must be on guard against race conditions and deadlocks.

Good luck, and don't forget to reread this tutorial when you are ready to tackle some serious parallel coding.


-- Jay Reynolds Freeman (Jay_Reynolds_Freeman@mac.com)

