Concurrency
- Parallel: many computation units in one computer
- Concurrent: things happen at the same time
- Benefits of concurrency: speedup of computation (parallel operation sequences) and expressiveness (the concurrent nature of a problem)
- Concurrency levels (granularity): instruction level, statement level, process level (independent subroutines), application level
- Different levels appear differently in the language
- Virtual and actual concurrency: logical units (processes) vs. physical units (processors)
Process and thread
- Different languages use different terminology
- Process: an operating-system-provided tool for executing programs concurrently
  - Separate memory spaces
  - Communication via message passing, provided by the OS
- Thread ("lightweight process"): a sequence of statements that is executed independently of other threads
  - Each thread has its own stack and program counter
  - Threads inside a process share memory
  - Communication happens through shared memory; synchronization is a major challenge
- Communicating processes may be executed on different machines, even subroutine calls (RPC, Remote Procedure Call)
Synchronization
- Communication: message passing or shared memory
- Synchronization: implicit or via special operations
- Co-operative synchronization: process A waits until process B finishes before its execution may continue
  - e.g. the producer/consumer problem
- Competitive synchronization (condition synchronization): processes need the same resource, but only one process may have access
  - e.g. writes to shared memory; mutual exclusion
Coroutines
- The oldest concurrency structure in actual programming languages (Simula 67, Modula-2)
- Quasi-concurrency: one processor
- The processor switches from one process to the next explicitly
- Scheduling is left to the programmer
- Becoming popular again (Python generators etc.)
Coroutines in Modula-2:

MODULE Program;
FROM SYSTEM IMPORT PROCESS, NEWPROCESS, TRANSFER, ..;
VAR v1, v2, main: PROCESS;

PROCEDURE P1;
BEGIN
  ...
  TRANSFER ( v1, v2 );
  ...
  TRANSFER ( v1, v2 );
  ...
  TRANSFER ( v1, v2 );
END P1;

PROCEDURE P2;
BEGIN
  ...
  TRANSFER ( v2, v1 );
  ...
  TRANSFER ( v2, v1 );
  ...
  TRANSFER ( v2, main );
END P2;

BEGIN
  NEWPROCESS ( P1, ..., v1 );
  NEWPROCESS ( P2, ..., v2 );
  TRANSFER ( main, v1 );
END;
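The Modula-2 TRANSFER pattern above can be sketched with the Python generators the previous slide mentions: one processor, with control handed over explicitly at each yield, and the scheduling loop written by the programmer. The names (`coroutine`, `trace`) are illustrative, not from any library.

```python
# Quasi-concurrency with generators: each yield is an explicit
# transfer of control, like TRANSFER in the Modula-2 example.
trace = []

def coroutine(name, n):
    for i in range(n):
        trace.append(f"{name}{i}")
        yield                      # explicit switch point

# A tiny round-robin "scheduler" written by the programmer,
# as the coroutine slide says.
tasks = [coroutine("A", 3), coroutine("B", 3)]
while tasks:
    task = tasks.pop(0)
    try:
        next(task)                 # run the coroutine to its next yield
        tasks.append(task)         # re-queue it for another turn
    except StopIteration:
        pass                       # coroutine finished, drop it

print(trace)  # ['A0', 'B0', 'A1', 'B1', 'A2', 'B2']
```

The interleaving is fully deterministic, which is exactly what distinguishes coroutines from preemptively scheduled threads.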
Semaphores (Algol 68, PL/I)
- Synchronization based on scheduling
- A method of mutual exclusion
- Integers with operations P (wait) and V (signal):
  - P ( S ): if S > 0 then S := S - 1 else the process waits on S
  - V ( S ): if some process waits on S then let one continue else S := S + 1
- P and V are atomic
- General or binary semaphore
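The P/V operations map directly onto `threading.Semaphore` in Python's standard library (`acquire` is P, `release` is V). A minimal sketch of mutual exclusion with a binary semaphore; the names `mutex`, `counter` and `increment` are illustrative:

```python
import threading

# Binary semaphore guarding a shared counter: acquire() is P, release() is V.
mutex = threading.Semaphore(1)
counter = 0

def increment(times):
    global counter
    for _ in range(times):
        mutex.acquire()        # P(S): wait until S > 0, then decrement
        counter += 1           # critical section
        mutex.release()        # V(S): increment S, possibly waking a waiter

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000: no updates lost under mutual exclusion
```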
Monitors
- Synchronization based on scheduling
- More advanced guardians of shared data
- Make use of modular structure: encapsulation / information hiding, abstract data types
- Protection mechanism: operation exclusion
  - Only one process may execute the operations of a module at a time; that process holds the monitor's lock
- Monitor wait set: the set of processes waiting for access to the operations
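A monitor-style bounded buffer can be sketched with `threading.Condition`, which bundles exactly the two ingredients above: a lock enforcing operation exclusion and a wait set for blocked processes. The class and variable names are illustrative:

```python
import threading

class BoundedBuffer:
    """Monitor sketch: one lock guards all operations; the condition's
    wait set holds producers/consumers that cannot proceed."""
    def __init__(self, capacity):
        self.items = []
        self.capacity = capacity
        self.cond = threading.Condition()   # monitor lock + wait set

    def put(self, x):
        with self.cond:                     # take the monitor lock
            while len(self.items) >= self.capacity:
                self.cond.wait()            # release lock, join the wait set
            self.items.append(x)
            self.cond.notify_all()          # wake waiting consumers

    def get(self):
        with self.cond:
            while not self.items:
                self.cond.wait()
            x = self.items.pop(0)
            self.cond.notify_all()          # wake waiting producers
            return x

buf = BoundedBuffer(2)
results = []
consumer = threading.Thread(
    target=lambda: results.extend(buf.get() for _ in range(5)))
consumer.start()
for i in range(5):
    buf.put(i)                              # blocks whenever the buffer is full
consumer.join()
print(results)  # [0, 1, 2, 3, 4]
```

The `while` loops around `wait()` re-check the condition after waking, the standard discipline for monitors with signal-and-continue semantics.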
Java (threads)
- Enables concurrency/parallelism
- Inherit from the Thread class, or implement Runnable
- The main program always creates a thread
- Thread execution
  - The run operation defines the behaviour
  - Execution begins upon the start operation, when the system calls run
- Other Thread class operations
  - sleep: blocks the thread (milliseconds)
  - yield: the thread gives up its execution time
Java (synchronization)
- Every object has a lock
  - Prevents synchronized operations from being executed at the same time
- When a thread calls a synchronized operation of an object
  - The thread takes hold of the lock
  - Other threads cannot execute any synchronized operations of the object
  - The thread releases the lock when the operation finishes or the thread waits

synchronized void f ( ) { ... }

void f ( ) {
  synchronized ( this ) { ... }
}
Creation of threads
- One language may have several ways of starting threads:
  - co-begin
  - parallel loops
  - launch-at-elaboration
  - fork/join
  - implicit receipt
  - early reply
Cobegin
- Nondeterministic: begin a := 3, b := 4 end
- Parallel: par begin a := 3, b := 4 end

Algol 68 code (also possible in Occam):

par begin
  p ( a, b, c ),
  begin
    d := q ( e, f );
    r ( d, g, h )
  end,
  s ( i, j )
end

(Three branches run concurrently: p ( a, b, c ); the sequence d := q ( e, f ) followed by r ( d, g, h ); and s ( i, j ).)
Parallel loops

SR:
  co ( i := 5 to 10 ) -> p ( a, b, i ) oc

Occam:
  par i = 5 for 6
    p ( a, b, i )

Fortran 95:
  forall ( i = 1 : n - 1 )
    A ( i ) = B ( i ) + C ( i )
    A ( i + 1 ) = A ( i ) + A ( i + 1 )
  end forall

Launch at elaboration

Ada:
  procedure P is
    task T is ... end T;
  begin
    ...
  end P;
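The SR loop above (one task per index from 5 to 10) can be sketched in Python with an executor mapping the body over the index range. The function `p` is a stand-in for the loop body of the slide, not a real API:

```python
from concurrent.futures import ThreadPoolExecutor

def p(a, b, i):
    return a * i + b          # stand-in for the loop body p(a, b, i)

# Parallel loop in the spirit of "co ( i := 5 to 10 ) -> p ( a, b, i ) oc":
# each index becomes an independent task; leaving the with-block joins them.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda i: p(2, 1, i), range(5, 11)))
print(results)  # [11, 13, 15, 17, 19, 21]
```

Note that `map` preserves index order in the results even though the iterations may execute in any order, which is also the contract of the Fortran `forall`.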
Fork/join
- Unlike the previous structured alternatives, fork starts a new thread at an arbitrary point, and join waits for it to finish
Fork/join

Ada:
  task type T is ... end T;
  task body T is begin ... end T;
  pt: access T := new T;    -- fork: the new task starts at activation

Java:
  class MyThread extends Thread {
    public MyThread ( ... ) { ... }
    public void run ( ) { ... }
  }
  ...
  MyThread t = new MyThread ( ... );
  t.start ( );              // fork
  t.join ( );               // join

Modula-3:
  t := Fork ( c );
  ...
  Join ( t );
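The same fork/join pattern in Python, where `Thread.start` is the fork and `Thread.join` the join; `worker` and `results` are illustrative names:

```python
import threading

results = {}

def worker(name):
    results[name] = sum(range(1000))   # some independent work

t = threading.Thread(target=worker, args=("child",))
t.start()                              # fork: child runs concurrently
results["parent"] = sum(range(1000))   # parent keeps working meanwhile
t.join()                               # join: wait for the child to finish
print(results["child"] == results["parent"])  # True
```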
GPU computation
- GPUs have significant computation power
- GPU architecture is very different from CPU architecture: SIMD etc., a lot of vector computation
- Not all computers have GPUs, and programs should still function
- A common programming language for CPU and GPU is needed
- Example: Nvidia CUDA, based on C
CUDA
(code example slide)
SIMD computation
- Single Instruction, Multiple Data: parallel computations perform the same operations on different data items
- Typical in scientific computing with matrices, vectors etc.
- Apparent in the language: loop structures etc.
- Processor machine code may support vector computation
- (Related terms: SISD = Single Instruction, Single Data; MIMD = Multiple Instruction, Multiple Data; SPMD = Single Program, Multiple Data)
Active objects
- In object-oriented languages, the object is a natural unit of concurrency
- An active object has its own execution thread that executes any methods called on it
- The caller does not wait for completion; return values are delivered as futures
- No problems with mutual exclusion on member variables
- What if a method execution is left waiting?
- (A Symbian active object executes a run method; other calls behave as normal)
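A minimal active-object sketch: calls are queued and executed one at a time by the object's own thread, so member state needs no extra locking, and the caller gets a future instead of waiting. The class and method names (`ActiveObject`, `call`) are illustrative:

```python
import threading
import queue
from concurrent.futures import Future

class ActiveObject:
    """All calls run on this object's own thread, one at a time,
    so its state needs no member-variable mutex."""
    def __init__(self):
        self._tasks = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self):
        while True:
            fn, args, fut = self._tasks.get()
            fut.set_result(fn(*args))        # execute the "method" here

    def call(self, fn, *args):
        fut = Future()
        self._tasks.put((fn, args, fut))     # enqueue; no wait for finishing
        return fut                           # return value as a future

obj = ActiveObject()
fut = obj.call(lambda x: x * x, 7)           # returns immediately
print(fut.result())                          # 49; blocks only until ready
```

The slide's open question is visible here too: if a queued method blocks inside `_loop`, the whole active object stalls.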
Futures
- Upon a parallel call, how do we pass a value to a caller that does not wait?
- A future represents a value that can be received later
- If the value of the future is read/used before it is ready, the reader has to wait
- The future must report whether the result is available

future<int> tulos = async(f, x); // C++0x
// ...
if (tulos.get() == 3) ...        // waits if necessary
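The same idea in Python's standard library: `Executor.submit` returns a future immediately, and `result()` blocks only if the value is not yet available (the identifier `tulos`, "result", mirrors the C++ snippet):

```python
from concurrent.futures import ThreadPoolExecutor

def f(x):
    return x + 1

with ThreadPoolExecutor(max_workers=1) as pool:
    tulos = pool.submit(f, 2)    # like async(f, x): returns a future at once
    value = tulos.result()       # waits if necessary, then yields the value
print(value)  # 3
```

`Future.done()` plays the role of "the future must report if the result is available".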
Thread pool
- Sometimes concurrency is needed to execute a large number of small tasks
- Even launching a thread is too expensive (overhead)
- A reasonable amount of concurrency is the number of processors
- Tasks are added into a thread pool that executes them in parallel to a reasonable extent
- Task start may be conditional
- Tasks can be functions, lambda expressions etc.
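A pool sized to the processor count running many small lambda tasks, so no per-task thread launch is paid; a minimal sketch with illustrative names:

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Size the pool by the number of processors, as the slide suggests.
workers = os.cpu_count() or 2

# 100 small tasks share the fixed set of pool threads.
with ThreadPoolExecutor(max_workers=workers) as pool:
    squares = list(pool.map(lambda n: n * n, range(100)))
print(squares[:5])  # [0, 1, 4, 9, 16]
```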
Functional programming and concurrency
- Threads share memory, so reads/writes must be protected (cache synchronization)
- Functional languages with no side effects do not suffer from this!
- Laziness and concurrency have a lot in common
- Concurrent Haskell still has threads etc.
Transactional memory
- Used, e.g., in Concurrent Haskell
- A code block executes atomically (a transaction): all changes become visible in one go
- One implementation:
  - Every read/write is logged; the variables themselves remain intact
  - At the end of the block, check that the values are as they are supposed to be
  - If not, something has interfered -> discard the log and try again
Ada tasks (Ada 83)
- Like packages (syntax-wise): a definition and a body, but the definition has no private part
- An active unit (unlike a package)
- Task interaction: rendezvous using entries
  - An entry defines the services a task provides
  - Server/client roles
  - Synchronous message passing
  - Passive waiting
Entry and rendezvous
- An entry call (with in, out, in out parameters) from task T1 meets entry P in task T2
- A rendezvous is a statement even though it resembles a procedure call
- Asymmetry in the rendezvous:
  - The calling process must know the entry; the task with the entry does not know the caller
  - A task (entry) call creates a rendezvous
  - The caller waits for a specific task; the entry holder waits for any caller
- Normal parameter passing (in, out, in out)
Selective rendezvous

Entry holder
- A lack of clients at one entry must not prevent service at other entries
- Choose an entry with clients waiting: selective rendezvous (select statement)

select
  accept P1 ... end P1;
  ... stmts ...
or
  accept P2 ... end P2;
  ... stmts ...
or
  ...
end select;

Entry caller
- Waiting for a specific service may not be reasonable: if the entry holder is not ready, do something else

select
  P (...);
  ... stmts ...
or
  ... stmts ...
end select;
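Python has no rendezvous construct, but the synchronous client/server handshake of an Ada entry can be approximated with a request queue plus a reply future: the caller passes "in" parameters with the request, blocks on the reply, and receives the "out" value when the server's accept body finishes. All names here (`entry_P`, `server`, `call_entry`) are invented for the sketch:

```python
import threading
import queue
from concurrent.futures import Future

entry_P = queue.Queue()   # the entry: callers must know it, like in Ada

def server():
    """Entry holder: waits for any caller, serves each in turn."""
    for _ in range(3):
        arg, reply = entry_P.get()     # accept P: rendezvous begins
        reply.set_result(arg * 10)     # body of the accept; sets the out value

threading.Thread(target=server).start()

def call_entry(arg):
    """Entry call: the caller waits for this specific server task."""
    reply = Future()
    entry_P.put((arg, reply))          # in parameter travels with the call
    return reply.result()              # caller blocks until the accept ends

results = [call_entry(i) for i in range(3)]
print(results)  # [0, 10, 20]
```

The asymmetry from the rendezvous slide is visible: callers name the entry, while the server accepts from whichever caller arrives first.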