We need synchronization mechanisms when two or more threads try to access a shared resource simultaneously. By using synchronization mechanisms kernel allows only one thread to execute while the other threads wait.
Concurrency in Uniprocessor systems
You may be thinking that how multiple threads may execute concurrently when there is only one CPU. In a single cpu system, although multiple threads can not run simultaneously but a thread may be preempted by an interrupt Or it may be descheduled to run another thread on the cpu. The preemption can lead to same problems of concurrency like data corruption, deadlock etc. Masking an interrupt is a simple technique used on single cpu systems to solve concurrency related problems. If a global variable is shared by both an interrupt handler and a kernel thread, that interrupt should be masked by the thread while reading/modifying the shared variable.
Concurrency on SMP systems
On an SMP system, two or more processors are connected to a single shared main memory. Multiple threads can run concurrently on different processors and can access the data on main memory simultaneously. Here, masking the interrupts on a single cpu or on all cpus does not solve the problem as we still have threads on other cpus running. We need more complex mechanisms here which we will discuss later.
Problems due to concurrent access of shared resource
It is very important to understand the problems or the undesirable effects due to concurrent access of a shared resource by multiple threads. Lets try to understand this using a simple example.
Suppose there is a global variable called “state” (8 bits wide). Consider two threads where thread-1 wants to set bit number one whereas thread-2 wants to set bit number two in the shared variable “state”. If any bit is already set, it should remain set.
To set a bit, a processor needs to do following three operations
1) Read the current value from memory in its internal register.
2) Set the bit in register.
3) Store the modified value of register to memory.
In fact, these are the common steps to perform any read-modify-write operations. Let the initial value of global variable “state” be 0xF0 (11110000b). The final value after both threads finished execution should be 0xF3 (11110011b). However this is not the case. It may happen that the final value of “state” is either 0xF1 or 0xF2 or 0xF3. This may happen when both the threads execute simultaneously. See the following example.
1) First thread-1 loads the value (0xF0) of global variable “state” to its register.
2) Then thread-2 loads the value (0xF0) of global variable “state” to its regsiter. (As only one processor can access the memory at a time, lets assume that thread-1 loads the value first and then the thread-2)
3) Thread-1 sets bit number 1 in its register (0xF0 -> 0xF1). Thread-2 also sets bit number 2 in its register (0xF0 -> 0xF2) (Both the processors can do this at the same time as they are using their internal registers.)
4) thread-1 stores the updated value (0xF1) in its register to memory.
5) Then thread-2 stores the updated value (0xF2) in its register to memory
(Again Lets assume thread-1 stores first)
So, the final value is 0xF2 whereas it should have been 0xF3. We lost the bit number 1 set by thread-1.
If thread-2 would have stored its value first, the final value will be 0xF1. In that case, the bit set by thread-2 would have been lost. So, concurrency access may lead to data corruption.
How to solve the above problem
The above problem can be solved if all the three operations are performed in single step (ie all other cpus are prevented from accessing the shared variable while one of the cpu is read-modify-writing to it.)
For this we need support from the processor hardware. Most of the modern day processors provide instructions to perform above operations atomically.
Lets take an example of 80386 processor
80386 provides an instruction “btsl” to set a bit in memory. The format is as follows: btsl memAddr, bitNum // This will set bitNum at memAddr
But there is no guarantee that the above operation is atomic. To solve this, 80386 provides a mechanism where the memory bus may be locked until read-modify-write instruction has finished its execution. This is done by prefixing the instruction opcode by “lock”. When the control unit detects the “lock” prefix, it locks the memory bus until that instruction is finished. No other processor can access the memory bus until the locked instruction has finished execution.
– lock; btsl memAddr, bitNum // Atomically sets biNum at memAddr
– Similarly, “lock” prefix may be used to atomically increment a number.
– lock; incl memAddr // Atomically increments the number stored at memAddr