One line review: This is one of the best books on concurrency in Java and concurrency in general and a must-read.

I’ve delayed this review for a veerrryy long time, but this book is such a great resource for concurrency and deserves to be discovered more widely. Technical books are often labelled as difficult to read, but I believe that is usually just bad writing. This book explains concurrency concepts extremely well and does a great job of establishing continuity from one chapter to the next. It defines thread safety, safe initialization and safe publication better than anything I could find on the internet. I won’t try to explain everything the book covers in detail, but I’ll give an overview of thread safety and memory models (inspired by the book), Java’s approach to them and where concurrency in Java is headed.

Thread safety

Correctness

What does it mean for a program to be unsafe? We could say that a program is unsafe when it does not behave “correctly”. Any reasonable discussion of thread safety must start with describing correctness.

Consider a class called SequenceGenerator that generates a unique sequence number. It has a variable called value that stores the next number in the sequence. The class would look as follows:

public class SequenceGenerator {
  private int value = 0;

  public int getNext() {
    return value++;
  }
}

We have an idea about how this class should behave. It should generate numbers starting from 0, and no two generated numbers should be equal.

Each instance of a class in Java contains some state and methods that modify that state. In the above class the state comprises the variable value, and the getNext method modifies it.

We can specify the behaviour of any class using invariants constraining the object’s state and postconditions describing the effect of its operations. Correctness now means that a class conforms to its specification.

Consider the above class. If the state of value is x and the getNext method is invoked, then the state of value after getNext completes must be x+1. Each invocation of getNext should increment the value by 1. These invariants and postconditions might not hold when multiple threads are executing that method.

Things are not what they seem

The increment operation is not a single instruction but a combination of three: reading the value from memory, incrementing it, and writing it back. So when we execute value++, the CPU does something like the following:

mov rax, [addr of value]   ; read value from memory into a register
add rax, 1                 ; increment the register
mov [addr of value], rax   ; write the register back to memory

Operations from one thread can be interleaved with operations from another thread (by the scheduler, or by running on different processors), and each individual thread’s semantics are still preserved. Consider the following sequence of operations when two threads T1 and T2 are trying to increment value, and let its current state be 3.

T1 reads value from memory to register (reads 3)
T2 reads value from memory to register (reads 3)
T1 increments value (4 in register)
T2 increments value (4 in register)
T1 writes value from register to memory (writes 4)
T2 writes value from register to memory (writes 4)

Before executing getNext, both T1 and T2 read value as 3. getNext returns 3 to both invocations (not a unique number), and the state of value after two invocations of getNext is 4 instead of 5. This clearly violates our specification of the SequenceGenerator class.
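We can actually observe these lost updates. Below is a small sketch (the class name RaceDemo and the thread/increment counts are my own choices, not from the book) in which several threads hammer an unsynchronized counter:

```java
import java.util.concurrent.CountDownLatch;

public class RaceDemo {
  static int value = 0;

  public static void main(String[] args) throws InterruptedException {
    final int THREADS = 8, INCREMENTS = 100_000;
    CountDownLatch start = new CountDownLatch(1);
    Thread[] workers = new Thread[THREADS];
    for (int i = 0; i < THREADS; i++) {
      workers[i] = new Thread(() -> {
        try { start.await(); } catch (InterruptedException e) { return; }
        for (int j = 0; j < INCREMENTS; j++) value++;  // unsynchronized read-modify-write
      });
      workers[i].start();
    }
    start.countDown();                // release all threads at once
    for (Thread t : workers) t.join();
    System.out.println("expected " + (THREADS * INCREMENTS) + ", got " + value);
  }
}
```

The result is non-deterministic: the total can never exceed THREADS * INCREMENTS, but lost updates typically make it smaller.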

It is the job of the developer to ensure that such interleavings do not break the class specification. Suppose we allow exactly one thread to execute getNext at a time. Since all the instructions now execute serially, the read-modify-write sequence runs as one indivisible operation. This can be achieved with the synchronized keyword on the method, which allows only one thread at a time to execute that method on a given object. With synchronized, the program and the sequence of execution look as follows

public class SequenceGenerator {
  private int value = 0;

  public synchronized int getNext() {
    return value++;
  }
}

T1 reads value from memory to register (reads 3)
T1 increments value (4 in register)
T1 writes value from register to memory (writes 4)
T2 reads value from memory to register (reads 4)
T2 increments value (5 in register)
T2 writes value from register to memory (writes 5)

We cannot make guarantees about whether T1 or T2 executes first but the read-modify-write operations are executed as one operation. Each invocation of getNext generates a unique sequence number.

Note that the interleaving sequence of operations is still possible on a single processor system without synchronization due to thread scheduling. If T1 is switched out after it increments the value in the register but does not write it back, and T2 is then scheduled to run, we will observe the same behaviour. Synchronizing access to the variable is necessary for correctness.

The synchronized keyword ensures that the code in the method is mutually exclusive. If T1 is executing the method, it holds the lock on the object and T2 cannot start until T1 releases the lock, thus ensuring that the increment happens atomically.
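As an aside, the same specification can be met without a lock by using an atomic variable (a sketch using java.util.concurrent.atomic, which the book covers in a later chapter; the class name AtomicSequenceGenerator is mine):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicSequenceGenerator {
  // getAndIncrement performs the whole read-modify-write atomically
  // (via a hardware compare-and-swap), so no lock is needed
  private final AtomicInteger value = new AtomicInteger(0);

  public int getNext() {
    return value.getAndIncrement();
  }
}
```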

Java Memory Model and visibility

The primary objective of using threads is to improve resource utilization. Each thread can run independently on its own processor, and each processor has its own registers and cache. The compiler and hardware perform many optimizations, especially instruction reordering, to make programs run faster while still preserving the “semantics” of the program. For example, consider the following code

void canReorder() {
  int x, y;
  x = 1;
  y = 0;
}

The compiler is free to reorder the y = 0 statement before the x = 1 statement as this does not change the behaviour of the method.

Now for the central question: when one thread changes the value of a variable (say, x to 1), when does another thread see the new value?

This might seem like a weird question at first. Ideally, we want any other thread to see a new value for the variable as soon as it has been written to. But this requires the registers, cache and the main memory to be in sync at all times. This defeats the purpose of caching.

This can have interesting effects on program behaviour. Consider the following program:

class DataRace {

  boolean ready = false;
  int answer = 0;

  void method1() {
    while(!ready);
    assert answer == 42;
  }

  void method2() {
    answer = 42;
    ready = true;
  }
}

Suppose there are two threads T1 and T2 which are executing method1 and method2 respectively. We want the behaviour to be as follows: T1 should loop as long as ready is false. Once the value of ready has been set to true by T2, T1 should stop spinning and will execute the assert statement.

Interestingly, T1 can loop forever even after T2 has executed method2.

The compiler sees that ready is never written to in the body of method1, so it can hoist the read out of the loop, effectively turning it into while(true) and making T1 loop forever.

Consider another scenario, suppose T1 executes first and the processor stores the value of ready in its register. Now T2 starts executing on a different processor and changes the state of ready. The processor executing T1 might always read the value of ready from its register and never see the new value, thus making T1 loop forever.

Why does this happen and how do we solve it?

Caching and instruction reordering by the compiler and hardware are responsible for many of the performance improvements of programs. We would not want to take away these optimizations, but we also want to give the programmer power to prevent such inconsistencies.

Different hardware architectures have different memory models. A memory model tells programmers what guarantees they can expect from the memory system. Some architectures allow certain kinds of instruction reorderings which others do not. But each architecture provides special instructions (called memory barriers) to get memory coordination guarantees. A great feature of the Java language is that we do not have to explicitly insert memory barriers for synchronization. Yay!

The Java Language Specification requires the JVM to maintain within-thread as-if-serial semantics. As long as the effect of execution of the program is the same as in a strictly sequential environment, all the above optimizations are permissible. So how do we ensure that actions of thread A are visible to thread B? On a DataRace object, we would want T1 to see the value of ready when T2 executes method2.

The Java Memory Model is defined in terms of actions. In order for thread A to see the actions of thread B, there must be a happens-before relationship between A and B. In the absence of this relationship, the JVM is free to reorder instructions as it pleases.

There are many rules but I’m going to state some important ones, some of which may seem “obvious”.

  • Program order rule: Each action in a thread happens-before every action in that thread that comes later in the program order.
  • Thread start rule: A call to Thread.start happens-before every action in the started thread.
  • Thread termination rule: Any action in a thread happens-before any other thread has detected that the thread has terminated.
  • Volatile variable rule: A write to a volatile field happens-before every subsequent read of that same field.

The keyword volatile on a variable declaration thus has special meaning. Due to the happens-before relationship, once a volatile variable has been written to, all subsequent reads of that variable will see the effect of the write. Better still, happens-before is transitive: everything the writing thread did before the volatile write (such as setting answer to 42) is also visible to a thread after it reads the volatile variable. This allows us to propagate the effects of writes between threads. Declaring ready to be volatile ensures that T1 does not loop forever.

class DataRace {

  volatile boolean ready = false;
  int answer = 0;

  void method1() {
    while(!ready);
    assert answer == 42;
  }

  void method2() {
    answer = 42;
    ready = true;
  }
}

Once T2 writes to ready, a subsequent read by T1 must see the effect of the write no matter what the compiler and hardware do, and by transitivity T1 must also see answer as 42, so the assertion cannot fail.
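Putting it together, here is a minimal runnable version of the fixed class (the surrounding harness and the class name DataRaceDemo are my own sketch), which now behaves deterministically:

```java
public class DataRaceDemo {
  static volatile boolean ready = false;
  static int answer = 0;

  public static void main(String[] args) throws InterruptedException {
    Thread t1 = new Thread(() -> {
      while (!ready) { }                 // spin until the volatile write becomes visible
      // happens-before: the write to answer precedes the volatile write to ready,
      // so after reading ready == true we are guaranteed to see answer == 42
      System.out.println("answer = " + answer);
    });
    t1.start();
    answer = 42;
    ready = true;                        // volatile write: publishes answer as well
    t1.join();
  }
}
```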

Thus, access to any mutable data that is shared between multiple threads must be synchronized.

Where to go after reading this book

This book does a tremendous job on the topics it covers from interruption, thread pools, safe publication of objects, avoiding deadlocks, thinking about performance of concurrent programs, to more advanced topics like building custom synchronizers and non-blocking synchronization.

I think this book should be everyone’s introduction to concurrency in Java. But the latest edition of this book dates back to 2006 and Java has introduced a ton of new features since then. I would suggest the following topics after reading this book.

Fork/Join framework

According to the Oracle docs, “the fork/join framework was primarily designed for work that can be broken down into smaller pieces recursively”. If a piece of work is deemed small enough, we compute it in the current thread. Otherwise the work is split and executed in parallel by the pool’s worker threads, and finally we join the results. A typical use of the fork/join framework follows this pattern

if (my portion of the work is small enough)
  do the work directly
else
  split my work into two pieces
  invoke the two pieces and wait for the results

This is exactly the “divide and conquer” approach, which is what fork/join was primarily designed for.
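A minimal sketch of that pattern, summing an array with RecursiveTask (the class name SumTask, the threshold and the array size are my own choices):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SumTask extends RecursiveTask<Long> {
  private static final int THRESHOLD = 1_000;   // "small enough" cutoff (a tuning assumption)
  private final long[] data;
  private final int lo, hi;

  SumTask(long[] data, int lo, int hi) {
    this.data = data; this.lo = lo; this.hi = hi;
  }

  @Override
  protected Long compute() {
    if (hi - lo <= THRESHOLD) {                 // small enough: do the work directly
      long sum = 0;
      for (int i = lo; i < hi; i++) sum += data[i];
      return sum;
    }
    int mid = (lo + hi) >>> 1;                  // split the work into two pieces
    SumTask left = new SumTask(data, lo, mid);
    SumTask right = new SumTask(data, mid, hi);
    left.fork();                                // schedule the left half on the pool
    return right.compute() + left.join();       // compute the right half here, then join
  }

  public static void main(String[] args) {
    long[] data = new long[100_000];
    for (int i = 0; i < data.length; i++) data[i] = i;
    long sum = ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    System.out.println(sum);
  }
}
```

fork() hands the left half to the pool’s work-stealing scheduler while the current thread computes the right half, and join() waits for the forked result.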

Micro-benchmarking and JMH

I was intrigued about the performance of blocking (use locks) and non-blocking (no locks) data structures and wanted to test them myself. I wanted to compare the performance of a queue guarded by a single lock vs a ConcurrentLinkedQueue. I wrote a simple program where multiple threads write to the queue numerous times (x threads each writing y times to the first queue and x new threads writing y times to the second queue). I used the time of execution as a metric for comparing the performance.

I noticed that the time of execution was lower for whichever queue appeared later in the program. It turns out that there are a ton of reasons why testing the performance of those queues in the above way was a bad idea.

  • The JVM optimizes code as it runs (I know, mind-blowing). A single execution of the program doesn’t capture true performance.
  • A timed measurement of a block of code does not isolate the code under test. We cannot know whether the time was spent in JVM housekeeping, thread scheduling, garbage collection i.e. we measure more than just queue performance.
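For concreteness, the naive harness looked roughly like this (a sketch of the flawed approach, with LinkedBlockingQueue standing in for the lock-based queue; the timings it prints are exactly the kind of measurement not to trust):

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class NaiveQueueBenchmark {

  // x threads each writing y times to the queue, timed with wall-clock time
  static long timeOffers(Queue<Integer> queue, int threads, int itemsPerThread)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    long start = System.nanoTime();
    for (int t = 0; t < threads; t++) {
      pool.execute(() -> {
        for (int i = 0; i < itemsPerThread; i++) queue.offer(i);
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.MINUTES);
    return System.nanoTime() - start;
  }

  public static void main(String[] args) throws InterruptedException {
    int threads = 4, items = 100_000;
    Queue<Integer> locking = new LinkedBlockingQueue<>();
    Queue<Integer> lockFree = new ConcurrentLinkedQueue<>();
    // Whichever queue runs later tends to look faster: the JIT has warmed up by then
    System.out.println("locking:   " + timeOffers(locking, threads, items) / 1_000_000 + " ms");
    System.out.println("lock-free: " + timeOffers(lockFree, threads, items) / 1_000_000 + " ms");
    System.out.println(locking.size() + " " + lockFree.size());
  }
}
```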

Micro-benchmarking frameworks like JMH can be used to eliminate such effects. However, be aware of the pitfalls of micro-benchmarking. This has been discussed a million times, but the takeaway is that micro-benchmarks in most cases don’t reflect how real systems run.

Virtual threads

Virtual threads promise to “reduce the effort of writing, maintaining and debugging high-throughput concurrent applications”. Virtual threads are primarily helpful when there are a large number of concurrent tasks and each of them spends most of its time waiting for some action.

A platform thread is a simple wrapper around an OS thread. There are a limited number of platform threads that we can create. Suspending a platform thread also suspends the OS thread and the OS has to make scheduling decisions and execute another platform thread.

A virtual thread isn’t tied to an OS thread. Any virtual thread can run on any available OS thread. Suspending a virtual thread does not suspend the OS thread and the OS is free to run another virtual thread.
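A minimal sketch (assuming Java 21 or later): one virtual thread per task, with thousands of tasks that spend all their time sleeping. The same workload on a fixed platform-thread pool would take far longer, because sleeping would tie up the scarce OS threads.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadsDemo {
  public static void main(String[] args) {
    // One virtual thread per task: 10_000 tasks that mostly wait
    try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
      for (int i = 0; i < 10_000; i++) {
        executor.submit(() -> {
          Thread.sleep(Duration.ofMillis(100)); // parks the virtual thread,
          return 0;                             // freeing its carrier OS thread
        });
      }
    } // close() waits for all submitted tasks to finish
    System.out.println("done");
  }
}
```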

Most importantly, virtual threads don’t make the program run any faster. They exist to improve throughput but not latency.

Resources

Jenkov - Java Concurrency - Can serve as a quick introduction to Java concurrency

Myths Programmers Believe about CPU Caches - Tackles a common misconception that volatile variables are read from/written to main memory each time which would be tremendously slow

JSR 133 (Java Memory Model) FAQ - Answers common questions about the JMM and is co-written by the book’s author, Brian Goetz

The Java Memory Model explained by Rafael Winterhalter - Great introduction to the JMM and an alternative to people who prefer video explanations

JMH repository

Avoid Benchmarking Pitfalls on the JVM

Virtual Threads, Structured Concurrency and others