Monday, July 20, 2015

Threading Race Deque

The following code has been causing problems at various times and has sent me debugging in a lot of unrelated areas.

private final NavigableSet<InMemoryScenarioRecord> scenarios = new ConcurrentSkipListSet<>();

public void fireCompletedScenarios() {
    while (!scenarios.isEmpty()) {
        InMemoryScenarioRecord firstScenario = scenarios.first();
        if (firstScenario.completed()) {
            synchronized (completedScenarios) {
                completedScenarios.offer(scenarios.pollFirst());
            }
            continue;
        }
        // exit when you find the first uncompleted scenario
        break;
    }
}

There is a navigable set of scenarios, ordered by scenario time. We pull completed scenarios out of it and add them to another queue. We break when we find the first incomplete scenario.

The issue I have been facing is incomplete scenarios getting into the queue of completed scenarios. As usual, it does not happen in my local environment; it happens only in non-prod environments under high load. Hence, it had to be some race condition causing the issue.

Finally it struck me that there is a bug in the above code that manifests under high load. The bug is as follows:

scenarios.first() only gives me access to the first scenario; it does not remove it from the navigable set. After the completion check, pollFirst() is called to remove it from the set.

This is the bug: under high load, the element returned by first() and the element removed by pollFirst() are not necessarily the same!

Between the two calls, another incomplete scenario can come into the set. It goes to the top because it has a lower scenario time, but it arrives late because of the asynchronicity in this system. pollFirst() then returns this incomplete scenario, and it gets added to the completed queue!
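The interleaving described above can be replayed deterministically on a single thread. This is only a sketch: `Scenario` is a hypothetical stand-in for `InMemoryScenarioRecord`, and ordering by a numeric `time` field is an assumption about how the real set is sorted. The late insert that would normally come from another thread is simply placed between the two calls.

```java
import java.util.Comparator;
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentSkipListSet;

public class FirstVsPollFirstDemo {

    // Hypothetical stand-in for InMemoryScenarioRecord (not the real class).
    public static final class Scenario {
        public final long time;
        public final boolean completed;

        public Scenario(long time, boolean completed) {
            this.time = time;
            this.completed = completed;
        }

        @Override
        public String toString() {
            return "Scenario(time=" + time + ", completed=" + completed + ")";
        }
    }

    // Returns {element seen by first(), element removed by pollFirst()}.
    public static Scenario[] interleave() {
        // Assumption: the set is ordered by scenario time.
        NavigableSet<Scenario> scenarios =
                new ConcurrentSkipListSet<>(Comparator.comparingLong((Scenario s) -> s.time));
        scenarios.add(new Scenario(5, true));

        // Step 1: the completion check is made on this element.
        Scenario checked = scenarios.first();

        // Simulated late arrival "from another thread" with a lower time.
        scenarios.add(new Scenario(3, false));

        // Step 2: pollFirst() removes the current head, which is now the late arrival.
        Scenario removed = scenarios.pollFirst();

        return new Scenario[] {checked, removed};
    }

    public static void main(String[] args) {
        Scenario[] result = interleave();
        System.out.println("checked: " + result[0]);
        System.out.println("removed: " + result[1]);
    }
}
```

The element that passed the check has time 5 and is completed, while the element actually removed has time 3 and is not completed, which is exactly how an incomplete scenario ends up in the completed queue.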

Huh! I fixed this oversight and learnt a valuable lesson!
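The post does not show the actual fix, so the following is only one possible way to close the gap, not necessarily what was done. The idea is to remove exactly the element that was checked, using `remove(Object)`, and to enqueue it only if that removal succeeded. `Scenario`, the `time` ordering, and passing the collections as parameters are assumptions made to keep the sketch self-contained; it also assumes scenario times are unique, since the set's comparator decides which element `remove` matches.

```java
import java.util.Comparator;
import java.util.NavigableSet;
import java.util.NoSuchElementException;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentSkipListSet;

public class SafeDrainSketch {

    // Hypothetical stand-in for InMemoryScenarioRecord (not the real class).
    public static final class Scenario {
        public final long time;
        public final boolean completed;

        public Scenario(long time, boolean completed) {
            this.time = time;
            this.completed = completed;
        }
    }

    public static void fireCompletedScenarios(NavigableSet<Scenario> scenarios,
                                              Queue<Scenario> completedScenarios) {
        while (true) {
            Scenario first;
            try {
                first = scenarios.first();
            } catch (NoSuchElementException e) {
                return; // the set is empty, nothing left to inspect
            }
            if (!first.completed) {
                return; // exit at the first uncompleted scenario
            }
            // Remove the element that was checked, not "whatever is first now".
            // If another thread already removed it, skip it and re-read the head.
            if (scenarios.remove(first)) {
                completedScenarios.offer(first);
            }
        }
    }

    public static void main(String[] args) {
        NavigableSet<Scenario> scenarios =
                new ConcurrentSkipListSet<>(Comparator.comparingLong((Scenario s) -> s.time));
        scenarios.add(new Scenario(1, true));
        scenarios.add(new Scenario(2, true));
        scenarios.add(new Scenario(3, false));
        scenarios.add(new Scenario(4, true));

        Queue<Scenario> completed = new ConcurrentLinkedQueue<>();
        fireCompletedScenarios(scenarios, completed);
        System.out.println(completed.size() + " moved, " + scenarios.size() + " left");
    }
}
```

With this shape, a late arrival that jumps to the head after the check can no longer be removed by mistake; it is simply seen on the next loop iteration and stops the drain if it is incomplete.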