Tuesday, September 01, 2015

A case for Bloom Filter

We provide cache as a service to our users. Users can put elements into the cache with a TTL. We needed to determine the performance of the cache in terms of its hit rate.

The hit rate can be low because of two factors - LRU eviction in the cache and the TTL of the element put by the user. We needed to ensure that the hit rate was not low because of LRU eviction - in that case we were ready to allocate more memory to the cache.

One way to determine whether the TTL caused removal of an element is to track all the keys ever put in the cache, even after the elements are removed.

A Bloom filter is a good data structure to track the keys. We keep adding keys to the Bloom filter. When a cache miss occurs, we check the Bloom filter to see if the key was ever present. If it was, the cache miss was caused by the TTL. If it was not, it was a real cache miss.
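A minimal sketch of such a tracker - hand-rolled here so it is self-contained; the class and its double-hashing scheme are illustrative, not our production code (in practice one would use a library implementation such as Guava's BloomFilter):

```java
import java.util.BitSet;

// Minimal Bloom filter sketch. k bit positions per key are derived from
// two hashes combined (double hashing); keys are only ever added.
class BloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    BloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    void add(String key) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | (h1 << 16); // second hash via bit rotation
        for (int i = 0; i < hashes; i++) {
            bits.set(Math.floorMod(h1 + i * h2, size));
        }
    }

    boolean mightContain(String key) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | (h1 << 16);
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(Math.floorMod(h1 + i * h2, size))) {
                return false; // definitely never added: no false negatives
            }
        }
        return true; // probably added: false positives are possible
    }
}
```

On every cache write we would call `add(key)`; on a cache miss, `mightContain(key)` returning true means the key was (almost certainly) evicted by TTL, while false means a genuine miss.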

A Bloom filter gives a definitive answer when an element was never added - there are no false negatives. When it says an element is present, the answer is only probabilistic - false positives are possible. In our case this is acceptable, since at worst we count a few extra cache misses as TTL-caused. That is acceptable because it is easier to tune the TTL than the cache.

Also, we only add elements for tracking and never remove them - which matches the constraints of a standard Bloom filter, since it does not support removal.

In summary, the Bloom filter worked well for our case compared to alternative set data structures. We were able to determine the presence of an element with minimal memory and tune our cache performance.

Sunday, August 09, 2015

Port Unification with Netty in Proxy Mode

Serving both HTTP and HTTPS requests on the same port was a requirement in my scenario. This is called port unification. I was using Netty to serve requests. One of the standard ways to handle this scenario is the approach described in PortUnificationHandler.

However, in my case, the Netty implementation was mimicking the behaviour of a proxy server. This means that a client makes a CONNECT request before any actual data request is sent. The CONNECT request is always non-SSL - even if the protocol that follows is HTTPS.

The approach to use in this scenario is as follows:

Implement a ChannelInboundByteHandlerAdapter; on an inbound buffer update, check whether the request is a CONNECT request. If it is a CONNECT request, pass the incoming bytes on to the next handler in the chain.

If it is not a CONNECT request, check whether the data is SSL-encrypted and add the SSL handler to the chain if necessary. Before passing the bytes on to the next handler, the unification handler should remove itself from the chain.
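The decision in both steps above comes down to peeking at the first inbound bytes. A framework-free sketch of that peek logic - the class and method names here are hypothetical, not Netty API; only the 0x16 TLS record byte and the ASCII "CONNECT " prefix come from the protocols themselves:

```java
import java.nio.charset.StandardCharsets;

// Classifies the first bytes read from a connection so the pipeline
// can be configured before any real handler sees them.
class ProtocolDetector {
    static String detect(byte[] firstBytes) {
        // Every TLS record starts with a content-type byte; 0x16 = handshake.
        if (firstBytes.length > 0 && (firstBytes[0] & 0xFF) == 0x16) {
            return "TLS";       // add the SSL handler, then pass the bytes on
        }
        String prefix = new String(firstBytes, StandardCharsets.US_ASCII);
        // A proxy handshake begins with the plain-text CONNECT method.
        if (prefix.startsWith("CONNECT ")) {
            return "CONNECT";   // pass through unchanged to the next handler
        }
        return "PLAINTEXT";     // ordinary HTTP: no SSL handler needed
    }
}
```

In the actual handler, the TLS and PLAINTEXT cases are where it would reconfigure the pipeline and remove itself.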

Monday, July 20, 2015

Threading Race Deque

The following code has been causing problems at various times and has sent me debugging into a lot of unrelated areas.

private final NavigableSet&lt;InMemoryScenarioRecord&gt; scenarios = new ConcurrentSkipListSet&lt;&gt;();

public void fireCompletedScenarios() {
        while (!scenarios.isEmpty()) {
            InMemoryScenarioRecord firstScenario = scenarios.first();
            if (firstScenario.completed()) {
                synchronized (completedScenarios) {
                    completedScenarios.offer(scenarios.pollFirst());
                }
                continue;
            }
            // exit when we find the first uncompleted scenario
            break;
        }
    }

There is a navigable set of scenarios, ordered by scenario time. We pull completed scenarios out of it and add them to another queue. We break when we find the first incomplete scenario.

The issue I had been facing was incomplete scenarios getting into the queue of completed scenarios. As usual, the problem did not happen in the local environment - it happened in non-prod environments, under high load. Hence it had to be some race condition.

Finally it struck me: there is a bug in the above code that manifests under high load. The bug is as follows:

The scenarios.first() just gives me access to the first scenario and does not remove it from the navigable set. After the completion check, pollFirst is called to remove it from the set.

This is the bug - under high load, the element returned by first and the element removed by pollFirst are not the same!

Another scenario can come into the set in between - at the top, because it has a lower scenario time, yet arriving late because of the asynchronicity in this system. If that scenario is incomplete, pollFirst returns it, and we add an incomplete scenario to the completed queue!
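One way to fix it is to act only on the element actually removed by pollFirst, never on a separate first() peek, and to put an incomplete head back before stopping. A sketch of that fix, with a hypothetical minimal stand-in for the scenario record so it is self-contained:

```java
import java.util.NavigableSet;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical stand-in for the scenario record, ordered by scenario time.
class Scenario implements Comparable<Scenario> {
    final long time;
    final boolean completed;
    Scenario(long time, boolean completed) { this.time = time; this.completed = completed; }
    boolean completed() { return completed; }
    public int compareTo(Scenario o) { return Long.compare(time, o.time); }
}

class ScenarioDrainer {
    final NavigableSet<Scenario> scenarios = new ConcurrentSkipListSet<>();
    final Queue<Scenario> completedScenarios = new ConcurrentLinkedQueue<>();

    // Drains completed scenarios. We check completed() on the element
    // pollFirst actually removed, so a late-arriving head cannot sneak
    // into the completed queue between a peek and a removal.
    void fireCompletedScenarios() {
        Scenario s;
        while ((s = scenarios.pollFirst()) != null) {
            if (s.completed()) {
                completedScenarios.offer(s);
            } else {
                scenarios.add(s); // put the incomplete head back and stop
                break;
            }
        }
    }
}
```

Note the re-insertion briefly removes the incomplete head from the set; whether that window matters depends on who else reads the set, so this is a sketch of the idea rather than a drop-in replacement.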

Huh! I fixed this oversight and learnt a valuable lesson!