The following snippet of benign-looking code caused one of the bugs we faced recently.
The class schedules the given jobs on a timer. The jobs first run about 10 hours after the scheduler's Start is executed, and the timer then fires every 24 hours.
Unexpectedly, the timer never fired in production, not even once.
We did not face any issue with this code in the development environment. The only difference was that the timer's delay and period were shorter in the unit test.
The prime suspect for this kind of behavior is the GC. A garbage collection is far more likely to run in a long-running process than in a short-running unit test.
The issue was indeed the GC: the timer was being garbage collected, so its callback was never executed.
Here is a detailed analysis of the code.
- Loc 1 in the code is a Select expression, which uses deferred execution.
- Loc 2 in the code exists to force the Select to execute, so that the timers are created and scheduled.
- The key issue with the code is that no reference to the created Timer objects is retained. An active timer does not keep itself alive, so the Timer objects are garbage collected.
- Even though Loc 1 seems to assign the Timer references to the _timers field, it does not. The _timers field holds a reference to the Select iterator, that is, the source list and the lambda, not the timers the lambda produces.
GetType() on _timers returns System.Linq.Enumerable+WhereSelectListIterator`2[System.Threading.TimerCallback,System.Threading.Timer]
- If the timers have been garbage collected, what happens in the Dispose method of JobScheduler?
The foreach loop enumerates _timers again, so the Select executes again, creating new timers that are then immediately disposed.
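The deferred execution described above can be reproduced without any timers. The following is a minimal sketch rather than the original code: `DeferredSelectDemo` and its members are hypothetical names, and plain objects stand in for the Timer instances.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredSelectDemo
{
    // Returns how many objects the lambda had created before and after
    // enumerating twice, and whether both enumerations yielded the same object.
    public static (int Before, int After, bool SameObject) Run()
    {
        int created = 0;
        var jobs = new List<string> { "job1", "job2" };

        // Nothing runs here: Select only builds an iterator over 'jobs'.
        IEnumerable<object> items = jobs.Select(job =>
        {
            created++;
            return new object();
        });
        int before = created;

        // Each enumeration re-runs the lambda and creates fresh objects.
        List<object> first = items.ToList();
        List<object> second = items.ToList();

        return (before, created, ReferenceEquals(first[0], second[0]));
    }

    public static void Main()
    {
        var result = Run();
        Console.WriteLine(result.Before);     // 0
        Console.WriteLine(result.After);      // 4
        Console.WriteLine(result.SameObject); // False
    }
}
```

The iterator never stores what it produces, which is why enumerating it in Dispose yields brand-new objects instead of the ones created earlier.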
So the current state of the code is:
- The timers will not fire once the GC has collected them.
- A timer that the GC has not collected is never stopped by Dispose.
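One possible way to address both points is to materialize the Select into a list, so that the field holds the Timer objects themselves. This is a sketch under assumptions, not the original class: the constructor and `Start` signatures shown here are hypothetical, and only the `JobScheduler` name, the `_timers` field, `Start`, and `Dispose` come from the description above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public sealed class JobScheduler : IDisposable
{
    private readonly List<TimerCallback> _jobs;

    // Holds the Timer instances themselves, so they stay reachable
    // for as long as the scheduler is reachable.
    private readonly List<Timer> _timers = new List<Timer>();

    // Hypothetical signature: the original constructor is not shown.
    public JobScheduler(IEnumerable<TimerCallback> jobs)
    {
        _jobs = jobs.ToList();
    }

    // Hypothetical signature: dueTime and period are passed in for illustration.
    public void Start(TimeSpan dueTime, TimeSpan period)
    {
        // ToList() runs the Select exactly once and stores its results.
        _timers.AddRange(
            _jobs.Select(job => new Timer(job, null, dueTime, period)).ToList());
    }

    public void Dispose()
    {
        // Disposes the same timers that Start created, not new ones.
        foreach (var timer in _timers)
        {
            timer.Dispose();
        }
        _timers.Clear();
    }
}
```

With this shape the timers live as long as the scheduler does, so the caller still has to keep the JobScheduler instance itself referenced until it is disposed.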
This Gist contains the complete program for observing the issue and the code's behavior.