cassandra-commits mailing list archives

From "sankalp kohli (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-7278) NPE in
Date Thu, 26 Jun 2014 18:39:24 GMT


sankalp kohli commented on CASSANDRA-7278:

We are also hitting this when there is a lot of Java GC activity. I think it is caused by a race
condition triggered by a long pause.
Here are the steps that would cause this error:
1) Thread A calls FileCacheService.get to get a RandomAccessReader from the cache. It grabs the
Queue<RandomAccessReader> from the cache.
2) There is a long GC pause, longer than 512 ms, which is the cache's expiration time.
3) After the pause, the cache detects that the Queue<RandomAccessReader> has expired, removes
it, and calls the RemovalListener.
4) The RemovalListener iterates over the Queue<RandomAccessReader> and calls deallocate()
on every RandomAccessReader, which sets its buffer to null. The readers stay in the queue.
5) Thread A resumes, takes one RandomAccessReader from the queue, and proceeds. Since the buffer
is null, it blows up in 
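The five steps above can be replayed deterministically on a single thread. The sketch below is only a model: the `Reader` class is a hypothetical stand-in for RandomAccessReader that keeps just a buffer field, and the queue type (`ConcurrentLinkedQueue`) is an assumption rather than something taken from FileCacheService. It shows that a for-each loop which deallocates readers without removing them leaves dead readers in the queue for thread A to pick up.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class EvictionRaceSketch {
    // Hypothetical stand-in for RandomAccessReader: models only the buffer
    // that deallocate() releases.
    static final class Reader {
        byte[] buffer = new byte[64];

        void deallocate() {
            buffer = null; // step 4: the buffer is set to null
        }

        int firstByte() {
            return buffer[0]; // throws NullPointerException once deallocated
        }
    }

    // Replays steps 1-5 on one thread, in the problematic order.
    static boolean replayBuggyOrder() {
        Queue<Reader> cached = new ConcurrentLinkedQueue<>();
        cached.add(new Reader());

        // Step 1: thread A grabs the queue reference from the cache.
        Queue<Reader> seenByThreadA = cached;

        // Steps 2-4: the entry expires and the listener iterates with a
        // for-each loop, deallocating readers but leaving them queued.
        for (Reader evicted : cached) {
            evicted.deallocate();
        }

        // Step 5: thread A resumes and polls a reader from the same queue.
        Reader taken = seenByThreadA.poll();
        try {
            taken.firstByte();
            return false;
        } catch (NullPointerException e) {
            return true; // the NPE from the report
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE reproduced: " + replayBuggyOrder());
    }
}
```

Running `main` prints `NPE reproduced: true`. A real GC pause only changes the timing; the ordering of the operations is the same.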

I know this is very unlikely, but I could not think of anything else that would cause this.

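If this is indeed what is happening, we can fix it by changing the for loop in the RemovalListener to
a while loop that removes each reader from the queue before deallocating it, like this:
RandomAccessReader reader = null;
while ((reader = cachedInstances.poll()) != null)
    reader.deallocate(); // poll() removes the reader first, so no other thread can also take it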
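Here is a minimal model of the proposed while/poll loop, reusing the same hypothetical `Reader` stand-in and assuming a `ConcurrentLinkedQueue`. It covers both orderings. If thread A polls after the listener has drained the queue, it gets `null`, and it would presumably have to open a fresh reader; that fallback is assumed here and not taken from Cassandra's code. If thread A polls first, the listener never sees that reader, so its buffer stays intact.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class PollDrainSketch {
    // Hypothetical stand-in for RandomAccessReader.
    static final class Reader {
        byte[] buffer = new byte[64];

        void deallocate() {
            buffer = null;
        }
    }

    // Listener body using the proposed while/poll loop: each reader is
    // removed from the queue before it is deallocated.
    static void onRemoval(Queue<Reader> cachedInstances) {
        Reader reader;
        while ((reader = cachedInstances.poll()) != null) {
            reader.deallocate();
        }
    }

    // Thread A polls only after the drain: it gets null, never a dead reader.
    static Reader pollAfterDrain() {
        Queue<Reader> cached = new ConcurrentLinkedQueue<>();
        cached.add(new Reader());
        cached.add(new Reader());
        Queue<Reader> seenByThreadA = cached; // step 1
        onRemoval(cached);                    // steps 2-4 with the fixed listener
        return seenByThreadA.poll();          // step 5
    }

    // Thread A polls before the drain: the listener never sees its reader.
    static boolean heldReaderSurvives() {
        Queue<Reader> cached = new ConcurrentLinkedQueue<>();
        cached.add(new Reader());
        Reader heldByThreadA = cached.poll();
        onRemoval(cached);
        return heldByThreadA.buffer != null;
    }

    public static void main(String[] args) {
        System.out.println("after drain: " + pollAfterDrain());
        System.out.println("held reader intact: " + heldReaderSurvives());
    }
}
```

Because `poll()` on a `ConcurrentLinkedQueue` removes an element atomically, each reader is handed either to the listener or to thread A, never to both. This is the property the for-each loop lacks.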

If this does not work, I will upload the patch soon. 

> NPE in
> -----------------------------
>                 Key: CASSANDRA-7278
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Cassandra 2.0.7, x86-64 Ubuntu 12.04
>            Reporter: Duncan Sands
>            Assignee: sankalp kohli
>            Priority: Minor
>         Attachments: sl
> Got this this morning under heavy load:
> ERROR [ReadStage:128] 2014-05-21 07:59:03,274 (line 198) Exception in thread Thread[ReadStage:128,5,main]
> java.lang.RuntimeException: java.lang.NullPointerException
>         at org.apache.cassandra.service.StorageProxy$
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(
>         at java.util.concurrent.ThreadPoolExecutor$
>         at
> Caused by: java.lang.NullPointerException
>         at
>         at
>         at org.apache.cassandra.service.FileCacheService.get(
>         at
>         at
>         at org.apache.cassandra.db.columniterator.SimpleSliceReader.<init>(
>         at org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(
>         at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(
>         at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(
>         at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(
>         at org.apache.cassandra.db.CollationController.collectAllData(
>         at org.apache.cassandra.db.CollationController.getTopLevelColumns(
>         at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(
>         at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(
>         at org.apache.cassandra.db.Keyspace.getRow(
>         at org.apache.cassandra.db.SliceFromReadCommand.getRow(
>         at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(
>         at org.apache.cassandra.service.StorageProxy$
>         ... 3 more
> There had just been a 20-second GC pause, and the system was dropping messages like mad;
> see the attached log snippet.

This message was sent by Atlassian JIRA
