cassandra-commits mailing list archives

From "Jason Harvey (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-4492) Large HintsColumnFamily compactions hang
Date Sun, 05 Aug 2012 04:57:02 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Harvey updated CASSANDRA-4492:
------------------------------------

    Description: 
I'm running into an issue on a 6-node ring running 1.0.11: whenever a somewhat large set
of hints builds up (seen with as little as 400MB), compaction on the hints CF hangs indefinitely.
There is nothing of note in the logs. In some cases, the compaction hangs before a tmp sstable is even created.

I've wiped out every hints sstable I have and restarted several times. The issue always comes
back rather quickly and predictably after wiping the sstables. Compaction always seems to
succeed if the hints CFs are rather small.

Hints are enabled, and my hint window is the default of 1 hour. I do have some copies of HintsColumnFamily
sstables that reproduce this issue. However, the hints may contain confidential data. If
they'd be helpful in troubleshooting this issue, let me know and I can see about sending them
directly.

This ring was upgraded from 1.0.7. I didn't keep any hints from the upgrade.

Here is the output I see from compactionstats where a compaction has hung. The 'bytes compacted'
column never changes.

{code}
pending tasks: 1
   compaction type   keyspace       column family   bytes compacted   bytes total   progress
        Compaction     system   HintsColumnFamily            268082     464784758      0.06%
{code}
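
For reference, the figures in the output above can be checked with a few lines of Java. This is only a sketch: the class and method names (`StalledCompactionCheck`, `progressPercent`, `looksStalled`) are hypothetical and not part of Cassandra or nodetool, and it assumes the 'progress' column is simply bytes compacted divided by bytes total. It also shows one way to express "the 'bytes compacted' column never changes" as a comparison of two samples taken some time apart.

```java
import java.util.Locale;

// Hypothetical helper, not part of Cassandra or nodetool.
public final class StalledCompactionCheck {

    /** Progress as a percentage, assuming progress = bytes compacted / bytes total. */
    static double progressPercent(long bytesCompacted, long bytesTotal) {
        if (bytesTotal <= 0) {
            throw new IllegalArgumentException("bytesTotal must be positive");
        }
        return 100.0 * bytesCompacted / bytesTotal;
    }

    /**
     * True when two samples of 'bytes compacted', taken some time apart,
     * are identical while the compaction is still unfinished.
     */
    static boolean looksStalled(long earlierBytes, long laterBytes, long bytesTotal) {
        return laterBytes == earlierBytes && laterBytes < bytesTotal;
    }

    public static void main(String[] args) {
        long compacted = 268082L;   // 'bytes compacted' from the report
        long total = 464784758L;    // 'bytes total' from the report

        // Prints 0.06%, matching the 'progress' column.
        System.out.println(String.format(Locale.ROOT, "%.2f%%", progressPercent(compacted, total)));

        // Two identical samples on an unfinished compaction: prints true.
        System.out.println(looksStalled(compacted, compacted, total));
    }
}
```

How long to wait between samples is a judgment call; a slow compaction can legitimately show the same value for a short while.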


The hung thread's stack is as follows (the full jstack is attached as well):

{code}
"CompactionExecutor:37" daemon prio=10 tid=0x00000000063df800 nid=0x49d9 waiting on condition [0x00007eb8c6ffa000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x000000050f2e0e58> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
        at org.apache.cassandra.db.compaction.ParallelCompactionIterable$Deserializer.computeNext(ParallelCompactionIterable.java:329)
        at org.apache.cassandra.db.compaction.ParallelCompactionIterable$Deserializer.computeNext(ParallelCompactionIterable.java:281)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
        at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:147)
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:126)
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:100)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
        at org.apache.cassandra.db.compaction.ParallelCompactionIterable$Unwrapper.computeNext(ParallelCompactionIterable.java:101)
        at org.apache.cassandra.db.compaction.ParallelCompactionIterable$Unwrapper.computeNext(ParallelCompactionIterable.java:88)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
        at com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
        at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:141)
        at org.apache.cassandra.db.compaction.CompactionManager$7.call(CompactionManager.java:395)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
{code}
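
The top frames of the stack above show the thread parked inside `LinkedBlockingQueue.take()`, which blocks until some other thread enqueues an element. The sketch below is a generic illustration of that JDK behaviour only. It is not Cassandra code and not a diagnosis of this issue, and the names `ParkedConsumerDemo`, `consumerStateOnEmptyQueue` and `pollOrNull` are hypothetical. It shows that a consumer calling `take()` on a queue nobody fills reports the same `WAITING` state seen in the jstack, and that `poll` with a timeout returns `null` instead of waiting forever.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; illustrates JDK queue behaviour, not Cassandra internals.
public final class ParkedConsumerDemo {

    /** Starts a consumer that calls take() on an empty queue and returns its thread state once parked. */
    static Thread.State consumerStateOnEmptyQueue() {
        final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread consumer = new Thread(() -> {
            try {
                queue.take(); // blocks: no producer ever puts an element
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "demo-consumer");
        consumer.setDaemon(true);
        consumer.start();

        try {
            // Give the consumer up to 5 seconds to reach take() and park.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (consumer.getState() != Thread.State.WAITING && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the consumer", e);
        }

        Thread.State state = consumer.getState();
        consumer.interrupt(); // release the parked thread so the demo can finish
        return state;
    }

    /** Bounded alternative to take(): returns null if nothing arrives within the timeout. */
    static String pollOrNull(BlockingQueue<String> queue, long timeoutMillis) {
        try {
            return queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(consumerStateOnEmptyQueue());
        System.out.println(pollOrNull(new LinkedBlockingQueue<String>(), 50));
    }
}
```

Whether the compaction hang comes from a producer thread that stopped feeding the queue cannot be determined from this single stack; the attached full jstack would be needed to see what the other threads are doing.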

> Large HintsColumnFamily compactions hang
> ----------------------------------------
>
>                 Key: CASSANDRA-4492
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4492
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.0.11
>            Reporter: Jason Harvey
>            Priority: Minor

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
