cassandra-commits mailing list archives

From "Dmitry Erokhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-13545) Exception in CompactionExecutor leading to tmplink files not being removed
Date Mon, 05 Jun 2017 17:18:04 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037233#comment-16037233 ]

Dmitry Erokhin commented on CASSANDRA-13545:
--------------------------------------------

One of our engineers has been able to find at least one issue which leads to this condition.
His findings are below.
---

With a consistent reproduction outside of the production cluster, I downloaded the Cassandra
source code, set up a remote debugger (Eclipse), and connected it to the Cassandra process running
on my node.
 
At this point I was able to set breakpoints and examine a live system, starting at the last
frame in the traceback (org.apache.cassandra.io.sstable.IndexSummary.<init>(IndexSummary.java:86)).
Stepping through the code during a live compaction, I was able to determine that the issue
is indeed a bug in Cassandra that occurs when it tries to run a compaction job with a
very large number of partitions.
 
The SafeMemoryWriter class is used to build the index summary for the new sstable.
{code:java}
public class SafeMemoryWriter extends DataOutputBuffer
{
    private SafeMemory memory;
 
    @SuppressWarnings("resource")
    public SafeMemoryWriter(long initialCapacity)
    {
        this(new SafeMemory(initialCapacity));
    }
 
    private SafeMemoryWriter(SafeMemory memory)
    {
        super(tailBuffer(memory).order(ByteOrder.BIG_ENDIAN));
        this.memory = memory;
    }
 
    public SafeMemory currentBuffer()
    {
        return memory;
    }
 
    @Override
    protected void reallocate(long count)
    {
        long newCapacity = calculateNewSize(count);
        if (newCapacity != capacity())
        {
            long position = length();
            ByteOrder order = buffer.order();
 
            SafeMemory oldBuffer = memory;
            memory = this.memory.copy(newCapacity);
            buffer = tailBuffer(memory);
 
            int newPosition = (int) (position - tailOffset(memory));
            buffer.position(newPosition);
            buffer.order(order);
 
            oldBuffer.free();
        }
    }
 
    public void setCapacity(long newCapacity)
    {
        reallocate(newCapacity);
    }
 
    public void close()
    {
        memory.close();
    }
 
    public Throwable close(Throwable accumulate)
    {
        return memory.close(accumulate);
    }
 
    public long length()
    {
        return tailOffset(memory) +  buffer.position();
    }
 
    public long capacity()
    {
        return memory.size();
    }
 
    @Override
    public SafeMemoryWriter order(ByteOrder order)
    {
        super.order(order);
        return this;
    }
 
    @Override
    public long validateReallocation(long newSize)
    {
        return newSize;
    }
 
    private static long tailOffset(Memory memory)
    {
        return Math.max(0, memory.size - Integer.MAX_VALUE);
    }
 
    private static ByteBuffer tailBuffer(Memory memory)
    {
        return memory.asByteBuffer(tailOffset(memory), (int) Math.min(memory.size, Integer.MAX_VALUE));
    }
}
{code}
The class appears to be intended to work with buffers larger than Integer.MAX_VALUE; however,
if the initial size of the buffer is larger than that, the initial value of length() will be
incorrect (it won’t be zero) and writes through the DataOutputBuffer will land in the wrong
location (they won’t start at offset 0).
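
To make the failure mode concrete, here is a small standalone sketch of the tailOffset()/length() arithmetic (not part of the Cassandra source; the 3,355,443,200-byte capacity is borrowed from the "Illegal bounds" error quoted further down, purely for illustration):
{code:java}
// Standalone sketch: with an initial capacity above Integer.MAX_VALUE,
// tailOffset() is non-zero before anything is written, so length() does not start at 0.
public class TailOffsetSketch
{
    // mirrors SafeMemoryWriter.tailOffset(Memory)
    private static long tailOffset(long memorySize)
    {
        return Math.max(0, memorySize - Integer.MAX_VALUE);
    }

    public static void main(String[] args)
    {
        long initialCapacity = 3_355_443_200L; // illustrative capacity > Integer.MAX_VALUE
        long bufferPosition  = 0;              // nothing has been written yet
        long initialLength   = tailOffset(initialCapacity) + bufferPosition;
        // prints 1207959553: length() starts ~1.2 GB into the buffer instead of at 0
        System.out.println(initialLength);
    }
}
{code}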
 
 
{code:java}
    public IndexSummaryBuilder(long expectedKeys, int minIndexInterval, int samplingLevel)
    {
        this.samplingLevel = samplingLevel;
        this.startPoints = Downsampling.getStartPoints(BASE_SAMPLING_LEVEL, samplingLevel);
 
        long maxExpectedEntries = expectedKeys / minIndexInterval;
        if (maxExpectedEntries > Integer.MAX_VALUE)
        {
            // that's a _lot_ of keys, and a very low min index interval
            int effectiveMinInterval = (int) Math.ceil((double) Integer.MAX_VALUE / expectedKeys);
            maxExpectedEntries = expectedKeys / effectiveMinInterval;
            assert maxExpectedEntries <= Integer.MAX_VALUE : maxExpectedEntries;
            logger.warn("min_index_interval of {} is too low for {} expected keys; using interval
of {} instead",
                        minIndexInterval, expectedKeys, effectiveMinInterval);
            this.minIndexInterval = effectiveMinInterval;
        }
        else
        {
            this.minIndexInterval = minIndexInterval;
        }
 
        // for initializing data structures, adjust our estimates based on the sampling level
        maxExpectedEntries = Math.max(1, (maxExpectedEntries * samplingLevel) / BASE_SAMPLING_LEVEL);
        offsets = new SafeMemoryWriter(4 * maxExpectedEntries).order(ByteOrder.nativeOrder());
        entries = new SafeMemoryWriter(40 * maxExpectedEntries).order(ByteOrder.nativeOrder());
 
        // the summary will always contain the first index entry (downsampling will never remove it)
        nextSamplePosition = 0;
        indexIntervalMatches++;
    }
{code}
The bug occurs when the entries table in the index summary for the new sstable is larger than
Integer.MAX_VALUE bytes (2 GiB). This happens when expectedKeys > (Integer.MAX_VALUE / 40) * minIndexInterval.
Our partitions for the blocks table have a mean size of 179 bytes, so
we would expect to see issues on this table for compactions over about 1.12 TiB.
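
As a quick sanity check of that 1.12 TiB estimate, here is a sketch using the figures quoted above (40 bytes per index summary entry, the default minIndexInterval of 128, and the 179-byte mean partition size):
{code:java}
// Sketch: estimate the compacted size at which the index summary's "entries"
// buffer would exceed Integer.MAX_VALUE bytes (40 bytes per entry, as allocated above).
public class ThresholdSketch
{
    public static void main(String[] args)
    {
        int  minIndexInterval   = 128;  // table default
        long meanPartitionBytes = 179;  // observed mean partition size on the blocks table

        // the entries buffer overflows when 40 * (expectedKeys / minIndexInterval) > Integer.MAX_VALUE
        long maxKeys  = (Integer.MAX_VALUE / 40L) * minIndexInterval;  // ~6.87e9 keys
        long maxBytes = maxKeys * meanPartitionBytes;                  // ~1.23e12 bytes

        // prints ~1.12 (TiB), matching the estimate above
        System.out.println(maxBytes / (1024.0 * 1024 * 1024 * 1024));
    }
}
{code}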
 
The default value of minIndexInterval is 128; however, it is adjustable per table and can be
used to avoid this condition. It should be set to a power of 2. I ran this CQL on my test
node:
{code:sql}
ALTER TABLE tablename.blocks WITH min_index_interval = 512;
{code}
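To double-check that the new value is in place, a hypothetical verification query (it assumes the Cassandra 2.2 system schema tables):
{code:sql}
-- hypothetical check against the Cassandra 2.2 system schema
SELECT min_index_interval, max_index_interval
FROM system.schema_columnfamilies
WHERE keyspace_name = 'tablename' AND columnfamily_name = 'blocks';
{code}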
Since this change, I haven’t seen the assertion. The compaction has proceeded much farther
than before, but it has not completed yet since it is so large.
{noformat}
$ nodetool compactionstats -H
pending tasks: 1
                                     id   compaction type       keyspace    table   completed     total    unit   progress
   9965f4b0-4749-11e7-b21c-91cb0a91f895        Compaction   tablename   blocks   629.51 GB   1.34 TB   bytes     45.71%
Active compaction remaining time :        n/a
{noformat}
I would expect that making this change would fix the issue for all future compactions on all
nodes.
 
The index summary is used to reduce disk I/O to the sstable index. A larger index interval
would result in a less efficient index summary and more I/O to the sstable index. However, the
min is just the minimum value; the actual value is controlled automatically by Cassandra.
On p10, it is 2048 for the larger blocks sstables, so I would not expect a performance impact.

The compaction then failed with a new error:
{code}
ERROR [CompactionExecutor:6] 2017-06-04 10:15:26,115 CassandraDaemon.java:185 - Exception in thread Thread[CompactionExecutor:6,1,RMI Runtime]
java.lang.AssertionError: Illegal bounds [-2147483648..-2147483640); size: 3355443200
        at org.apache.cassandra.io.util.Memory.checkBounds(Memory.java:339) ~[apache-cassandra-2.2.5.jar:2.2.5]
        at org.apache.cassandra.io.util.SafeMemory.checkBounds(SafeMemory.java:104) ~[apache-cassandra-2.2.5.jar:2.2.5]
        at org.apache.cassandra.io.util.Memory.getLong(Memory.java:260) ~[apache-cassandra-2.2.5.jar:2.2.5]
        at org.apache.cassandra.io.compress.CompressionMetadata.chunkFor(CompressionMetadata.java:224) ~[apache-cassandra-2.2.5.jar:2.2.5]
        at org.apache.cassandra.io.util.CompressedSegmentedFile.createMappedSegments(CompressedSegmentedFile.java:80) ~[apache-cassandra-2.2.5.jar:2.2.5]
        at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile.<init>(CompressedPoolingSegmentedFile.java:38) ~[apache-cassandra-2.2.5.jar:2.2.5]
        at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:101) ~[apache-cassandra-2.2.5.jar:2.2.5]
        at org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:188) ~[apache-cassandra-2.2.5.jar:2.2.5]
        at org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:179) ~[apache-cassandra-2.2.5.jar:2.2.5]
        at org.apache.cassandra.io.sstable.format.big.BigTableWriter.openFinal(BigTableWriter.java:345) ~[apache-cassandra-2.2.5.jar:2.2.5]
        at org.apache.cassandra.io.sstable.format.big.BigTableWriter.openFinalEarly(BigTableWriter.java:333) ~[apache-cassandra-2.2.5.jar:2.2.5]
        at org.apache.cassandra.io.sstable.SSTableRewriter.switchWriter(SSTableRewriter.java:297) ~[apache-cassandra-2.2.5.jar:2.2.5]
        at org.apache.cassandra.io.sstable.SSTableRewriter.doPrepare(SSTableRewriter.java:345) ~[apache-cassandra-2.2.5.jar:2.2.5]
        at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit(Transactional.java:169) ~[apache-cassandra-2.2.5.jar:2.2.5]
        at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.doPrepare(CompactionAwareWriter.java:79) ~[apache-cassandra-2.2.5.jar:2.2.5]
        at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit(Transactional.java:169) ~[apache-cassandra-2.2.5.jar:2.2.5]
        at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish(Transactional.java:179) ~[apache-cassandra-2.2.5.jar:2.2.5]
        at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.finish(CompactionAwareWriter.java:89) ~[apache-cassandra-2.2.5.jar:2.2.5]
        at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:196) ~[apache-cassandra-2.2.5.jar:2.2.5]
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.2.5.jar:2.2.5]
        at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:74) ~[apache-cassandra-2.2.5.jar:2.2.5]
        at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) ~[apache-cassandra-2.2.5.jar:2.2.5]
        at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:256) ~[apache-cassandra-2.2.5.jar:2.2.5]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_131]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
{code}


> Exception in CompactionExecutor leading to tmplink files not being removed
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13545
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13545
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>            Reporter: Dmitry Erokhin
>
> We are facing an issue where compactions fail on a few nodes with the following message
> {code}
> ERROR [CompactionExecutor:1248] 2017-05-22 15:32:55,390 CassandraDaemon.java:185 - Exception in thread Thread[CompactionExecutor:1248,1,main]
> java.lang.AssertionError: null
> 	at org.apache.cassandra.io.sstable.IndexSummary.<init>(IndexSummary.java:86) ~[apache-cassandra-2.2.5.jar:2.2.5]
> 	at org.apache.cassandra.io.sstable.IndexSummaryBuilder.build(IndexSummaryBuilder.java:235) ~[apache-cassandra-2.2.5.jar:2.2.5]
> 	at org.apache.cassandra.io.sstable.format.big.BigTableWriter.openEarly(BigTableWriter.java:316) ~[apache-cassandra-2.2.5.jar:2.2.5]
> 	at org.apache.cassandra.io.sstable.SSTableRewriter.maybeReopenEarly(SSTableRewriter.java:170) ~[apache-cassandra-2.2.5.jar:2.2.5]
> 	at org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:115) ~[apache-cassandra-2.2.5.jar:2.2.5]
> 	at org.apache.cassandra.db.compaction.writers.DefaultCompactionWriter.append(DefaultCompactionWriter.java:64) ~[apache-cassandra-2.2.5.jar:2.2.5]
> 	at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:184) ~[apache-cassandra-2.2.5.jar:2.2.5]
> 	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.2.5.jar:2.2.5]
> 	at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:74) ~[apache-cassandra-2.2.5.jar:2.2.5]
> 	at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) ~[apache-cassandra-2.2.5.jar:2.2.5]
> 	at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:256) ~[apache-cassandra-2.2.5.jar:2.2.5]
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_121]
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_121]
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_121]
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_121]
> 	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
> {code}
> Also, the number of tmplink files in /var/lib/cassandra/data/<keyspace name>/blocks/tmplink* is growing constantly until the node runs out of space. Restarting Cassandra removes all tmplink files, but the issue still continues.
> We are using Cassandra 2.2.5 on Debian 8 with Oracle Java 8
> {code}
> root@cassandra-p10:/var/lib/cassandra/data/mugenstorage/blocks-33167ef0447a11e68f3e5b42fc45b62f# dpkg -l | grep -E "java|cassandra"
> ii  cassandra                      2.2.5                        all          distributed storage system for structured data
> ii  cassandra-tools                2.2.5                        all          distributed storage system for structured data
> ii  java-common                    0.52                         all          Base of all Java packages
> ii  javascript-common              11                           all          Base support for JavaScript library packages
> ii  oracle-java8-installer         8u121-1~webupd8~0            all          Oracle Java(TM) Development Kit (JDK) 8
> ii  oracle-java8-set-default       8u121-1~webupd8~0            all          Set Oracle JDK 8 as default Java
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
