accumulo-notifications mailing list archives

From "Keith Turner (JIRA)" <>
Subject [jira] [Comment Edited] (ACCUMULO-4669) RFile can create very large blocks when key statistics are not uniform
Date Fri, 30 Jun 2017 17:22:00 GMT


Keith Turner edited comment on ACCUMULO-4669 at 6/30/17 5:21 PM:

I ran into problems when running [webindex|].  This application
would insert data like the following.

row = <hash of url 1><url 1>   col = <Col1>
row = <hash of url 1><url 1>   col = <Col2>
row = <hash of url 1><url 1>   col = <Col3>
row = <hash of url 2><url 2>   col = <Col1>
row = <hash of url 2><url 2>   col = <Col2>
row = <hash of url 2><url 2>   col = <Col3>
row = <hash of url 3><url 3>   col = <Col1>
row = <hash of url 3><url 3>   col = <Col2>
row = <hash of url 3><url 3>   col = <Col3>

The application uses [Common Crawl data|], so the URLs were derived from real web pages.  I
suspect they followed a Zipfian distribution: there would be a few really large URLs.  Since the
same URL was used for multiple keys (because it's in the row and there are multiple columns),
those large keys had an extremely high chance of ending up in the index.  When the first large
key is added to a data block, it makes the block large.  The second large key causes the block
to close, and that large key ends up in the index.  These few large keys can quadruple the
index size (which is bad for caching the index).
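
As a rough illustration of that mechanism, here is a toy model (the IndexSim class, key sizes,
and block limit below are invented for illustration; this is not the RFile writer): it closes a
data block once the block exceeds a size limit and records the last key added as the index
entry.  Feeding it webindex-style keys, where the same large row repeats across several
columns, shows the megabyte keys landing in the index.

{code}
import java.util.ArrayList;
import java.util.List;

// Toy model only: mimics "close the block, remember its last key as the index entry".
public class IndexSim {
  public static void main(String[] args) {
    int blockSize = 100_000;                  // pretend 100KB data blocks
    List<Integer> indexEntrySizes = new ArrayList<>();
    int currentBlockBytes = 0;
    int lastKeySize = 0;

    // 10,000 rows of ~50 bytes plus one ~1MB row, each row with 3 columns
    for (int row = 0; row <= 10_000; row++) {
      int rowSize = (row == 5_000) ? 1_000_000 : 50;
      for (int col = 0; col < 3; col++) {
        int keySize = rowSize + 8;            // row plus a small column
        if (currentBlockBytes > blockSize) {
          indexEntrySizes.add(lastKeySize);   // the previous key becomes the index entry
          currentBlockBytes = 0;
        }
        currentBlockBytes += keySize;
        lastKeySize = keySize;
      }
    }

    long totalIndexBytes = indexEntrySizes.stream().mapToLong(Integer::longValue).sum();
    long megabyteEntries = indexEntrySizes.stream().filter(s -> s > 100_000).count();
    System.out.println("index entries: " + indexEntrySizes.size()
        + ", index bytes: " + totalIndexBytes + ", megabyte entries: " + megabyteEntries);
  }
}
{code}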

If a megabyte row has many columns, then it may end up in the index multiple times.  Having the
exact same row in the index multiple times prevents the key shortening added in ACCUMULO-1124
from doing anything with the row.  It may be able to shorten the column, but if the column is a
few bytes and the row is a megabyte, shortening the column does not help.
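
For context on why identical rows defeat the shortening: it picks an index key that separates
the last key of one block from the first key of the next, so normally the shared prefix plus
one differing byte is enough.  The shortenedIndexRow method below is a simplified stand-in for
that idea (not the real ACCUMULO-1124 code); it shows that when both keys carry the same
megabyte row there is no differing byte to truncate at.

{code}
import java.util.Arrays;

// Simplified stand-in: pick an index row that still sorts after everything in
// the closed block but is as short as possible.
public class ShortenExample {

  static byte[] shortenedIndexRow(byte[] lastRowOfBlock, byte[] firstRowOfNextBlock) {
    int min = Math.min(lastRowOfBlock.length, firstRowOfNextBlock.length);
    for (int i = 0; i < min; i++) {
      if (lastRowOfBlock[i] != firstRowOfNextBlock[i]) {
        // the shared prefix plus the first differing byte of the next row is
        // enough to separate the two blocks in the index
        return Arrays.copyOf(firstRowOfNextBlock, i + 1);
      }
    }
    // no differing byte: the rows are the same, so the full row must be kept
    return lastRowOfBlock;
  }

  public static void main(String[] args) {
    byte[] megabyteRow = new byte[1_000_000];
    Arrays.fill(megabyteRow, (byte) 'a');

    // different rows on each side of the block boundary: a few bytes suffice
    byte[] nextRow = megabyteRow.clone();
    nextRow[10] = (byte) 'b';
    System.out.println(shortenedIndexRow(megabyteRow, nextRow).length);      // 11

    // the same megabyte row on both sides (many columns per row): no savings
    System.out.println(shortenedIndexRow(megabyteRow, megabyteRow).length);  // 1000000
  }
}
{code}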

I thought about adding relative compression to the index, but the index needs to be decompressed
in memory for binary search.  A binary search that operates on a relatively compressed index
could be written, I suppose, but even with relative compression the first instance of the
megabyte row would still need to be kept.
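
To make "relative compression" concrete, here is a generic prefix-compression sketch (not
RFile's actual format, and the PrefixCompressedIndex class is invented): each entry stores only
the number of bytes it shares with the previous entry plus its suffix.  The repeats of the
megabyte row become nearly free, but the first occurrence is still stored in full, and a binary
search would have to rebuild keys by walking forward from a fully stored entry.

{code}
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of prefix (relative) compression over a sorted list of index keys.
public class PrefixCompressedIndex {

  // each entry: bytes shared with the previous key, plus the remaining suffix
  static class Entry {
    final int sharedWithPrevious;
    final byte[] suffix;

    Entry(int sharedWithPrevious, byte[] suffix) {
      this.sharedWithPrevious = sharedWithPrevious;
      this.suffix = suffix;
    }
  }

  static List<Entry> compress(List<byte[]> sortedKeys) {
    List<Entry> entries = new ArrayList<>();
    byte[] prev = new byte[0];
    for (byte[] key : sortedKeys) {
      int shared = 0;
      while (shared < Math.min(prev.length, key.length) && prev[shared] == key[shared]) {
        shared++;
      }
      entries.add(new Entry(shared, Arrays.copyOfRange(key, shared, key.length)));
      prev = key;
    }
    return entries;
  }

  public static void main(String[] args) {
    byte[] bigRow = new byte[1_000_000];
    Arrays.fill(bigRow, (byte) 'z');               // the megabyte row

    List<byte[]> keys = new ArrayList<>();
    keys.add("small row 1".getBytes(StandardCharsets.UTF_8));
    keys.add(bigRow);                              // first occurrence: stored in full
    keys.add(bigRow.clone());                      // repeats compress to almost nothing
    keys.add(bigRow.clone());

    for (Entry e : compress(keys)) {
      System.out.println("shared=" + e.sharedWithPrevious + " suffixLen=" + e.suffix.length);
    }
  }
}
{code}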


> RFile can create very large blocks when key statistics are not uniform
> ----------------------------------------------------------------------
>                 Key: ACCUMULO-4669
>                 URL:
>             Project: Accumulo
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.7.2, 1.7.3, 1.8.0, 1.8.1
>            Reporter: Adam Fuchs
>            Assignee: Keith Turner
>            Priority: Blocker
>             Fix For: 1.7.4, 1.8.2, 2.0.0
> RFile.Writer.append checks for giant keys and avoids writing them as index blocks. This
> check is flawed and can result in multi-GB blocks. In our case, a 20GB compressed RFile had
> one block with over 2GB raw size. This happened because the key size statistics changed after
> some point in the file. The code in question follows:
> {code}
>     private boolean isGiantKey(Key k) {
>       // consider a key thats more than 3 standard deviations from previously seen key sizes as giant
>       return k.getSize() > keyLenStats.getMean() + keyLenStats.getStandardDeviation() * 3;
>     }
> ...
>       if (blockWriter == null) {
>         blockWriter = fileWriter.prepareDataBlock();
>       } else if (blockWriter.getRawSize() > blockSize) {
>         ...
>         if ((prevKey.getSize() <= avergageKeySize || blockWriter.getRawSize() > maxBlockSize) && !isGiantKey(prevKey)) {
>           closeBlock(prevKey, false);
> ...
> {code}
> Before closing a block that has grown beyond the target block size, we check that the key is
> below average in size or that the block has exceeded 1.1 times the target block size
> (maxBlockSize), and we check that the key isn't a "giant" key, i.e. more than 3 standard
> deviations above the mean of the key sizes seen so far.
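> As a rough illustration of that check (hypothetical key sizes, and assuming the commons-math3
> SummaryStatistics used in the reproduction below): once the statistics are dominated by small
> uniform keys, a modestly larger key sits more than 3 standard deviations above the mean and is
> treated as giant:
> {code}
> import org.apache.commons.math3.stat.descriptive.SummaryStatistics;
>
> public class GiantKeyCutoff {
>   public static void main(String[] args) {
>     SummaryStatistics keyLenStats = new SummaryStatistics();
>     // a large first column family with uniform ~30 byte keys
>     for (long i = 0; i < 10_000_000L; i++) {
>       keyLenStats.addValue(30);
>     }
>     // mean = 30 and standard deviation = 0, so the "giant" cutoff is 30 bytes
>     double cutoff = keyLenStats.getMean() + 3 * keyLenStats.getStandardDeviation();
>     System.out.println("giant cutoff = " + cutoff);
>     // a 100 byte key from the next column family is "giant", so the block it
>     // lands in cannot be closed on its account
>     System.out.println("isGiantKey(100 bytes)? " + (100 > cutoff));
>   }
> }
> {code}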
> Our RFiles often have one row of data with different column families representing various
> forward and inverted indexes. This is a table design similar to the WikiSearch example. The
> first column family in this case had very uniform, relatively small key sizes. This first
> column family comprised gigabytes of data, split up into roughly 100KB blocks. When we switched
> to the next column family the keys grew in size, but were still under about 100 bytes. The
> statistics of the first column family had firmly established a smaller mean and tiny standard
> deviation (approximately 0), and it took over 2GB of larger keys to bring the standard deviation
> up enough so that keys were no longer considered "giant" and the block could be closed.
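> Continuing that illustration (again hypothetical sizes): the check stops firing only after
> enough larger keys have been appended that mean + 3 standard deviations reaches the new key
> size; with a large population of small keys already recorded, that takes a number of larger
> keys comparable to the original population:
> {code}
> import org.apache.commons.math3.stat.descriptive.SummaryStatistics;
>
> public class StatsCatchUp {
>   public static void main(String[] args) {
>     SummaryStatistics keyLenStats = new SummaryStatistics();
>     for (long i = 0; i < 10_000_000L; i++) {
>       keyLenStats.addValue(30);   // small uniform keys from the first column family
>     }
>     long added = 0;
>     // append larger keys until they are no longer more than 3 standard
>     // deviations above the mean
>     while (100 > keyLenStats.getMean() + 3 * keyLenStats.getStandardDeviation()) {
>       keyLenStats.addValue(100);
>       added++;
>     }
>     System.out.println(added + " larger keys (~" + (added * 100 / (1024 * 1024))
>         + " MB) before the giant key check stopped firing");
>   }
> }
> {code}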
> Now that we're aware, we see large blocks (more than 10x the target block size) in almost
> every RFile we write. This only became a glaring problem when we got OOM exceptions trying
> to decompress the block, but it also shows up in a number of subtle performance problems,
> like high variance in latencies for looking up particular keys.
> The fix for this should produce bounded RFile block sizes, limited to the greater of
> 2x the maximum key/value size in the block and some configurable threshold, such as 1.1 times
> the compressed block size. We need a firm cap to be able to reason about memory usage in
> various applications.
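> A minimal sketch of that kind of cap (hypothetical names; these are not RFile's actual fields
> or configuration) keeps the preference for closing on a small key but forces the block closed
> once a firm bound is reached:
> {code}
> // Sketch only: hypothetical names, not RFile's actual fields or configuration.
> public class BlockCapSketch {
>
>   static boolean shouldCloseBlock(long rawBlockSize, long prevKeySize, double averageKeySize,
>       long maxKeyValueSizeInBlock, long blockSize, double capMultiplier) {
>     if (rawBlockSize <= blockSize) {
>       return false;                        // still under the target block size
>     }
>     long hardCap = Math.max(2 * maxKeyValueSizeInBlock, (long) (capMultiplier * blockSize));
>     if (rawBlockSize >= hardCap) {
>       return true;                         // firm bound on per-block memory
>     }
>     return prevKeySize <= averageKeySize;  // otherwise prefer closing on a small key
>   }
>
>   public static void main(String[] args) {
>     // a 2MB block of ~100 byte key/values is force-closed at the cap
>     System.out.println(shouldCloseBlock(2_000_000, 100, 30.0, 100, 100_000, 1.1)); // true
>   }
> }
> {code}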
> The following code produces arbitrarily large RFile blocks:
> {code}
>   FileSKVWriter writer = RFileOperations.getInstance().openWriter(filename, fs, conf,
>   writer.startDefaultLocalityGroup();
>   SummaryStatistics keyLenStats = new SummaryStatistics();
>   Random r = new Random();
>   byte [] buffer = new byte[minRowSize];
>   for(int i = 0; i < 100000; i++) {
>     byte [] valBytes = new byte[valLength];
>     r.nextBytes(valBytes);
>     r.nextBytes(buffer);
>     ByteBuffer.wrap(buffer).putInt(i);
>     Key k = new Key(buffer, 0, buffer.length, emptyBytes, 0, 0, emptyBytes, 0, 0, emptyBytes, 0, 0, 0);
>     Value v = new Value(valBytes);
>     writer.append(k, v);
>     keyLenStats.addValue(k.getSize());
>     int newBufferSize = Math.max(buffer.length, (int) Math.ceil(keyLenStats.getMean() + keyLenStats.getStandardDeviation() * 4 + 0.0001));
>     buffer = new byte[newBufferSize];
>     if(keyLenStats.getSum() > targetSize)
>       break;
>   }
>   writer.close();
> {code}
> One telltale symptom of this bug is an OutOfMemoryError thrown from a readahead thread
> with the message "Requested array size exceeds VM limit". This will only happen if the block
> cache size is big enough to hold the expected raw block size, 2GB in our case. This message is
> rare, and really only happens when allocating an array of size Integer.MAX_VALUE or
> Integer.MAX_VALUE-1 on the HotSpot JVM. Integer.MAX_VALUE happens in this case due to some
> strange handling of raw block sizes in the BCFile code. Most OutOfMemoryErrors have different
> messages.
