Date: Wed, 28 Jun 2017 21:17:00 +0000 (UTC)
From: "Keith Turner (JIRA)"
To: notifications@accumulo.apache.org
Subject: [jira] [Updated] (ACCUMULO-4669) RFile can create very large blocks when key statistics are not uniform

    [ https://issues.apache.org/jira/browse/ACCUMULO-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith Turner updated ACCUMULO-4669:
-----------------------------------
    Affects Version/s: 1.7.2
                       1.7.3
                       1.8.0
                       1.8.1

> RFile can create very large blocks when key statistics are not uniform
> ----------------------------------------------------------------------
>
>                 Key: ACCUMULO-4669
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4669
>             Project: Accumulo
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.7.2, 1.7.3, 1.8.0, 1.8.1
>            Reporter: Adam Fuchs
>            Priority: Critical
>             Fix For: 1.7.4, 1.8.2, 2.0.0
>
>
> RFile.Writer.append checks for giant keys and avoids writing them as index blocks. This check is flawed and can result in multi-GB blocks. In our case, a 20GB compressed RFile had one block with over 2GB raw size. This happened because the key size statistics changed after some point in the file.
> The code in question follows:
> {code}
> private boolean isGiantKey(Key k) {
>   // consider a key thats more than 3 standard deviations from previously seen key sizes as giant
>   return k.getSize() > keyLenStats.getMean() + keyLenStats.getStandardDeviation() * 3;
> }
>
> ...
>
> if (blockWriter == null) {
>   blockWriter = fileWriter.prepareDataBlock();
> } else if (blockWriter.getRawSize() > blockSize) {
>   ...
>   if ((prevKey.getSize() <= avergageKeySize || blockWriter.getRawSize() > maxBlockSize) && !isGiantKey(prevKey)) {
>     closeBlock(prevKey, false);
>   ...
> {code}
>
> Before closing a block that has grown beyond the target block size, we check that the key is below average in size or that the block has exceeded 1.1 times the target block size (maxBlockSize), and we also check that the key isn't a "giant" key, i.e. more than 3 standard deviations above the mean of the key sizes seen so far.
>
> Our RFiles often have one row of data with different column families representing various forward and inverted indexes. This is a table design similar to the WikiSearch example. The first column family in this case had very uniform, relatively small key sizes. This first column family comprised gigabytes of data, split up into roughly 100KB blocks. When we switched to the next column family the keys grew in size, but were still under about 100 bytes. The statistics of the first column family had firmly established a smaller mean and a tiny standard deviation (approximately 0), and it took over 2GB of larger keys to bring the standard deviation up enough that keys were no longer considered "giant" and the block could be closed.
>
> Now that we're aware of it, we see large blocks (more than 10x the target block size) in almost every RFile we write. This only became a glaring problem when we got OutOfMemoryErrors trying to decompress a block, but it also shows up in a number of subtler performance problems, such as high variance in the latencies of looking up particular keys.
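The lock-in effect described above can be reproduced outside Accumulo. The following is a hypothetical, self-contained sketch (NOT Accumulo code): a hand-rolled Welford mean/variance tracker stands in for the commons-math SummaryStatistics that RFile.Writer actually uses, and the class and method names are illustrative. After a long run of uniform 20-byte keys the 3-sigma threshold sits just above 20, so every subsequent 100-byte key tests as "giant" until a very long run of the larger keys has inflated the statistics.

```java
// Hypothetical sketch of the giant-key lock-in, NOT Accumulo code.
// A Welford online mean/variance tracker replaces commons-math
// SummaryStatistics so the example has no external dependencies.
public class GiantKeyLockIn {
  private long n = 0;
  private double mean = 0, m2 = 0; // Welford accumulators

  void add(double x) {
    n++;
    double d = x - mean;
    mean += d / n;
    m2 += d * (x - mean);
  }

  double stddev() {
    return n > 1 ? Math.sqrt(m2 / (n - 1)) : 0;
  }

  // The same test the writer applies: more than 3 sigma above the mean.
  boolean isGiantKey(double keySize) {
    return keySize > mean + 3 * stddev();
  }

  public static void main(String[] args) {
    GiantKeyLockIn stats = new GiantKeyLockIn();
    // Phase 1: a long run of uniform 20-byte keys pins stddev near 0.
    for (int i = 0; i < 1_000_000; i++) {
      stats.add(20);
    }
    // Phase 2: keys grow to 100 bytes. Each one is classified "giant",
    // preventing block closure, until the statistics recover.
    long streak = 0;
    while (stats.isGiantKey(100)) {
      stats.add(100);
      streak++;
    }
    System.out.println("keys classified as giant before recovery: " + streak);
  }
}
```

In this toy setup the streak runs to well over ten thousand of the larger keys; scaled to real key and value sizes, that is the gigabytes of un-closeable block the reporter observed.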
> The fix for this should produce bounded RFile block sizes, limited to the greater of 2x the maximum key/value size in the block and some configurable threshold, such as 1.1 times the compressed block size. We need a firm cap to be able to reason about memory usage in various applications.
>
> The following code produces arbitrarily large RFile blocks:
> {code}
> FileSKVWriter writer = RFileOperations.getInstance().openWriter(filename, fs, conf, acuconf);
> writer.startDefaultLocalityGroup();
> SummaryStatistics keyLenStats = new SummaryStatistics();
> Random r = new Random();
> byte[] buffer = new byte[minRowSize];
> for (int i = 0; i < 100000; i++) {
>   byte[] valBytes = new byte[valLength];
>   r.nextBytes(valBytes);
>   r.nextBytes(buffer);
>   ByteBuffer.wrap(buffer).putInt(i);
>   Key k = new Key(buffer, 0, buffer.length, emptyBytes, 0, 0, emptyBytes, 0, 0, emptyBytes, 0, 0, 0);
>   Value v = new Value(valBytes);
>   writer.append(k, v);
>   keyLenStats.addValue(k.getSize());
>   int newBufferSize = Math.max(buffer.length, (int) Math.ceil(keyLenStats.getMean() + keyLenStats.getStandardDeviation() * 4 + 0.0001));
>   buffer = new byte[newBufferSize];
>   if (keyLenStats.getSum() > targetSize)
>     break;
> }
> writer.close();
> {code}
>
> One telltale symptom of this bug is an OutOfMemoryError thrown from a readahead thread with the message "Requested array size exceeds VM limit". This will only happen if the block cache size is big enough to hold the expected raw block size, 2GB in our case. The message is rare, and really only occurs when allocating an array of size Integer.MAX_VALUE or Integer.MAX_VALUE-1 on the HotSpot JVM; it arises here due to some strange handling of raw block sizes in the BCFile code. Most OutOfMemoryErrors carry different messages.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
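The firm cap proposed in the description could be sketched as follows. This is a hypothetical illustration, not the committed Accumulo fix; the parameter names mirror the snippets in the issue, but the method itself and its signature are invented for the example. The point is that the hard bound is checked before any statistics-based heuristic, so key statistics can never veto closing an oversized block.

```java
// Hedged sketch of the proposed bounded-block check, NOT the actual
// Accumulo patch. Parameter names echo the issue's snippets; the
// method and signature are illustrative only.
public class BlockCloseCap {
  static boolean shouldCloseBlock(long rawSize, long blockSize, long maxBlockSize,
      long maxKeyValueSizeInBlock, long prevKeySize,
      double meanKeySize, double keySizeStdDev) {
    if (rawSize <= blockSize) {
      return false; // still under the target block size
    }
    // Firm cap: the greater of 2x the largest key/value in this block
    // and the configured hard limit. Statistics cannot veto this.
    long hardCap = Math.max(2 * maxKeyValueSizeInBlock, maxBlockSize);
    if (rawSize > hardCap) {
      return true;
    }
    // Otherwise keep the existing heuristics, including the giant-key check.
    boolean giant = prevKeySize > meanKeySize + 3 * keySizeStdDev;
    return (prevKeySize <= meanKeySize || rawSize > maxBlockSize) && !giant;
  }

  public static void main(String[] args) {
    // A block far past the cap closes even though the key looks "giant"
    // (150 bytes vs. a mean of 20 with near-zero standard deviation).
    System.out.println(shouldCloseBlock(1_100_000, 100_000, 110_000, 200, 150, 20.0, 0.0));
  }
}
```

Under the unpatched heuristic, the case in main would keep the block open indefinitely, because the above-average key is flagged as giant; with the cap, the block closes as soon as it exceeds the hard limit.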