Date: Mon, 29 Oct 2012 00:03:12 +0000 (UTC)
From: "Ted Yu (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-6597) Block Encoding Size Estimation

    [ https://issues.apache.org/jira/browse/HBASE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485756#comment-13485756 ]

Ted Yu commented on HBASE-6597:
-------------------------------

Update on my recent findings.

I came up with a patch for the 0.94 branch. Most data block encoding related tests pass.

TestHFileBlockCompatibility poses a little challenge. There is no embedded checksum feature in the 0.89-fb branch, so this test is unique to 0.94 / trunk.
In the test, there is a copy of the Writer class which I assume shouldn't be modified, at least not for a point release.
The test reuses some code from TestHFileBlock.java, where there is a change related to the usage of Writer:
{code}
-  static int writeTestKeyValues(OutputStream dos, int seed, boolean includesMemstoreTS)
+  static void writeTestKeyValues(OutputStream dos, Writer hbw, int seed, boolean includesMemstoreTS)
{code}
This is the test failure I am observing now:
{code}
testDataBlockEncoding[0](org.apache.hadoop.hbase.io.hfile.TestHFileBlockCompatibility)  Time elapsed: 0.129 sec  <<< FAILURE!
org.junit.ComparisonFailure: Content mismath for compression NONE, encoding PREFIX, pread false, commonPrefix 2, expected length 1859, actual length 1859 expected:<\x00\x00\x0[B\xB8]*\x0A\x00\x00\x0A\x0...> but was:<\x00\x00\x0[0\x00]*\x0A\x00\x00\x0A\x0...>
  at org.junit.Assert.assertEquals(Assert.java:125)
  at org.apache.hadoop.hbase.io.hfile.TestHFileBlock.assertBuffersEqual(TestHFileBlock.java:463)
  at org.apache.hadoop.hbase.io.hfile.TestHFileBlockCompatibility.testDataBlockEncoding(TestHFileBlockCompatibility.java:337)
{code}

> Block Encoding Size Estimation
> ------------------------------
>
>                 Key: HBASE-6597
>                 URL: https://issues.apache.org/jira/browse/HBASE-6597
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.89-fb
>            Reporter: Brian Nixon
>            Assignee: Mikhail Bautin
>            Priority: Minor
>         Attachments: 6597-trunk.txt, D5895.1.patch, D5895.2.patch, D5895.3.patch, D5895.4.patch, D5895.5.patch
>
>
> Block boundaries as created by current writers are determined by the size of the unencoded data. However, blocks in memory are kept encoded. By using an estimate for the encoded size of the block, we can get greater consistency in size.
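
For readers unfamiliar with the idea in the issue description, here is a minimal sketch of closing a block based on an estimate of its encoded size rather than its unencoded size. It is not taken from any of the attached patches: the class EncodedSizeEstimator, its methods, and the choice to reuse the previous block's encoded/unencoded ratio as the estimate are all assumptions made for illustration, and none of them are HBase APIs.

```java
/**
 * Hypothetical helper (not part of HBase): decides when to finish a block
 * using an estimate of the block's encoded size.
 *
 * Assumption: the ratio observed for the previous block is a reasonable
 * predictor for the block currently being written.
 */
final class EncodedSizeEstimator {
    private final int targetEncodedBlockSize;
    // Estimated ratio of encoded size to unencoded size.
    private double encodedToUnencodedRatio;
    // Unencoded bytes appended to the current block so far.
    private long unencodedBytesInBlock;

    EncodedSizeEstimator(int targetEncodedBlockSize, double initialRatio) {
        this.targetEncodedBlockSize = targetEncodedBlockSize;
        this.encodedToUnencodedRatio = initialRatio;
    }

    /** Records that a key/value of the given unencoded length was appended. */
    void append(int unencodedLength) {
        unencodedBytesInBlock += unencodedLength;
    }

    /** Estimated encoded size of the current block. */
    long estimatedEncodedSize() {
        return Math.round(unencodedBytesInBlock * encodedToUnencodedRatio);
    }

    /** True once the estimated encoded size reaches the target block size. */
    boolean shouldFinishBlock() {
        return estimatedEncodedSize() >= targetEncodedBlockSize;
    }

    /** Updates the ratio from the finished block and starts a new block. */
    void blockFinished(long actualEncodedSize) {
        if (unencodedBytesInBlock > 0) {
            encodedToUnencodedRatio =
                (double) actualEncodedSize / unencodedBytesInBlock;
        }
        unencodedBytesInBlock = 0;
    }
}
```

In this sketch a writer would call append() for each key/value, check shouldFinishBlock() afterwards, and report the real encoded size through blockFinished() once the block has been encoded. With an initial ratio of 1.0 the first block is cut exactly as a writer driven by unencoded size would cut it; later blocks grow or shrink according to how well the previous block compressed under the encoding.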