Date: Wed, 10 Aug 2016 08:12:20 +0000 (UTC)
From: "ramkrishna.s.vasudevan (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

    [ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414885#comment-15414885 ]

ramkrishna.s.vasudevan commented on HBASE-16213:
------------------------------------------------

Perf improvement is great.
With smaller blocks and larger values the impact is smaller, since each block holds only a few rows and the seek takes little time to begin with. I think the metadata overhead is at most about 4k more. Having multiple columns for the same row should incur the same metadata overhead (if the total size comes to approximately 1K).

I went through the patch. I think some of the tag-related decode and encode code can be moved to a subclass to avoid duplicating the existing code. Also check whether the SeekState's Cell impl really needs to be entirely new in the new EncodedSeeker state implementation; maybe the existing ones can be reused. I have not checked whether something is different that prevents the reuse.

I think all the existing tests for DBE would work with this, because they iterate through all values of the DBE enum, including the new ones. Do you need any specific test cases for these new types?

> A new HFileBlock structure for fast random get
> ----------------------------------------------
>
>                 Key: HBASE-16213
>                 URL: https://issues.apache.org/jira/browse/HBASE-16213
>             Project: HBase
>          Issue Type: New Feature
>          Components: Performance
>            Reporter: binlijin
>            Assignee: binlijin
>         Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, hfile_block_performance.pptx, new-hfile-block.xlsx
>
>
> HFileBlock stores cells sequentially; currently, to get a row from a block, it scans from the first cell until it reaches the row's cell.
> The new structure stores every row's start offset along with the data, so it can find the exact row with binarySearch.
> I used EncodedSeekPerformanceTest to test the performance.
> First, use ycsb to write 100w (1 million) rows; every row has only one qualifier, and valueLength=16B/64B/256B/1k.
> Then use EncodedSeekPerformanceTest to test random reads of 1w (10,000) or 100w (1 million) rows, and also record the HFileBlock's dataSize/dataWithMetaSize for the encoding.
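
For readers following the thread, here is a rough sketch of the idea in the issue description: keep each row's start offset next to the data so a get can binary search instead of scanning from the first cell. This is not the code from the patch. The class name, the method names and the on-disk layout (row entries, then int offsets, then an int row count) are all assumptions made up for illustration, and real cells carry family, qualifier, timestamp and tags that are ignored here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the HBASE-16213 implementation.
// Assumed layout: [row entries...][int offset per row][int rowCount]
// Each row entry:  [short rowLen][row bytes][int valueLen][value bytes]
final class RowIndexedBlockSketch {

    /** Encodes rows (already sorted by key) followed by the offset index and the row count. */
    static byte[] encode(String[] sortedRows, String[] values) {
        int n = sortedRows.length;
        byte[][] rows = new byte[n][];
        byte[][] vals = new byte[n][];
        int dataSize = 0;
        for (int i = 0; i < n; i++) {
            rows[i] = sortedRows[i].getBytes(StandardCharsets.UTF_8);
            vals[i] = values[i].getBytes(StandardCharsets.UTF_8);
            dataSize += 2 + rows[i].length + 4 + vals[i].length;
        }
        // Metadata overhead in this sketch: 4 bytes per row plus 4 bytes for the count.
        ByteBuffer buf = ByteBuffer.allocate(dataSize + 4 * n + 4);
        int[] offsets = new int[n];
        for (int i = 0; i < n; i++) {
            offsets[i] = buf.position();          // start offset of row i
            buf.putShort((short) rows[i].length);
            buf.put(rows[i]);
            buf.putInt(vals[i].length);
            buf.put(vals[i]);
        }
        for (int offset : offsets) {
            buf.putInt(offset);
        }
        buf.putInt(n);
        return buf.array();
    }

    /** Binary searches the offset index for rowKey; returns its value or null if absent. */
    static String get(byte[] block, String rowKey) {
        ByteBuffer buf = ByteBuffer.wrap(block);
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        int rowCount = buf.getInt(block.length - 4);
        int indexStart = block.length - 4 - 4 * rowCount;
        int lo = 0;
        int hi = rowCount - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int offset = buf.getInt(indexStart + 4 * mid);
            int rowLen = buf.getShort(offset);
            int cmp = compare(block, offset + 2, rowLen, key);
            if (cmp == 0) {
                int valueLen = buf.getInt(offset + 2 + rowLen);
                return new String(block, offset + 2 + rowLen + 4, valueLen, StandardCharsets.UTF_8);
            } else if (cmp < 0) {
                lo = mid + 1;                     // stored row sorts before the key
            } else {
                hi = mid - 1;                     // stored row sorts after the key
            }
        }
        return null;
    }

    /** Unsigned lexicographic comparison of a stored row against the search key. */
    static int compare(byte[] block, int start, int len, byte[] key) {
        int n = Math.min(len, key.length);
        for (int i = 0; i < n; i++) {
            int diff = (block[start + i] & 0xff) - (key[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return len - key.length;
    }

    public static void main(String[] args) {
        byte[] block = encode(new String[] {"row1", "row2", "row5"}, new String[] {"a", "b", "c"});
        System.out.println(get(block, "row2"));
        System.out.println(get(block, "row3"));
    }
}
```

In this sketch the index costs 4 bytes per row, so the extra metadata grows with the number of rows in the block rather than with the value size, which is consistent with the overhead mattering less for blocks that hold only a few large rows.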