Date: Wed, 31 Jul 2013 18:05:50 +0000 (UTC)
From: "Liyin Tang (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-9102) HFile block pre-loading for large sequential scan

[ https://issues.apache.org/jira/browse/HBASE-9102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725515#comment-13725515 ]

Liyin Tang commented on HBASE-9102:
-----------------------------------

It is true that the OS caches the compressed/encoded blocks, and that the DFSClient non-pread operation can also pre-load all the bytes up to the end of that DFS block. This feature, however, pre-loads the data blocks in their decompressed/decoded form, in addition to what the OS cache and disk read-ahead already provide.

Also, the scan prefetch is currently implemented at the RegionScanner level. I think it is a good idea to implement some prefetch logic in the HBase client as well.
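The pre-loading idea discussed here (fetch and decompress/decode the next blocks of a sequential scan so they are ready before the read point reaches them) can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the HBASE-9102 patch: the class `BlockPreloader`, the method `readBlock`, the `lookahead` window and the `loadAndDecode` callback (standing in for the HDFS read plus decompression/decoding) are all assumptions, and a `ConcurrentHashMap` of futures stands in for the real HBase block cache.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;

/** Hypothetical sketch: stream-style pre-loading of decoded blocks ahead of a sequential scan. */
public class BlockPreloader implements AutoCloseable {
    private final IntFunction<byte[]> loadAndDecode; // assumed stand-in for HDFS read + decompress/decode
    private final int numBlocks;
    private final int lookahead;                     // how many blocks to keep in flight ahead of the reader
    private final ExecutorService pool;
    // Assumed stand-in for the block cache: block index -> decoded block (possibly still loading).
    private final Map<Integer, CompletableFuture<byte[]>> cache = new ConcurrentHashMap<>();

    public BlockPreloader(IntFunction<byte[]> loadAndDecode, int numBlocks, int lookahead, int threads) {
        this.loadAndDecode = loadAndDecode;
        this.numBlocks = numBlocks;
        this.lookahead = lookahead;
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Start loading a block unless a load for it is already cached or in flight.
    private CompletableFuture<byte[]> schedule(int idx) {
        return cache.computeIfAbsent(idx,
            i -> CompletableFuture.supplyAsync(() -> loadAndDecode.apply(i), pool));
    }

    /** Return block idx, first kicking off background loads for the next `lookahead` blocks. */
    public byte[] readBlock(int idx) {
        for (int next = idx + 1; next <= idx + lookahead && next < numBlocks; next++) {
            schedule(next);
        }
        byte[] block = schedule(idx).join(); // usually already decoded by a background thread
        // A real block cache would apply its own eviction policy; removing here just
        // bounds memory in this sketch, since a sequential scan does not revisit blocks.
        cache.remove(idx);
        return block;
    }

    @Override
    public void close() {
        pool.shutdownNow();
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        int n = 10;
        try (BlockPreloader p = new BlockPreloader(
                idx -> { loads.incrementAndGet(); return new byte[] {(byte) idx}; }, n, 3, 2)) {
            for (int i = 0; i < n; i++) {
                byte[] b = p.readBlock(i);
                if (b[0] != (byte) i) throw new AssertionError("wrong block " + i);
            }
        }
        // prints: blocks read: 10, loads: 10
        System.out.println("blocks read: " + n + ", loads: " + loads.get());
    }
}
```

Keeping the in-flight futures in the map means a block that is already being pre-loaded is never fetched twice; the scanner simply waits on the existing future, so each block is loaded exactly once.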
> HFile block pre-loading for large sequential scan
> -------------------------------------------------
>
>                 Key: HBASE-9102
>                 URL: https://issues.apache.org/jira/browse/HBASE-9102
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.89-fb
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>
> The current HBase scan model cannot take full advantage of the aggregate disk throughput, especially for large sequential scans. For a large sequential scan, it is easy to predict in advance which block will be read next, so these data blocks can be pre-loaded from HDFS and decompressed/decoded into the block cache just ahead of the current read point.
> Therefore, this JIRA optimizes large sequential scan performance by pre-loading HFile blocks into the block cache in a streaming fashion, so that the scan can read directly from the cache.