Subject: Re: Questions about HBase
From: ramkrishna vasudevan
To: user@hbase.apache.org
Date: Wed, 5 Jun 2013 11:13:53 +0530

As for whether you can warm up the bloom filter and block cache: I don't
think that is possible at the moment.

Regards
Ram

On Wed, Jun 5, 2013 at 10:57 AM, Asaf Mesika wrote:

> If you read the HFile v2 document on the HBase site, you will understand
> completely how the search for a record works, and why there is a linear
> search within the block but a binary search to get to the right block.
> Also bear in mind that the number of keys in a block is not big, since a
> block in an HFile is 64 KB (65,536 bytes) by default. Thus, from a 10 GB
> HFile, you fully scan only about 64 KB.
>
> On Wednesday, June 5, 2013, Pankaj Gupta wrote:
>
> > Thanks for the replies. I'll take a look at
> > src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java.
> >
> > @ramkrishna: I do want to have the bloom filter and block index in cache
> > all the time. For good read performance they're critical in my workflow.
> > The worry is that when HBase is restarted it will take a long time for
> > them to get populated again, and performance will suffer. If there were
> > a way of loading them quickly and warming up the table, we would be able
> > to restart HBase without causing a slowdown in processing.
> >
> > On Tue, Jun 4, 2013 at 9:29 PM, Ted Yu wrote:
> >
> > > bq. But I am not very sure if we can control the files getting selected
> > > for compaction in the older versions.
> > >
> > > The same mechanism is available in 0.94.
> > >
> > > Take a look at
> > > src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java
> > > where you will find the following methods (and more):
> > >
> > >   public void preCompactSelection(
> > >       final ObserverContext<RegionCoprocessorEnvironment> c,
> > >       final Store store, final List<StoreFile> candidates,
> > >       final CompactionRequest request)
> > >
> > >   public InternalScanner preCompact(
> > >       ObserverContext<RegionCoprocessorEnvironment> e,
> > >       final Store store, final InternalScanner scanner)
> > >       throws IOException
> > >
> > > Cheers
> > >
> > > On Tue, Jun 4, 2013 at 8:14 PM, ramkrishna vasudevan <
> > > ramkrishna.s.vasudevan@gmail.com> wrote:
> > >
> > > > >>Does Minor compaction remove HFiles in which all entries are out of
> > > > TTL or does only Major compaction do that
> > > > Yes, this applies to minor compactions as well.
> > > >
> > > > >>Is there a way of configuring major compaction to compact only files
> > > > older than a certain time or to compress all the files except the
> > > > latest few?
> > > > In the latest trunk version, the compaction algorithm itself is
> > > > pluggable. There are also some coprocessor hooks that give control
> > > > over the scanner that gets created for compaction, with which we can
> > > > control the KVs being selected. But I am not very sure if we can
> > > > control the files getting selected for compaction in the older
> > > > versions.
> > > >
> > > > >> The above excerpt seems to imply to me that the search for key
> > > > inside a block is linear and I feel I must be reading it wrong. I
> > > > would expect the scan to be a binary search.
> > > > Once the data block is identified for a key, we seek to the beginning
> > > > of the block and then do a linear search until we reach the exact key
> > > > that we are looking for.
> > > > This is because internally the data (KVs) are stored as byte buffers
> > > > per block, and it follows this pattern
> > > >
> > > > >>Is there a way to warm up the bloom filter and block index cache for
> > > > a table?
> > > > Do you always want the bloom filter and block index to be in cache?
> > > >
> > > > On Wed, Jun 5, 2013 at 7:45 AM, Pankaj Gupta wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I have a few small questions regarding HBase. I've searched the
> > > > > forum but couldn't find clear answers, hence asking them here:
> > > > >
> > > > >    1. Does minor compaction remove HFiles in which all entries are
> > > > >    out of TTL, or does only major compaction do that? I found this
> > > > >    JIRA: https://issues.apache.org/jira/browse/HBASE-5199 but I
> > > > >    don't know if the compaction being talked about there is minor
> > > > >    or major.
> > > > >    2. Is there a way of configuring major compaction to compact only
> > > > >    files older than a certain time or to compress all the files
> > > > >    except the latest few? We basically want to use the time-based
> > > > >    filtering optimization in HBase to get the latest additions to
> > > > >    the table, and since major compaction bunches everything into
> > > > >    one file, it would defeat the optimization.
> > > > >    3. Is there a way to warm up the bloom filter and block index
> > > > >    cache for a table? This is for a case where I always want the
> > > > >    bloom filters and index to be all in memory, but not the
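
For readers following the lookup discussion in this thread (binary search to find the right block, then a linear scan inside it), here is a minimal sketch of that two-step pattern. It is not HBase code: the class name BlockLookupSketch, its methods, and the use of String arrays for keys are all assumptions made for illustration. Real HFile blocks hold variable-length KVs in a byte buffer, which is presumably why the scan within a block is sequential; the sketch uses random-access arrays only to keep it short.

```java
// Hypothetical illustration only; names and data layout are NOT taken from HBase.
class BlockLookupSketch {

    // Step 1: binary search over the block index (the first key of each block).
    // Returns the index of the last block whose first key is <= target,
    // or -1 if the target sorts before every block.
    static int findBlock(String[] firstKeys, String target) {
        int lo = 0;
        int hi = firstKeys.length - 1;
        int found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (firstKeys[mid].compareTo(target) <= 0) {
                found = mid;   // candidate block; a later block may still qualify
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found;
    }

    // Step 2: linear scan from the start of the chosen block.
    // Returns the position of the key within the block, or -1 if absent.
    static int scanBlock(String[] block, String target) {
        for (int i = 0; i < block.length; i++) {
            int cmp = block[i].compareTo(target);
            if (cmp == 0) {
                return i;
            }
            if (cmp > 0) {
                break;         // keys are sorted, so the target cannot appear later
            }
        }
        return -1;
    }

    // Combines both steps: locate the block, then scan it.
    static boolean contains(String[][] blocks, String[] firstKeys, String target) {
        int b = findBlock(firstKeys, target);
        return b >= 0 && scanBlock(blocks[b], target) >= 0;
    }

    public static void main(String[] args) {
        String[][] blocks = { {"a", "c", "e"}, {"g", "i"}, {"k", "m", "o"} };
        String[] firstKeys = { "a", "g", "k" };
        System.out.println(contains(blocks, firstKeys, "i")); // prints true
        System.out.println(contains(blocks, firstKeys, "h")); // prints false
    }
}
```

The point made earlier in the thread still holds for this sketch: the linear step touches only one block, so its cost is bounded by the block size rather than by the size of the whole file.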