Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
Received-SPF: pass (nike.apache.org: domain of
 ramkrishna.s.vasudevan@gmail.com designates 209.85.128.49 as permitted
 sender)
MIME-Version: 1.0
In-Reply-To: 
 <CA+r7YvnFeNtu9bbjd8BRfyXTo1-KuOuOQQb3FR5NMq=qikLLAQ@mail.gmail.com>
References: 
 <CANnTnFeguBU4-DLMuPA+bNYJNTbmFj9nmiL2E=7_cQNYYxxErQ@mail.gmail.com>
	<CAAT7Mkr8iPvDAgP3LUhPkqHqHDwuH3H0ivgOb4ot8BpWqB=zPw@mail.gmail.com>
	<CALte62wGvsn0wP9Z+NHhGTCeufa3RBoVfiyD6o_T60VzuC+2fQ@mail.gmail.com>
	<CANnTnFf196tPE22c2uMPYi+mYt0RePfOf72vEZajLWp76beBDw@mail.gmail.com>
	<CA+r7YvnFeNtu9bbjd8BRfyXTo1-KuOuOQQb3FR5NMq=qikLLAQ@mail.gmail.com>
Date: Wed, 5 Jun 2013 11:13:53 +0530
Message-ID: 
 <CAAT7MkqWdCvgU9uGj16KmSLWTn5-sfWOhU2f9PdSJ1U7gA-Bew@mail.gmail.com>
Subject: Re: Questions about HBase
From: ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com>
To: user@hbase.apache.org
Content-Type: multipart/alternative; boundary=14dae9d70d2adc800e04de61aeaa

--14dae9d70d2adc800e04de61aeaa
Content-Type: text/plain; charset=ISO-8859-1

As for whether you can warm up the bloom filter and block cache: I don't
think that is possible right now.

Regards
Ram


On Wed, Jun 5, 2013 at 10:57 AM, Asaf Mesika <asaf.mesika@gmail.com> wrote:

> If you read the HFile v2 document on the HBase site, you will understand
> exactly how the search for a record works, and why the search is linear
> within a block but binary to find the right block.
> Also bear in mind that the number of keys in a block is not big, since a
> block in an HFile is 65k by default; thus, out of a 10GB HFile, you are only
> fully scanning 65k.
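
The two-step lookup described above can be sketched in a few lines. This is only an illustration under stated assumptions, not HBase code: the class and method names are hypothetical, keys are plain strings, and the block index is reduced to a sorted array holding the first key of each block.

```java
import java.util.Arrays;

/** Sketch only: names here are hypothetical and are not HBase APIs. */
final class BlockIndexSketch {

    /**
     * Binary search over the sorted first keys of the blocks.
     * Returns the index of the last block whose first key is <= target,
     * or -1 if the target sorts before every block.
     */
    static int findBlock(String[] firstKeys, String target) {
        int pos = Arrays.binarySearch(firstKeys, target);
        // binarySearch returns -(insertionPoint) - 1 when the key is absent.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Linear scan over the sorted keys of a single block. */
    static boolean scanBlock(String[] blockKeys, String target) {
        for (String key : blockKeys) {
            int cmp = key.compareTo(target);
            if (cmp == 0) {
                return true;
            }
            if (cmp > 0) {
                return false; // already past the place where target would be
            }
        }
        return false;
    }

    /** Binary search to pick the block, then linear search inside it. */
    static boolean contains(String[][] blocks, String target) {
        String[] firstKeys = new String[blocks.length];
        for (int i = 0; i < blocks.length; i++) {
            firstKeys[i] = blocks[i][0];
        }
        int block = findBlock(firstKeys, target);
        return block >= 0 && scanBlock(blocks[block], target);
    }

    public static void main(String[] args) {
        String[][] blocks = { {"a", "c"}, {"f", "h", "k"}, {"m", "p"} };
        System.out.println(contains(blocks, "h"));
        System.out.println(contains(blocks, "g"));
    }
}
```

Only the one block chosen by the binary search is ever scanned linearly, which is why the cost of the linear step is bounded by the block size and not by the file size.
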
>
> On Wednesday, June 5, 2013, Pankaj Gupta wrote:
>
> > Thanks for the replies. I'll take a look at
> > src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java.
> >
> > @ramkrishna: I do want to have the bloom filter and block index cached all
> > the time. They're critical for good read performance in my workflow. The
> > worry is that when HBase is restarted, it will take a long time for them to
> > get populated again and performance will suffer. If there were a way of
> > loading them quickly to warm up the table, we'd be able to restart HBase
> > without slowing down processing.
> >
> >
> > On Tue, Jun 4, 2013 at 9:29 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > bq. But i am not very sure if we can control the files getting selected
> > > for compaction in the older versions.
> > >
> > > The same mechanism is available in 0.94.
> > >
> > > Take a look at
> > > src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java
> > > where you would find the following methods (and more):
> > >
> > >   public void preCompactSelection(
> > >       final ObserverContext<RegionCoprocessorEnvironment> c,
> > >       final Store store, final List<StoreFile> candidates,
> > >       final CompactionRequest request)
> > >   public InternalScanner preCompact(
> > >       ObserverContext<RegionCoprocessorEnvironment> e,
> > >       final Store store, final InternalScanner scanner) throws IOException {
> > >
> > > Cheers
> > >
> > > On Tue, Jun 4, 2013 at 8:14 PM, ramkrishna vasudevan <
> > > ramkrishna.s.vasudevan@gmail.com> wrote:
> > >
> > > > >>Does Minor compaction remove HFiles in which all entries are out of
> > > > >>TTL or does only Major compaction do that
> > > > Yes, it applies to minor compactions as well.
> > > > >>Is there a way of configuring major compaction to compact only files
> > > > >>older than a certain time or to compress all the files except the
> > > > >>latest few?
> > > > In the latest trunk version the compaction algorithm itself is
> > > > pluggable. There are also coprocessor hooks that give control over the
> > > > scanner created for compaction, with which we can control the KVs being
> > > > selected. But I am not sure whether we can control which files get
> > > > selected for compaction in the older versions.
> > > > >> The above excerpt seems to imply to me that the search for key inside
> > > > >> a block is linear and I feel I must be reading it wrong. I would
> > > > >> expect the scan to be a binary search.
> > > > Once the data block for a key is identified, we seek to the beginning of
> > > > the block and then do a linear search until we reach the exact key we
> > > > are looking for. This is because internally the data (KVs) are stored
> > > > as byte buffers per block, following this pattern:
> > > > <keylength><valuelength><keybytearray><valuebytearray>
> > > > Since each entry is variable-length, the start of the next entry is
> > > > only known after reading the lengths of the current one.
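
To make the layout above concrete, here is a minimal sketch of walking such a buffer entry by entry. It is an illustration only, not the HBase reader: the class and method names are hypothetical, keys and values are UTF-8 strings, and both length fields are assumed to be 4-byte ints.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Sketch only: names here are hypothetical and are not HBase APIs. */
final class KvBlockScanSketch {

    /** Appends one entry as <keylength><valuelength><key><value> (4-byte lengths assumed). */
    static void put(ByteBuffer block, String key, String value) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        block.putInt(k.length).putInt(v.length).put(k).put(v);
    }

    /**
     * Walks the block from its current position, one entry at a time.
     * Returns the value stored for key, or null if the key is absent.
     * The buffer must already be flipped for reading.
     */
    static String seek(ByteBuffer block, String key) {
        byte[] wanted = key.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = block.duplicate(); // leave the caller's position untouched
        while (buf.remaining() >= 8) {
            int keyLen = buf.getInt();
            int valLen = buf.getInt();
            byte[] k = new byte[keyLen];
            buf.get(k);
            if (Arrays.equals(k, wanted)) {
                byte[] v = new byte[valLen];
                buf.get(v);
                return new String(v, StandardCharsets.UTF_8);
            }
            // Not a match: skip the value to reach the next entry's lengths.
            buf.position(buf.position() + valLen);
        }
        return null;
    }

    public static void main(String[] args) {
        ByteBuffer block = ByteBuffer.allocate(256);
        put(block, "row1", "apple");
        put(block, "row2", "banana");
        put(block, "row3", "cherry");
        block.flip();
        System.out.println(seek(block, "row2"));
        System.out.println(seek(block, "row9"));
    }
}
```

The offset of each entry depends on the lengths of all the entries before it, so the reader cannot jump straight to the middle of the block the way a binary search would need.
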
> > > > >>Is there a way to warm up the bloom filter and block index cache for
> > > > >>a table?
> > > > Do you always want the bloom filter and block index to be in cache?
> > > >
> > > >
> > > > On Wed, Jun 5, 2013 at 7:45 AM, Pankaj Gupta <pankaj@brightroll.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I have a few small questions regarding HBase. I've searched the forum
> > > > > but couldn't find clear answers, hence asking them here:
> > > > >
> > > > >
> > > > >    1. Does Minor compaction remove HFiles in which all entries are out
> > > > >    of TTL or does only Major compaction do that? I found this jira:
> > > > >    https://issues.apache.org/jira/browse/HBASE-5199 but I don't know
> > > > >    if the compaction being talked about there is minor or major.
> > > > >    2. Is there a way of configuring major compaction to compact only
> > > > >    files older than a certain time or to compress all the files except
> > > > >    the latest few? We basically want to use the time based filtering
> > > > >    optimization in HBase to get the latest additions to the table and
> > > > >    since major compaction bunches everything into one file, it would
> > > > >    defeat the optimization.
> > > > >    3. Is there a way to warm up the bloom filter and block index cache
> > > > >    for a table? This is for a case where I always want the bloom
> > > > >    filters and index to be all in memory, but not the
>

--14dae9d70d2adc800e04de61aeaa--