hbase-user mailing list archives

From Pankaj Gupta <pan...@brightroll.com>
Subject Re: Questions about HBase
Date Wed, 05 Jun 2013 06:06:24 GMT
From what I read about HFile v2, and judging by the performance of my cluster,
it seems that bloom filter and index blocks are loaded on demand as blocks
are accessed. Isn't that the case? I see bloom filters being loaded all the
time when I run scans, not just once.


On Tue, Jun 4, 2013 at 10:52 PM, Asaf Mesika <asaf.mesika@gmail.com> wrote:

> When you do the first read of this region, wouldn't this load all bloom
> filters?
>
>
>
> On Wed, Jun 5, 2013 at 8:43 AM, ramkrishna vasudevan <
> ramkrishna.s.vasudevan@gmail.com> wrote:
>
> > As for the question of whether you can warm up the bloom filter and
> > block cache, I don't think that is possible right now.
> >
> > Regards
> > Ram
> >
> >
> > On Wed, Jun 5, 2013 at 10:57 AM, Asaf Mesika <asaf.mesika@gmail.com>
> > wrote:
> >
> > > If you read the HFile v2 document on the HBase site, you will understand
> > > completely how the search for a record works, and why there is a linear
> > > search within the block but a binary search to get to the right block.
> > > Also bear in mind that the number of keys in a block is not big, since a
> > > block in an HFile is 64 KB by default; thus from a 10 GB HFile you are
> > > fully scanning at most 64 KB of it.
> > >
> > > On Wednesday, June 5, 2013, Pankaj Gupta wrote:
> > >
> > > > > Thanks for the replies. I'll take a look at
> > > > > src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java.
> > > > >
> > > > > @ramkrishna: I do want to have the bloom filter and block index in cache
> > > > > all the time. For good read performance they're critical in my workflow.
> > > > > The worry is that when HBase is restarted it will take a long time for
> > > > > them to get populated again and performance will suffer. If there were a
> > > > > way of loading them quickly to warm up the table, then we'd be able to
> > > > > restart HBase without causing a slowdown in processing.
> > > >
> > > >
> > > > On Tue, Jun 4, 2013 at 9:29 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >
> > > > > > bq. But i am not very sure if we can control the files getting
> > > > > > selected for compaction in the older versions.
> > > > >
> > > > > Same mechanism is available in 0.94
> > > > >
> > > > > > Take a look at
> > > > > > src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java
> > > > > > where you would find the following methods (and more):
> > > > > >
> > > > > >   public void preCompactSelection(
> > > > > >       final ObserverContext<RegionCoprocessorEnvironment> c,
> > > > > >       final Store store, final List<StoreFile> candidates,
> > > > > >       final CompactionRequest request)
> > > > > >   public InternalScanner preCompact(
> > > > > >       ObserverContext<RegionCoprocessorEnvironment> e,
> > > > > >       final Store store, final InternalScanner scanner) throws IOException {
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Tue, Jun 4, 2013 at 8:14 PM, ramkrishna vasudevan <
> > > > > ramkrishna.s.vasudevan@gmail.com> wrote:
> > > > >
> > > > > > > >> Does Minor compaction remove HFiles in which all entries are out of
> > > > > > > >> TTL or does only Major compaction do that
> > > > > > > Yes, it applies to minor compactions as well.
> > > > > > > >> Is there a way of configuring major compaction to compact only files
> > > > > > > >> older than a certain time or to compress all the files except the
> > > > > > > >> latest few?
> > > > > > > In the latest trunk version the compaction algorithm itself can be
> > > > > > > plugged in. There are some coprocessor hooks that give control over the
> > > > > > > scanner that gets created for compaction, with which we can control the
> > > > > > > KVs being selected. But I am not very sure if we can control the files
> > > > > > > getting selected for compaction in the older versions.
> > > > > > > >> The above excerpt seems to imply to me that the search for key inside
> > > > > > > >> a block is linear and I feel I must be reading it wrong. I would
> > > > > > > >> expect the scan to be a binary search.
> > > > > > > Once the data block is identified for a key, we seek to the beginning
> > > > > > > of the block and then do a linear search until we reach the exact key
> > > > > > > that we are looking for. This is because internally the data (KVs) is
> > > > > > > stored as byte buffers per block, following this pattern:
> > > > > > > <keylength><valuelength><keybytearray><valuebytearray>
> > > > > > > >> Is there a way to warm up the bloom filter and block index cache for
> > > > > > > >> a table?
> > > > > > > You always want the bloom and block index to be in cache?
> > > > > >
> > > > > >
> > > > > > > On Wed, Jun 5, 2013 at 7:45 AM, Pankaj Gupta <pankaj@brightroll.com>
> > > > > > > wrote:
> > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I have a few small questions regarding HBase. I've searched the forum
> > > > > > > > but couldn't find clear answers, hence asking them here:
> > > > > > > >
> > > > > > > >    1. Does Minor compaction remove HFiles in which all entries are out
> > > > > > > >    of TTL or does only Major compaction do that? I found this jira:
> > > > > > > >    https://issues.apache.org/jira/browse/HBASE-5199 but I don't know
> > > > > > > >    if the compaction being talked about there is minor or major.
> > > > > > > >    2. Is there a way of configuring major compaction to compact only
> > > > > > > >    files older than a certain time or to compress all the files except
> > > > > > > >    the latest few? We basically want to use the time based filtering
> > > > > > > >    optimization in HBase to get the latest additions to the table and
> > > > > > > >    since major compaction bunches everything into one file, it would
> > > > > > > >    defeat the optimization.
> > > > > > > >    3. Is there a way to warm up the bloom filter and block index cache
> > > > > > > >    for a table? This is for a case where I always want the bloom
> > > > > > > >    filters and index to be all in memory, but not the
> > >
> >
>
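
The lookup described in the quoted replies (a binary search to find the right block, then a linear scan inside it over <keylength><valuelength><keybytearray><valuebytearray> records) can be sketched roughly as below. This is only an illustration under simplifying assumptions, not HBase's actual reader code: the class and method names are made up, the block index is a flat in-memory array of first keys (the real HFile v2 index can be multi-level), lengths are plain 4-byte ints, and keys are opaque byte strings rather than full KeyValue keys.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; none of these names come from HBase.
class BlockLookupSketch {

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Lexicographic comparison of byte arrays, treating bytes as unsigned.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Packs already-sorted {key, value} pairs into one block using the
    // <keylength><valuelength><keybytes><valuebytes> record layout.
    static ByteBuffer encodeBlock(String[][] sortedPairs) {
        int size = 0;
        for (String[] kv : sortedPairs) {
            size += 8 + bytes(kv[0]).length + bytes(kv[1]).length;
        }
        ByteBuffer block = ByteBuffer.allocate(size);
        for (String[] kv : sortedPairs) {
            byte[] k = bytes(kv[0]);
            byte[] v = bytes(kv[1]);
            block.putInt(k.length).putInt(v.length).put(k).put(v);
        }
        block.flip();
        return block;
    }

    // Binary search over the first key of each block: returns the index of
    // the last block whose first key is <= target, or -1 if there is none.
    static int findBlock(byte[][] firstKeys, byte[] target) {
        int lo = 0;
        int hi = firstKeys.length - 1;
        int found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (compareBytes(firstKeys[mid], target) <= 0) {
                found = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found;
    }

    // Linear scan from the start of the block. Records are variable length,
    // so each one must be read to learn where the next one begins.
    static String scanBlock(ByteBuffer block, byte[] target) {
        ByteBuffer buf = block.duplicate(); // leave the caller's position alone
        while (buf.remaining() >= 8) {
            int keyLen = buf.getInt();
            int valLen = buf.getInt();
            byte[] key = new byte[keyLen];
            buf.get(key);
            int cmp = compareBytes(key, target);
            if (cmp == 0) {
                byte[] val = new byte[valLen];
                buf.get(val);
                return new String(val, StandardCharsets.UTF_8);
            }
            if (cmp > 0) {
                return null; // keys are sorted, so the target is not in this block
            }
            buf.position(buf.position() + valLen); // skip the value
        }
        return null;
    }

    static String lookup(byte[][] firstKeys, ByteBuffer[] blocks, String key) {
        byte[] target = bytes(key);
        int i = findBlock(firstKeys, target);
        return i < 0 ? null : scanBlock(blocks[i], target);
    }

    public static void main(String[] args) {
        ByteBuffer[] blocks = {
            encodeBlock(new String[][] {{"a", "1"}, {"c", "3"}}),
            encodeBlock(new String[][] {{"f", "6"}, {"h", "8"}}),
            encodeBlock(new String[][] {{"m", "13"}, {"p", "16"}}),
        };
        byte[][] firstKeys = {bytes("a"), bytes("f"), bytes("m")};
        System.out.println(lookup(firstKeys, blocks, "h"));
        System.out.println(lookup(firstKeys, blocks, "g"));
    }
}
```

The variable-length records are why the in-block search is linear: there is no fixed stride to jump by. With the 64 KB default block size mentioned above, a 10 GB file has about 163,840 blocks, so a binary search over a flat index like this one would need roughly 18 comparisons before a scan of at most one block.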



-- 


*P* | (415) 677-9222 ext. 205 *F *| (415) 677-0895 | pankaj@brightroll.com

Pankaj Gupta | Software Engineer

*BrightRoll, Inc. *| Smart Video Advertising | www.brightroll.com


United States | Canada | United Kingdom | Germany


