hbase-user mailing list archives

From chandra kant <chandralakshmikan...@gmail.com>
Subject Re: Cache invalidation in Blockcache
Date Sun, 30 Mar 2014 18:31:13 GMT
Yeah..thanks a lot..it perfectly sums it up..

--Chandra

On Sunday, 30 March 2014, lars hofhansl <larsh@apache.org> wrote:

> Fundamentally HBase performs a merge sort. There is a clearly defined sort
> order between KeyValues:
> 1. row key
> 2. column family
> 3. column qualifier
> 4. timestamp
> (there are a few rules and wrinkles about MVCC visibility of changes)
>
> Row key, column family, and column qualifier are compared
> lexicographically; the timestamps are long values and are sorted in reverse
> chronological order (so the newest sorts first).
>
> The memstore is a sorted data structure (a skip list set), and HFiles are
> sorted as well.
> So HBase will "simply" perform a merge sort between all sources (memstore,
> HFiles, etc), and return the KeyValues in order.
> The block cache does not enter this discussion from a correctness
> viewpoint; it is just a means to access data in HFiles more efficiently.
>
> Does this answer your question?
>
> -- Lars
>
>
>
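
The sort order and merge described above can be sketched in a few lines. This is only an illustrative Python model, not HBase code: the KeyValue tuple, sort_key, and merged_scan names are made up for the example, and MVCC visibility is ignored entirely.

```python
import heapq
from typing import NamedTuple


class KeyValue(NamedTuple):
    """Hypothetical stand-in for an HBase KeyValue (illustration only)."""
    row: bytes
    family: bytes
    qualifier: bytes
    timestamp: int
    value: bytes


def sort_key(kv):
    # Row, family, and qualifier compare lexicographically (bytes do this
    # natively); the timestamp is negated so the newest version sorts first.
    return (kv.row, kv.family, kv.qualifier, -kv.timestamp)


def merged_scan(*sources):
    """Merge already-sorted sources (memstore, HFiles) into one sorted stream."""
    return heapq.merge(*sources, key=sort_key)


# One sorted in-memory source and two sorted "files".
memstore = [KeyValue(b"r1", b"cf", b"q", 30, b"v3")]
hfile_a = [
    KeyValue(b"r1", b"cf", b"q", 20, b"v2"),
    KeyValue(b"r2", b"cf", b"q", 5, b"x"),
]
hfile_b = [KeyValue(b"r1", b"cf", b"q", 10, b"v1")]

result = [kv.value for kv in merged_scan(memstore, hfile_a, hfile_b)]
print(result)
```

Where a cell lives (memstore or file) plays no part in the ordering here; only the key does.
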
> ________________________________
>  From: chandra kant <chandralakshmikant90@gmail.com>
> To: "user@hbase.apache.org" <user@hbase.apache.org>
> Sent: Sunday, March 30, 2014 12:12 AM
> Subject: Re: Cache invalidation in Blockcache
>
>
> I am using HBase version 0.94. Just one clarification - if I am requesting
> just a single row which is still in the memstore, then the read operation will
> simply send this result back to the client. This latest version of the row
> won't be cached in Blockcache. Block caching will only happen if data is read
> from store files (HFiles).
> What if the latest version of my row is in the memstore and the other 2
> versions are in an HFile and I want all 3 versions? In this case, will the
> cached block with that row key be evicted from Blockcache?
>
> Thanks
> Chandra
> On Sunday, 30 March 2014, Anoop John <anoop.hbase@gmail.com> wrote:
>
> > >Also, if row is changed by some write,
> > >then it will be reloaded in Blockcache along with the Hfile it belongs to,
> > >if Blockcache is enabled on table
> >
> > That statement is not quite correct, because there is no row-wise caching.
> > It is just blocks of KVs being cached. So a write will not deal with the
> > block cache as such. This write will go to the memstore. During a read, yes,
> > mostly this version in the memstore will come out (as it is the most recent).
> > If maxversions for that table's cf is >1 and the Scan is requesting more than
> > one version, multiple versions of a cell can come out. Which version are you
> > using?
> >
> > -Anoop-
> >
> > On Sun, Mar 30, 2014 at 12:04 PM, chandra kant <
> > chandralakshmikant90@gmail.com> wrote:
> >
> > > Thanks anoop..
> > > Here is my understanding..
> > > Basically, memstores will be scanned no matter whether the requested row is
> > > already present in the Blockcache. Also, if a row is changed by some write,
> > > then it will be reloaded in Blockcache along with the HFile it belongs to,
> > > if Blockcache is enabled on the table.
> > >
> > > Thanks..
> > > Chandra
> > >
> > > On Sunday, 30 March 2014, Anoop John <anoop.hbase@gmail.com> wrote:
> > >
> > > > In the block cache, data is not cached as rows. As you know, when writing
> > > > HFiles, one HFile is logically split into blocks (with a default size of
> > > > 64K). During reads, data is read from files as blocks. Even if you do a
> > > > single-row get, HBase has to read at least one block from the file. The
> > > > block cache caches these blocks. So during a read, if we find the
> > > > requested block in the cache, we won't read it again from HDFS. This is
> > > > how the block cache helps.
> > > >
> > > > So the answer to the 1st question is no.
> > > >
> > > > During reading, it is not like 1st check in the memstore and then in the
> > > > block cache. It is like a heap of scanners on the memstore and on all
> > > > HFiles. KVs come out of this scanner as per the result of the KV
> > > > comparator comparison: compare row, cf, qualifier, TS and finally a
> > > > memstoreTS (which increases on every write). So a KV from the memstore
> > > > will normally come out 1st, before those from files. But during writes
> > > > one can always specify the TS. If someone writes explicitly with a TS,
> > > > first writing a cell with some future TS that gets flushed to a file,
> > > > and later writes a KV with a past TS that is still in the memstore, the
> > > > normal case described above may not apply. Hope I made it clear for you.
> > > > Again, when you read from files, the files are read block by block, and
> > > > during that time the cache is checked. If that block of the file has
> > > > already been read into the cache, there won't be any IO.
> > > > -Anoop-
> > > > On Sun, Mar 30, 2014 at 11:44 AM, chandra kant <
> > > > chandralakshmikant90@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > > I have Blockcache enabled on my table. So, I read a row and it's stored
> > > > > in Blockcache. Next, I do a write on that row and I read it again.
> > > > > My question is - does writing that row invalidate the entry of that row
> > > > > in Blockcache?
> > > > > Also, while reading, does RegionScanner first check the memstore for any
> > > > > updates regarding that row, or the Blockcache?
> > > > > It's quite confusing from what I have read.
> > > > > Thanks
> > > > > Chandra
> > > > >
> > > >
> > >
> >
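
To illustrate the point made in this thread that a write never touches the block cache, here is a toy Python model. Everything in it (the Store, HFile, and BlockCache classes, and the choice to key the cache by file name and block index) is an assumption made for illustration; it is not HBase's implementation. For simplicity it looks at every block of every file, which a real store would avoid.

```python
class HFile:
    """Hypothetical immutable file: a list of blocks of (row, ts, value) cells."""

    def __init__(self, name, blocks):
        self.name = name
        self.blocks = blocks


class BlockCache:
    """Caches whole blocks, keyed by (file name, block index), never rows."""

    def __init__(self):
        self.blocks = {}
        self.disk_reads = 0

    def get_block(self, hfile, index):
        key = (hfile.name, index)
        if key not in self.blocks:
            self.disk_reads += 1  # simulated read from the file system
            self.blocks[key] = hfile.blocks[index]
        return self.blocks[key]


class Store:
    def __init__(self, hfiles):
        self.memstore = []
        self.hfiles = hfiles
        self.cache = BlockCache()

    def put(self, row, ts, value):
        # A write only goes to the memstore; cached blocks are left alone.
        self.memstore.append((row, ts, value))

    def get(self, row, max_versions=1):
        # Gather cells from the memstore and from file blocks (via the cache),
        # then order them newest first, regardless of where they came from.
        cells = [c for c in self.memstore if c[0] == row]
        for hfile in self.hfiles:
            for index in range(len(hfile.blocks)):
                block = self.cache.get_block(hfile, index)
                cells += [c for c in block if c[0] == row]
        cells.sort(key=lambda c: -c[1])
        return [value for _, _, value in cells[:max_versions]]


store = Store([HFile("f1", [[("r1", 10, "v1"), ("r1", 20, "v2")]])])
first = store.get("r1")            # loads the block into the cache
store.put("r1", 30, "v3")          # newer version lands in the memstore
latest = store.get("r1")           # block comes from the cache, no new disk read
all_three = store.get("r1", max_versions=3)
print(first, latest, all_three, store.cache.disk_reads)
```

In this model the cached block is still valid after the write because the file it came from never changes; the newer version is picked up from the memstore at read time.
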
