hbase-dev mailing list archives

From Vladimir Rodionov <vrodio...@carrieriq.com>
Subject RE: keyvalue cache
Date Wed, 04 Apr 2012 21:54:39 GMT
Yes, something like this. In many use cases only the latest version matters,
so no cell iterators, of course.

get_by(row+cf+qualifier) -> latest version of that cell. All other types of queries should bypass
the K-V cache.
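
The routing described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not HBase code: PointReadRouter, upsert, getLatest and getRow are hypothetical names, strings stand in for byte[] keys and values, a plain Map stands in for the normal read path (MemStore plus store files), and concurrency between a cache fill and a racing upsert is ignored.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: point gets use the K-V cache, everything else bypasses it. */
final class PointReadRouter {
    // Latest value per row/family/qualifier (assumption: strings instead of byte[]).
    private final Map<String, String> kvCache = new ConcurrentHashMap<>();
    // Stand-in for the regular read path (MemStore + store files).
    private final Map<String, String> store;
    // Counts reads that reached the store; demo-only, not thread-safe.
    private int storeReads = 0;

    PointReadRouter(Map<String, String> store) {
        this.store = store;
    }

    private static String key(String row, String family, String qualifier) {
        return row + "/" + family + "/" + qualifier;
    }

    /** All upserts go through the cache first, so cached entries never go stale. */
    void upsert(String row, String family, String qualifier, String value) {
        String k = key(row, family, qualifier);
        kvCache.put(k, value);
        store.put(k, value);
    }

    /** Point query: row + cf + qualifier fully specified, latest version only. */
    String getLatest(String row, String family, String qualifier) {
        String k = key(row, family, qualifier);
        String cached = kvCache.get(k);
        if (cached != null) {
            return cached;
        }
        storeReads++;
        String loaded = store.get(k);
        if (loaded != null) {
            kvCache.put(k, loaded); // fill the cache on a miss
        }
        return loaded;
    }

    /** Row-only read: the cache cannot know it holds every cell, so bypass it. */
    Map<String, String> getRow(String row) {
        storeReads++;
        Map<String, String> cells = new TreeMap<>();
        String prefix = row + "/";
        for (Map.Entry<String, String> e : store.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                cells.put(e.getKey(), e.getValue());
            }
        }
        return cells;
    }

    int storeReads() {
        return storeReads;
    }
}

final class PointReadRouterDemo {
    public static void main(String[] args) {
        PointReadRouter router = new PointReadRouter(new HashMap<>());
        router.upsert("row1", "cf", "q1", "v1");
        System.out.println(router.getLatest("row1", "cf", "q1"));
        System.out.println(router.getRow("row1").size());
    }
}
```

Only getLatest ever consults the cache; getRow always goes to the store, which matches the "all other types of queries should bypass" rule.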

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Matt Corgan [mcorgan@hotpads.com]
Sent: Wednesday, April 04, 2012 2:46 PM
To: dev@hbase.apache.org
Subject: Re: keyvalue cache

It could act like a HashSet of KeyValues keyed on the
rowKey+family+qualifier but not including the timestamp.  As writes come in
it would evict or overwrite previous versions (read-through vs
write-through).  It would only service point queries where the
row+fam+qualifier are specified, returning the latest version.  Wouldn't be
able to do a typical rowKey-only Get (scan behind the scenes) because it
wouldn't know if it contained all the cells in the row, but if you could
specify all your row's qualifiers up-front it could work.
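
A minimal sketch of such a structure, to make the idea concrete. All names here (CellKey, CachedCell, LatestVersionCache) are hypothetical and not part of HBase; the sketch assumes the latest version is the one with the highest timestamp and does not handle deletes, size limits or eviction policy.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical key: row + family + qualifier, with the timestamp left out on purpose. */
final class CellKey {
    private final byte[] row;
    private final byte[] family;
    private final byte[] qualifier;

    CellKey(byte[] row, byte[] family, byte[] qualifier) {
        // Defensive copies so callers cannot mutate a key already stored in the map.
        this.row = row.clone();
        this.family = family.clone();
        this.qualifier = qualifier.clone();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CellKey)) return false;
        CellKey other = (CellKey) o;
        return Arrays.equals(row, other.row)
            && Arrays.equals(family, other.family)
            && Arrays.equals(qualifier, other.qualifier);
    }

    @Override
    public int hashCode() {
        int h = Arrays.hashCode(row);
        h = 31 * h + Arrays.hashCode(family);
        return 31 * h + Arrays.hashCode(qualifier);
    }
}

/** Hypothetical cached cell: the value plus the timestamp it was written with. */
final class CachedCell {
    final long timestamp;
    final byte[] value;

    CachedCell(long timestamp, byte[] value) {
        this.timestamp = timestamp;
        this.value = value;
    }
}

final class LatestVersionCache {
    private final ConcurrentHashMap<CellKey, CachedCell> cells = new ConcurrentHashMap<>();

    /** Write-through variant: overwrite unless a newer timestamp is already cached. */
    void put(CellKey key, long timestamp, byte[] value) {
        cells.merge(key, new CachedCell(timestamp, value),
            (old, incoming) -> incoming.timestamp >= old.timestamp ? incoming : old);
    }

    /** Read-through variant: evict on write and let the next read repopulate. */
    void evict(CellKey key) {
        cells.remove(key);
    }

    /** Point query only; null means a miss and the caller falls back to the normal path. */
    CachedCell get(CellKey key) {
        return cells.get(key);
    }
}

final class LatestVersionCacheDemo {
    public static void main(String[] args) {
        LatestVersionCache cache = new LatestVersionCache();
        CellKey key = new CellKey(
            "row1".getBytes(StandardCharsets.UTF_8),
            "cf".getBytes(StandardCharsets.UTF_8),
            "q1".getBytes(StandardCharsets.UTF_8));
        cache.put(key, 10L, "a".getBytes(StandardCharsets.UTF_8));
        cache.put(key, 20L, "b".getBytes(StandardCharsets.UTF_8));
        System.out.println(cache.get(key).timestamp);
    }
}
```

Because the key has no timestamp, there is one slot per row+fam+qualifier, which is also why a rowKey-only Get cannot be answered from this map: nothing records whether every cell of the row is present.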


On Wed, Apr 4, 2012 at 2:30 PM, Vladimir Rodionov
<vrodionov@carrieriq.com> wrote:

> 1. 2KB can be too large for some applications. For example, some of our
> k-v pairs are < 100 bytes combined.
> 2. These tables (from 1.) do not benefit from the block cache at all (we did
> not try a 100 B block size yet :)
> 3. And Matt is absolutely right: a small block size is expensive.
>
> How about doing point queries on the K-V cache and bypassing the K-V cache on all
> Scans (when someone really needs this)?
> Implement the K-V cache as a coprocessor application?
>
> Invalidation of a K-V entry is not necessary if all upsert operations go
> through the K-V cache first, with the cache sitting in front of the MemStore.
> There will be no "stale or invalid" data situation in this case. Correct?
> No need for data to be sorted and no need for data to be merged
> into a scan (we do not use the K-V cache for Scans).
>
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
>
> ________________________________________
> From: Matt Corgan [mcorgan@hotpads.com]
> Sent: Wednesday, April 04, 2012 11:40 AM
> To: dev@hbase.apache.org
> Subject: Re: keyvalue cache
>
> I guess the benefit of the KV cache is that you are not holding entire 64K
> blocks in memory when you only care about 200 bytes of them.  Would an
> alternative be to set a small block size (2KB or less)?
>
> The problems with small block sizes would be expensive block cache
> management overhead and inefficient scanning IO due to lack of read-ahead.
> Maybe improving the cache management and read-ahead would be more general
> improvements that don't add as much complexity?
>
> I'm having a hard time envisioning how you would do invalidations on the KV
> cache and how you would merge its entries into a scan, etc.  Would it
> basically be a memstore in front of the memstore where KVs get individually
> invalidated instead of bulk-flushed?  Would it be sorted or hashed?
>
> Matt
>
> On Wed, Apr 4, 2012 at 10:35 AM, Enis Söztutar <enis@apache.org> wrote:
>
> > As you said, caching the entire row does not make much sense, given that
> > the families are by contract the access boundaries. But caching column
> > families might be a good trade-off for dealing with the per-item overhead.
> >
> > Also agreed on the cache being configurable at the table or, better, cf level. I
> > think we can do something like enable_block_cache = true,
> > enable_kv_cache = false, per column family.
> >
> > Enis
> >
> > On Tue, Apr 3, 2012 at 11:03 PM, Vladimir Rodionov
> > <vrodionov@carrieriq.com> wrote:
> >
> > > Usually this makes sense for tables with mostly random access (point queries).
> > > For short to long scans the block cache is preferable.
> > > Cassandra has it (row cache), but since they cache the whole row (which
> > > can be very large), in many cases it has subpar performance. It makes sense
> > > to make caching configurable: a table can use the key-value cache and not
> > > use the block cache, and vice versa.
> > >
> > > Best regards,
> > > Vladimir Rodionov
> > > Principal Platform Engineer
> > > Carrier IQ, www.carrieriq.com
> > > e-mail: vrodionov@carrieriq.com
> > >
> > > ________________________________________
> > > From: Enis Söztutar [enis@apache.org]
> > > Sent: Tuesday, April 03, 2012 3:34 PM
> > > To: dev@hbase.apache.org
> > > Subject: keyvalue cache
> > >
> > > Hi,
> > >
> > > Before opening the issue, I thought I should ask around first. What do you
> > > think about a keyvalue cache sitting on top of the block cache? It is
> > > mentioned in the Bigtable paper, and it seems that Zipfian kv access
> > > patterns might benefit from something like this a lot. I could not find
> > > anybody who proposed that before.
> > >
> > > What do you guys think? Should we pursue a kv query-cache? My gut feeling
> > > says that especially for some workloads we might gain significant
> > > performance improvements, but we cannot verify it until we implement and
> > > profile it, right?
> > >
> > > Thanks,
> > > Enis
> > >
> > > Confidentiality Notice:  The information contained in this message,
> > > including any attachments hereto, may be confidential and is intended to be
> > > read only by the individual or entity to whom this message is addressed. If
> > > the reader of this message is not the intended recipient or an agent or
> > > designee of the intended recipient, please note that any review, use,
> > > disclosure or distribution of this message or its attachments, in any form,
> > > is strictly prohibited.  If you have received this message in error, please
> > > immediately notify the sender and/or Notifications@carrieriq.com and
> > > delete or destroy any copy of this message and its attachments.
> > >
> >
>

