hbase-user mailing list archives

From Pradeep Gollakota <pradeep...@gmail.com>
Subject Client API best practices for my use case
Date Wed, 11 Dec 2013 01:19:42 GMT
Hi All,

I'm trying to understand how different configuration options will affect
performance for my use case. My table has the following schema. I'm storing
event logs in a single column family. The row key is in the format
[company][timestamp][uuid].
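A minimal sketch of how such a key might be encoded, assuming (hypothetically) an 8-byte numeric company id, an 8-byte epoch-millis timestamp and a 16-byte UUID, all written big-endian. HBase compares row keys as unsigned bytes, so non-negative big-endian longs keep rows for one company ordered by time. The class and method names are illustrative only.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.UUID;

/**
 * Hypothetical [company][timestamp][uuid] row key: 8-byte company id,
 * 8-byte epoch-millis timestamp, 16-byte UUID, all big-endian.
 */
final class EventRowKey {
    static final int LENGTH = 8 + 8 + 16;

    static byte[] rowKey(long companyId, long timestampMillis, UUID uuid) {
        // Non-negative big-endian longs sort correctly under an unsigned
        // lexicographic byte comparison, which is how HBase orders rows.
        return ByteBuffer.allocate(LENGTH)
                .putLong(companyId)
                .putLong(timestampMillis)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        byte[] earlier = rowKey(42L, 1_386_724_000_000L, id);
        byte[] later = rowKey(42L, 1_386_724_060_000L, id);
        // Within one company, keys order by timestamp.
        System.out.println(Arrays.compareUnsigned(earlier, later) < 0); // prints true
    }
}
```

A fixed-width company field is a deliberate choice in this sketch: with a variable-length company name, a shorter name could be a byte prefix of a longer one, so a delimiter or padding would be needed to keep one company's rows contiguous.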

My access pattern is fairly simple: every X, I retrieve the last X worth of
events. X is typically small, e.g. every minute give me the last minute of
events, or every hour give me the last hour of events. Occasionally, I might
request historical data, e.g. give me all events from August 2012. I need
the queries requesting the most recent data to be really fast and am OK
with the historical queries being slow.
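With a [company][timestamp][uuid] key, "the last X of events" maps onto a single contiguous row range. The sketch below, assuming the same hypothetical 8-byte company id and 8-byte epoch-millis prefix, computes a start row (inclusive) and stop row (exclusive) that covers timestamps in [from, to). These byte arrays would then be handed to the Scan's start/stop row (e.g. `setStartRow`/`setStopRow` in the 0.94-era client); that wiring is not shown here.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical scan bounds for one company over a time window, assuming an
 * 8-byte company id followed by an 8-byte epoch-millis timestamp.
 */
final class EventScanWindow {
    // [company][timestamp] prefix. A full key sharing this prefix is longer,
    // so it sorts after the prefix itself.
    static byte[] prefix(long companyId, long timestampMillis) {
        return ByteBuffer.allocate(16).putLong(companyId).putLong(timestampMillis).array();
    }

    /** Returns {startRow (inclusive), stopRow (exclusive)} covering [from, to). */
    static byte[][] range(long companyId, long fromMillis, long toMillis) {
        if (fromMillis > toMillis) {
            throw new IllegalArgumentException("fromMillis must not exceed toMillis");
        }
        return new byte[][] { prefix(companyId, fromMillis), prefix(companyId, toMillis) };
    }

    /** Bounds for the window of the given length that ends at nowMillis. */
    static byte[][] lastWindow(long companyId, long nowMillis, long length, TimeUnit unit) {
        return range(companyId, nowMillis - unit.toMillis(length), nowMillis);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        byte[][] lastMinute = lastWindow(42L, now, 1, TimeUnit.MINUTES);
        System.out.println(lastMinute[0].length + " " + lastMinute[1].length); // prints 16 16
    }
}
```

A historical request such as "all of August 2012" is the same shape of range with fixed bounds, just much wider, so it reads far more rows than the recent-data queries.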

The configuration options I'm interested in are scanner caching and
block cache usage. I noticed in the Java API for creating column families
that there is an option, "setCacheDataOnWrite". What does this do exactly?
It's also recommended that the block cache be disabled on a Scan for
sequential reads. How does scanner caching work? Is it per Scan or a
shared cache? Does scanner caching use the same cache as the block cache?
If I have multiple Scans with caching enabled AND it's a shared cache, how
does eviction work? Ideally I always want the most recently written data to
be in the cache with as few cache evictions as possible.

For my use case, if I want the best performance to be on the most recent
events, what configuration of block cache and scanner caching should I use?

Thanks in advance.
- Pradeep
