hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Predictive Caching
Date Thu, 23 Apr 2015 21:43:35 GMT

You don’t want to do it. (Think about what you’re asking for …) 

You would be better off with secondary indexing, so that you can hit your index to get your subset
of rows and then use map/reduce to process the result set. 
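The secondary-index pattern suggested here can be sketched in a few lines. This is a minimal illustration using plain Python dicts as stand-ins for HBase tables; in HBase the index would typically be a second table whose row keys embed the indexed value (e.g. `<customer>#<primary-key>`), and the fetch would be point `Get`s on the primary table. All table and column names below are hypothetical.

```python
# "Primary" table stand-in: row key -> row data.
orders = {
    "row1": {"customer": "acme", "total": 120},
    "row2": {"customer": "bolt", "total": 75},
    "row3": {"customer": "acme", "total": 30},
}

# Secondary index stand-in: indexed value -> list of primary row keys.
index_by_customer = {}
for key, row in orders.items():
    index_by_customer.setdefault(row["customer"], []).append(key)

def fetch_by_customer(customer):
    """Hit the index first, then do point lookups on the primary table.

    This is the pattern in the reply: the index narrows the key space so
    only the matching subset of rows is read, instead of a full scan.
    """
    return [orders[k] for k in index_by_customer.get(customer, [])]

subset = fetch_by_customer("acme")
```

The subset returned by the index lookup is what would then be fed to a map/reduce job for processing.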

> On Apr 23, 2015, at 2:18 PM, ayyajnam nahdravhbuhs <ayyajnam@gmail.com> wrote:
> Hi,
> I have been toying with the idea of a predictive cache for batch HBase jobs.
> Traditionally speaking, Hadoop is a batch processing framework. We use
> HBase as a data store for a number of batch jobs that run on Hadoop.
> Depending on the job that is run and the way the data is laid out, HBase
> might perform great for some jobs but cause performance
> bottlenecks for others. This is especially likely when the
> same table is used as input for different jobs with different access
> patterns.
> HBase currently supports various cache implementations (Bucket, LRU,
> Combined), but none of these mechanisms is job aware. A job-aware cache
> should be able to determine the best data to cache based on data
> requests from previous runs of the job. The learning can happen in
> the background and will require access information from multiple runs of
> the job. It should produce a per-job output that can be used by
> a new predictive caching algorithm. When a job is then run with this
> predictive cache, it can query the learning results when deciding
> which block to evict or load.
> Just wanted to check if anyone knows of any related work in this area.
> Thoughts and suggestions welcome.
> Thanks,
> Ayya
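For what it’s worth, the job-aware eviction idea in the quoted message can be sketched as a toy policy: record per-job block-access counts across runs, and on eviction prefer the block the current job has historically touched least. This is not the HBase BlockCache API; class and method names are illustrative only, and a real implementation would persist the history between runs.

```python
from collections import Counter, defaultdict

class JobAwareCache:
    """Toy sketch of a job-aware cache (hypothetical, not HBase's API).

    `history` plays the role of the per-job learning output described
    above: it accumulates block-access counts across runs, and eviction
    consults it instead of pure recency.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = set()                    # block ids currently cached
        self.history = defaultdict(Counter)   # job -> block -> access count

    def access(self, job, block):
        """Record the access; return True on a cache hit, False on a miss."""
        self.history[job][block] += 1
        if block in self.cache:
            return True
        if len(self.cache) >= self.capacity:
            # Evict the cached block this job has used least historically.
            victim = min(self.cache, key=lambda b: self.history[job][b])
            self.cache.discard(victim)
        self.cache.add(block)
        return False

cache = JobAwareCache(capacity=2)
for block in ["b1", "b1", "b1", "b2", "b3"]:
    cache.access("scan-job", block)
```

After this run the hot block "b1" survives eviction because the policy sacrifices the historically cold "b2" instead, which is the behavior a recency-only LRU would not guarantee.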

The opinions expressed here are mine, while they may reflect a cognitive thought, that is
purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com
