hbase-dev mailing list archives

From Stack <st...@duboce.net>
Subject Re: SILT - nice keyvalue store paper
Date Sat, 22 Oct 2011 21:33:37 GMT
On Sat, Oct 22, 2011 at 1:17 AM, Dhruba Borthakur <dhruba@gmail.com> wrote:
> One reason HBase currently eats lots of CPU is that the memstore is a
> sorted set. Instead, we can make it a hash set so that lookups and
> insertions are much faster. At flush time, we can sort the snapshot
> memstore and write it out to HDFS. This will greatly reduce the latency
> of Puts. I will experiment to see how this fares with a real-life
> workload.
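A minimal Java sketch of the hash-then-sort-on-flush idea, assuming simplified String keys (real memstore entries are KeyValues ordered by a custom comparator). The class and method names are hypothetical. Flushed entries are simply dropped from the live map here, whereas a real memstore would keep the snapshot readable until the flush completes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: hash-based memstore, sorted only at flush time.
// Keys are plain Strings; real HBase KeyValues need a custom comparator.
class HashMemstoreSketch {
    // Writers share the read lock; a flush takes the write lock to swap maps,
    // so no put can land in a map after it has been handed off for flushing.
    private final ReentrantReadWriteLock swapLock = new ReentrantReadWriteLock();
    private ConcurrentHashMap<String, byte[]> active = new ConcurrentHashMap<>();

    // O(1) expected insert instead of O(log n) in a skip list.
    void put(String key, byte[] value) {
        swapLock.readLock().lock();
        try {
            active.put(key, value);
        } finally {
            swapLock.readLock().unlock();
        }
    }

    // Point lookup on the live map only (flushed keys are not visible here).
    byte[] get(String key) {
        swapLock.readLock().lock();
        try {
            return active.get(key);
        } finally {
            swapLock.readLock().unlock();
        }
    }

    // Swap in an empty map, then sort the old contents by key; the sorted
    // list stands in for what would be written out to HDFS as an HFile.
    List<Map.Entry<String, byte[]>> snapshotSortedForFlush() {
        ConcurrentHashMap<String, byte[]> snapshot;
        swapLock.writeLock().lock();
        try {
            snapshot = active;
            active = new ConcurrentHashMap<>();
        } finally {
            swapLock.writeLock().unlock();
        }
        // The sort cost is paid once per flush, outside the lock.
        List<Map.Entry<String, byte[]>> entries = new ArrayList<>(snapshot.entrySet());
        entries.sort(Map.Entry.comparingByKey());
        return entries;
    }
}
```

The trade-off is that ordered reads against the live map are no longer cheap, which is exactly the question raised in the reply.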

How would you scan a HashSet in order?  And what about deletes across
spans (whole families, or everything older than a particular timestamp)?
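To make the objection concrete, here is a hypothetical sketch with keys flattened to "row/family:qualifier" strings. On a sorted map, an ordered row-range scan or a family delete is a contiguous subMap view; a hash map has to filter and sort every entry on each scan. Timestamp-based deletes are not modeled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Hypothetical sketch: keys flattened to "row/family:qualifier" strings.
class OrderedScanSketch {
    // Sorted map (e.g. ConcurrentSkipListMap): a row-range scan is a subMap
    // view, found in O(log n) and already in key order.
    static List<String> scanSorted(NavigableMap<String, String> store,
                                   String startRow, String stopRow) {
        return new ArrayList<>(store.subMap(startRow, true, stopRow, false).keySet());
    }

    // Hash map: nothing to exploit, so every scan touches every entry and sorts.
    static List<String> scanHashed(Map<String, String> store,
                                   String startRow, String stopRow) {
        List<String> keys = new ArrayList<>();
        for (String k : store.keySet()) {
            if (k.compareTo(startRow) >= 0 && k.compareTo(stopRow) < 0) {
                keys.add(k);
            }
        }
        Collections.sort(keys);
        return keys;
    }

    // Family delete on a sorted map: all keys with prefix "row/family:" form one
    // contiguous span, bounded above by "row/family;" (';' sorts right after ':').
    static void deleteFamily(NavigableMap<String, String> store, String row, String family) {
        String from = row + "/" + family + ":";
        String to = row + "/" + family + ";";
        store.subMap(from, true, to, false).clear();
    }
}
```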

On the other hand, I did have a chat w/ the LMAX folks recently.  They
had made a point earlier in the day that java Collections and
Concurrent are well due an overhaul (as an aside in a talk whose
general thrust was revisit all assumptions especially atop modern
hardware).  I was asking what the underpinnings of a modern Collection
might look like and in particular described our issue with
ConcurrentSkipListMap.  One of the boys threw out the notion of
catching the inserts in their Disruptor data structure and then
sorting the data structure.  It seemed a bit of a silly suggestion at the
time, but it might work if we added MVCC to the mix and moved the read
point on only after each sort completed.  We'd be juggling lots of sorted
lists in memory....
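One way to picture the buffer-then-sort idea, as a sketch only: a ConcurrentLinkedQueue stands in for the Disruptor ring buffer, a single sorter thread turns each drained batch into an immutable sorted run, and publishing the new list of runs through an AtomicReference plays the role of moving the MVCC read point. Readers merge the published runs with a heap. None of this is LMAX or HBase code, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: unsorted append buffer + immutable sorted runs.
// ConcurrentLinkedQueue stands in for a Disruptor ring buffer.
class SortedRunsSketch {
    private final ConcurrentLinkedQueue<String> incoming = new ConcurrentLinkedQueue<>();
    // The published list of sorted runs acts as the "read point": readers
    // only ever see runs whose sort has fully completed.
    private final AtomicReference<List<List<String>>> published =
            new AtomicReference<>(Collections.<List<String>>emptyList());

    // Writers do a cheap, unordered append.
    void put(String key) {
        incoming.add(key);
    }

    // Assumes a single sorter thread: drain, sort, then publish atomically.
    void sortAndPublish() {
        List<String> batch = new ArrayList<>();
        String k;
        while ((k = incoming.poll()) != null) {
            batch.add(k);
        }
        if (batch.isEmpty()) {
            return;
        }
        Collections.sort(batch);
        List<List<String>> next = new ArrayList<>(published.get());
        next.add(Collections.unmodifiableList(batch));
        published.set(Collections.unmodifiableList(next)); // read point moves on here
    }

    // Ordered scan: k-way merge of the published runs using a min-heap of
    // (run index, position) cursors. Unpublished puts are not visible.
    List<String> scan() {
        List<List<String>> runs = published.get();
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                (a, b) -> runs.get(a[0]).get(a[1]).compareTo(runs.get(b[0]).get(b[1])));
        for (int r = 0; r < runs.size(); r++) {
            heap.add(new int[] {r, 0}); // runs are never empty
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] cur = heap.poll();
            List<String> run = runs.get(cur[0]);
            out.add(run.get(cur[1]));
            if (cur[1] + 1 < run.size()) {
                heap.add(new int[] {cur[0], cur[1] + 1});
            }
        }
        return out;
    }
}
```

The cost of "juggling lots of sorted lists" shows up in scan(): every read merges all runs, so in practice runs would need to be compacted together periodically.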

> I have also been playing around with a 5-node test cluster that has
> flash drives. The flash drives are mounted as XFS filesystems.
> The cluster uses a LookasideCacheFileSystem (http://bit.ly/pnGju0), a
> client-side filter driver layered on top of HDFS. When HBase flushes data
> to HDFS, it is cached transparently in the LookasideCacheFileSystem,
> which uses the flash drive as its cache. The assumption here is that
> recently flushed HFiles are more likely to be accessed than the data in
> HFiles that were flushed earlier (not yet messed with major compactions).
> I will be measuring the performance benefit of this configuration.
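A rough sketch of the lookaside idea, assuming a hypothetical BackingStore interface in place of HDFS and an in-memory map in place of the flash drive; none of these names come from the real LookasideCacheFileSystem. Writes go through to the backing store and are also cached, reads check the cache first, and eviction is least-recently-used by bytes, which fits the assumption that recently flushed files are the hottest.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a lookaside (write-through) cache; not the real
// LookasideCacheFileSystem API.
class LookasideCacheSketch {
    // Stand-in for HDFS.
    interface BackingStore {
        void write(String path, byte[] data);
        byte[] read(String path);
    }

    // Trivial in-memory backing store that counts reads, for trying the sketch out.
    static class InMemoryStore implements BackingStore {
        private final Map<String, byte[]> files = new HashMap<>();
        int reads = 0;

        public void write(String path, byte[] data) {
            files.put(path, data);
        }

        public byte[] read(String path) {
            reads++;
            return files.get(path);
        }
    }

    private final BackingStore backing;
    private final long capacityBytes;
    private long usedBytes = 0;
    // Access-ordered map: iteration goes from least to most recently used.
    private final LinkedHashMap<String, byte[]> cache = new LinkedHashMap<>(16, 0.75f, true);
    int cacheHits = 0;

    LookasideCacheSketch(BackingStore backing, long capacityBytes) {
        this.backing = backing;
        this.capacityBytes = capacityBytes;
    }

    // Flushes always reach the backing store; the cache copy is a bonus.
    synchronized void write(String path, byte[] data) {
        backing.write(path, data);
        admit(path, data);
    }

    // Serve from the cache when possible, else fall back and cache the result.
    synchronized byte[] read(String path) {
        byte[] cached = cache.get(path); // also marks the entry as recently used
        if (cached != null) {
            cacheHits++;
            return cached;
        }
        byte[] data = backing.read(path);
        if (data != null) {
            admit(path, data);
        }
        return data;
    }

    // Insert, then evict least-recently-used files until under the byte budget.
    private void admit(String path, byte[] data) {
        byte[] old = cache.remove(path);
        if (old != null) {
            usedBytes -= old.length;
        }
        if (data.length > capacityBytes) {
            return; // too large to cache at all
        }
        cache.put(path, data);
        usedBytes += data.length;
        Iterator<Map.Entry<String, byte[]>> it = cache.entrySet().iterator();
        while (usedBytes > capacityBytes && it.hasNext()) {
            usedBytes -= it.next().getValue().length;
            it.remove();
        }
    }
}
```

Because the newly admitted file is always the most recently used entry and fits within the budget, the eviction loop only ever removes older files.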

That sounds sweet, Dhruba.
