hbase-dev mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject For HBase compactions - Lucene's IO impact reduction code
Date Sat, 07 Jul 2012 05:49:49 GMT

Here is something that may be of interest to HBase:

Lucene 4.0.0-Alpha was recently released.  Mike McCandless, one of the Lucene developers,
wrote a really nice post about new things in this version of Lucene.  The part that I think
is interesting for HBase, and that HBase devs may want to look at (and borrow to use with
compactions) is this:

Reducing merge IO impact 

Merging (consolidating many small segments into a single big one) is a very IO and CPU intensive
operation which can easily interfere with ongoing searches. In 4.0.0 we now have two ways
to reduce this impact:
	* Rate-limit the IO caused by ongoing merging, by calling FSDirectory.setMaxMergeWriteMBPerSec. 

	* Use the new NativeUnixDirectory which bypasses the OS's IO cache for all merge IO,
by using direct IO. This ensures that a merge won't evict hot pages used by searches. (Note
that there is also a native WindowsDirectory, but it does not yet use direct IO during merging...
patches welcome!). 
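
To make the rate-limiting idea concrete for something like compaction writes, here is a minimal sketch in plain Java. It is not Lucene's implementation and not HBase code: the class name SimpleWriteRateLimiter and its methods are hypothetical, and the only assumption is that the writer calls pause(n) before writing each chunk of n bytes.

```java
/**
 * Hypothetical helper (not Lucene's or HBase's code): throttles a writer
 * to roughly a fixed number of MB per second.
 */
final class SimpleWriteRateLimiter {
    private final double bytesPerSec;
    private long nextAllowedNanos; // earliest time the next chunk may start

    SimpleWriteRateLimiter(double mbPerSec) {
        if (mbPerSec <= 0) {
            throw new IllegalArgumentException("mbPerSec must be > 0");
        }
        this.bytesPerSec = mbPerSec * 1024 * 1024;
        this.nextAllowedNanos = System.nanoTime();
    }

    /** Nanoseconds that writing the given number of bytes should take. */
    long nanosFor(long bytes) {
        return (long) (bytes * 1e9 / bytesPerSec);
    }

    /**
     * Call before writing a chunk. Sleeps until the time budget used by
     * earlier chunks has elapsed, then reserves time for this chunk.
     * Returns the number of nanoseconds the caller was asked to wait.
     */
    synchronized long pause(long bytes) {
        long now = System.nanoTime();
        long start = Math.max(now, nextAllowedNanos);
        nextAllowedNanos = start + nanosFor(bytes);
        long sleepNanos = start - now;
        if (sleepNanos > 0) {
            try {
                Thread.sleep(sleepNanos / 1_000_000, (int) (sleepNanos % 1_000_000));
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller instead of swallowing it.
                Thread.currentThread().interrupt();
            }
        }
        return sleepNanos;
    }
}
```

A merge or compaction thread would share one limiter and call pause(buffer.length) ahead of each write; threads serving reads or flushes would simply not use it. The sleep happens while the lock is held, which keeps the sketch short but serializes all throttled writers.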

Remember to also set swappiness to 0 on Linux if you want to maximize search responsiveness. 

More generally, the APIs that open an input or output file (Directory.openInput and Directory.createOutput)
now take an IOContext describing what's being done (e.g., flush vs merge), so you can create
a custom Directory that changes its behavior depending on the context. 
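
The same pattern, choosing IO behavior from a description of what the IO is for, can be sketched without any Lucene dependency. Everything below is hypothetical and for illustration only: the IoPurpose enum, the IoPolicy class, and the 20 MB/s figure are made up and are not part of Lucene's IOContext API or of HBase.

```java
/** Hypothetical stand-in for a context object describing why a file is opened. */
enum IoPurpose { READ, FLUSH, COMPACTION }

/** Hypothetical bundle of IO settings chosen per purpose. */
final class IoPolicy {
    final boolean bypassPageCache;  // e.g. open with direct IO where supported
    final double maxWriteMBPerSec;  // 0 means "no limit"

    IoPolicy(boolean bypassPageCache, double maxWriteMBPerSec) {
        this.bypassPageCache = bypassPageCache;
        this.maxWriteMBPerSec = maxWriteMBPerSec;
    }
}

final class ContextAwareIo {
    private ContextAwareIo() {}

    /** Picks IO settings from the purpose; the numbers are placeholders. */
    static IoPolicy policyFor(IoPurpose purpose) {
        switch (purpose) {
            case COMPACTION:
                // Background rewrite: keep it out of the page cache and throttle it.
                return new IoPolicy(true, 20.0);
            case FLUSH:
            case READ:
            default:
                // Foreground paths: normal cached, unthrottled IO.
                return new IoPolicy(false, 0.0);
        }
    }
}
```

The point of the design is that the caller only states the purpose, and a single place decides what that implies for caching and throttling.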

These changes were part of a 2011 Google Summer of Code project (thank you Varun!).  



Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm 
