incubator-blur-dev mailing list archives

From "Aaron McCurry (JIRA)" <>
Subject [jira] [Commented] (BLUR-290) NRT Updates using RAMDirectory & Swap
Date Tue, 07 Jan 2014 13:11:51 GMT


Aaron McCurry commented on BLUR-290:


Based on our discussion here, I started investigating why the HdfsDirectory was so slow.
During some micro benchmarks I measured a single-document update commit on a RAMDirectory at
around 1 ms (about 0.6 ms for the precommit phase and 0.08 ms for the final commit phase).
The same test with the HdfsDirectory took 160-200 ms per commit. After some more
investigation I found that the slowness was largely due to HDFS metadata calls (FileStatus
calls). I also found that caching the InputStreams to files helped a lot.
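The two fixes described above (avoiding repeated metadata calls and reusing open input streams) can be illustrated with a minimal Python sketch. This is not Blur's code: `SlowRemoteFS` is a hypothetical stand-in for HDFS that counts metadata and open calls, and `CachingDirectory` is a hypothetical wrapper that caches file lengths (the kind of information a FileStatus call returns) and keeps one open stream per file. Caching lengths is only safe on the assumption that index files are write-once, as Lucene's are.

```python
import io


class SlowRemoteFS:
    """Hypothetical stand-in for HDFS that counts the expensive calls."""

    def __init__(self):
        self._files = {}
        self.metadata_calls = 0  # think FileStatus lookups, a NameNode round trip each
        self.open_calls = 0

    def write(self, name, data):
        self._files[name] = bytes(data)

    def delete(self, name):
        del self._files[name]

    def file_length(self, name):
        self.metadata_calls += 1
        return len(self._files[name])

    def open(self, name):
        self.open_calls += 1
        return io.BytesIO(self._files[name])


class CachingDirectory:
    """Hypothetical wrapper: caches file lengths and open streams.

    Assumes files are write-once, so a cached length never goes stale.
    """

    def __init__(self, fs):
        self._fs = fs
        self._lengths = {}
        self._streams = {}

    def create_file(self, name, data):
        self._fs.write(name, data)
        # The length is known at write time, so no metadata call is ever needed.
        self._lengths[name] = len(data)

    def file_length(self, name):
        # Only files written by someone else cost one metadata call, once.
        if name not in self._lengths:
            self._lengths[name] = self._fs.file_length(name)
        return self._lengths[name]

    def read(self, name, offset, length):
        # Reuse one open stream per file instead of reopening on every read.
        stream = self._streams.get(name)
        if stream is None:
            stream = self._fs.open(name)
            self._streams[name] = stream
        stream.seek(offset)
        return stream.read(length)

    def delete_file(self, name):
        # Drop cached state before removing the underlying file.
        self._lengths.pop(name, None)
        stream = self._streams.pop(name, None)
        if stream is not None:
            stream.close()
        self._fs.delete(name)
```

The shared stream is repositioned on every read, so this sketch is not thread-safe; concurrent readers would each need their own position.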

The other thing I did was create an embedded key-value store that stores its data in HDFS
(HdfsKeyValueDirectory). After creating that, I wrote a directory that makes use of it
(FastHdfsKeyValueDirectory). Then I wrote another directory (JoinDirectory) that uses both the
classic HdfsDirectory for large, long-term files and the FastHdfsKeyValueDirectory for
short-term files, i.e. NRT updates.
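The routing idea behind JoinDirectory can be sketched as follows. This is a hypothetical Python illustration, not Blur's implementation: `DictDirectory` stands in for both the HdfsDirectory and the FastHdfsKeyValueDirectory, and the rule that merge output goes to long-term storage while everything else goes to short-term storage (the `for_merge` flag) is an assumption made for the example.

```python
class DictDirectory:
    """Hypothetical in-memory stand-in for a real directory: name -> bytes."""

    def __init__(self, label):
        self.label = label
        self.files = {}


class JoinDirectorySketch:
    """Hypothetical join of a long-term and a short-term directory.

    Assumed routing rule: merge output -> long-term, everything else
    (small NRT segments) -> short-term. Reads search both.
    """

    def __init__(self, long_term, short_term):
        self.long_term = long_term
        self.short_term = short_term

    def create_output(self, name, data, for_merge=False):
        if name in self.long_term.files or name in self.short_term.files:
            raise FileExistsError(name)
        target = self.long_term if for_merge else self.short_term
        target.files[name] = bytes(data)
        return target.label  # report where the file landed

    def _locate(self, name):
        # A file lives in exactly one of the two directories.
        for directory in (self.short_term, self.long_term):
            if name in directory.files:
                return directory
        raise FileNotFoundError(name)

    def open_input(self, name):
        return self._locate(name).files[name]

    def delete_file(self, name):
        del self._locate(name).files[name]

    def list_all(self):
        # Callers see a single namespace, the union of both directories.
        return sorted(set(self.long_term.files) | set(self.short_term.files))
```

To the caller it looks like one directory, while small, frequently rewritten files stay on the fast path.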

The end result is a commit time in the 1-2 ms range for the micro benchmark. Blur no longer
needs the WAL (write-ahead log), because everything is committed to disk on each mutate, and
overall NRT update throughput has greatly increased.
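A back-of-the-envelope check on what these commit times mean when every mutate is committed: a single writer that does nothing but commit after each mutate can complete at most 1000 / commit_ms mutates per second, i.e. roughly 5-6 per second at 160-200 ms versus 500-1000 per second at 1-2 ms. The Python below just does that arithmetic; it ignores batching, parallel shards, and the non-commit work of a real mutate, so treat it as an upper bound for the serial case only.

```python
def max_serial_mutates_per_second(commit_ms: float) -> float:
    """Upper bound for one writer that commits after every mutate and does nothing else."""
    return 1000.0 / commit_ms


# Commit times quoted in the comment (single-document update micro benchmark).
COMMIT_TIMES_MS = {
    "HdfsDirectory, fastest": 160.0,
    "HdfsDirectory, slowest": 200.0,
    "new directory stack, fastest": 1.0,
    "new directory stack, slowest": 2.0,
}

if __name__ == "__main__":
    for setup, ms in COMMIT_TIMES_MS.items():
        rate = max_serial_mutates_per_second(ms)
        print(f"{setup}: {ms:g} ms/commit -> at most {rate:g} mutates/s")
```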

This doesn't solve the huge row problem, so that's next on the list.  :-)


> NRT Updates using RAMDirectory & Swap
> -------------------------------------
>                 Key: BLUR-290
>                 URL:
>             Project: Apache Blur
>          Issue Type: New Feature
>    Affects Versions: experimental-dev
>            Reporter: Ravikumar
>         Attachments:
> We have been discussing handling humongous rows in Blur (BLUR-220). Explore the
> idea of using a RAMDirectory at the front, backed by a persistent index.
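A minimal sketch of the RAMDirectory-at-the-front idea, under stated assumptions: plain Python dicts stand in for both the in-memory index and the persistent index, and "swap" is assumed to mean moving the whole RAM buffer into the persistent index once it reaches a hypothetical `max_ram_docs` threshold. `RamFrontIndex` is a hypothetical name, not a Blur class.

```python
class RamFrontIndex:
    """Hypothetical RAM buffer in front of a persistent store (both plain dicts)."""

    def __init__(self, persistent, max_ram_docs):
        self.persistent = persistent  # stand-in for the persistent index
        self.ram = {}                 # stand-in for the RAMDirectory
        self.max_ram_docs = max_ram_docs
        self.swaps = 0

    def update(self, doc_id, doc):
        # New and updated documents land in RAM first.
        self.ram[doc_id] = doc
        if len(self.ram) >= self.max_ram_docs:
            self.swap()

    def swap(self):
        # Move the RAM contents into the persistent index in one batch,
        # then start over with an empty RAM buffer.
        self.persistent.update(self.ram)
        self.ram = {}
        self.swaps += 1

    def get(self, doc_id):
        # RAM holds the newest version, so check it before the persistent index.
        if doc_id in self.ram:
            return self.ram[doc_id]
        return self.persistent.get(doc_id)
```

Anything still in the RAM buffer at crash time is lost unless it is also logged elsewhere, which is the durability gap a WAL would cover.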

This message was sent by Atlassian JIRA
