incubator-blur-dev mailing list archives

From "Ravikumar (JIRA)" <>
Subject [jira] [Updated] (BLUR-220) Support for humongous Rows
Date Thu, 24 Oct 2013 13:52:02 GMT


Ravikumar updated BLUR-220:


OK, here are some things that I gathered.

If we are to use a RAM-based directory, then it's definitely not going to be a RAMDirectory.
Even the Javadocs have warnings about it!

I quickly grabbed the SlabAllocator from Cassandra (which was in turn taken from HBase). It
doles out 1 MB byte[] slabs, which I wrap in Lucene's BytesRef. Each RAMFile contains N
BytesRef chunks, with a chunk size of 64 KB.
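
To make the layout above concrete, here is a minimal sketch of the idea, not the attached patch. It assumes that the 64 KB chunks are carved out of the 1 MB slabs. ChunkRef is a hypothetical stand-in for Lucene's BytesRef (a bytes/offset/length view), and SlabChunkAllocator and ChunkedRamFile are illustrative names. The allocator here is simply synchronized, which is a simplification and not how the Cassandra or HBase allocators handle concurrency.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for Lucene's BytesRef: a view into a shared byte[]. */
final class ChunkRef {
    final byte[] bytes;
    final int offset;
    final int length;

    ChunkRef(byte[] bytes, int offset, int length) {
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }
}

/** Carves fixed-size chunks out of large slabs so the GC sees few, large arrays. */
final class SlabChunkAllocator {
    static final int SLAB_SIZE = 1 << 20;   // 1 MB per slab
    static final int CHUNK_SIZE = 64 << 10; // 64 KB per chunk

    private byte[] currentSlab;
    private int nextOffset = SLAB_SIZE;     // forces a slab allocation on first use
    private int slabCount;

    /** Returns the next free chunk, allocating a new slab when the current one is full. */
    synchronized ChunkRef allocateChunk() {
        if (nextOffset + CHUNK_SIZE > SLAB_SIZE) {
            currentSlab = new byte[SLAB_SIZE];
            nextOffset = 0;
            slabCount++;
        }
        ChunkRef ref = new ChunkRef(currentSlab, nextOffset, CHUNK_SIZE);
        nextOffset += CHUNK_SIZE;
        return ref;
    }

    synchronized int slabCount() {
        return slabCount;
    }
}

/** Hypothetical RAM-backed file: an append-only list of chunks. */
final class ChunkedRamFile {
    private final SlabChunkAllocator allocator;
    private final List<ChunkRef> chunks = new ArrayList<>();
    private long length;

    ChunkedRamFile(SlabChunkAllocator allocator) {
        this.allocator = allocator;
    }

    /** Appends one byte, requesting a new chunk when the last one is full. */
    void writeByte(byte b) {
        int chunkIndex = (int) (length / SlabChunkAllocator.CHUNK_SIZE);
        int within = (int) (length % SlabChunkAllocator.CHUNK_SIZE);
        if (chunkIndex == chunks.size()) {
            chunks.add(allocator.allocateChunk());
        }
        ChunkRef chunk = chunks.get(chunkIndex);
        chunk.bytes[chunk.offset + within] = b;
        length++;
    }

    /** Reads the byte at an absolute file position. */
    byte readByte(long pos) {
        if (pos < 0 || pos >= length) {
            throw new IndexOutOfBoundsException("pos=" + pos);
        }
        ChunkRef chunk = chunks.get((int) (pos / SlabChunkAllocator.CHUNK_SIZE));
        return chunk.bytes[chunk.offset + (int) (pos % SlabChunkAllocator.CHUNK_SIZE)];
    }

    long length() {
        return length;
    }

    int chunkCount() {
        return chunks.size();
    }
}
```

With these sizes one slab holds 16 chunks, so a file of a few hundred KB touches a single slab, and the heap holds a small number of large arrays instead of many small ones.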

I believe it should be GC-friendly for a few GB of RAM and also perform well under
concurrent load. Patch attached.

In the Blur code, I see a "waitTobeVisible" flag everywhere, instructing NRTManager to wait
until that generation is visible. How should I understand that in the context of a
RAMDirectory backed by an HDFSDirectory? What is the correct way to approach this?
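
For context, the "wait till that generation" semantics can be sketched as below. This is a generic illustration under my own assumptions, not Blur's code and not Lucene's NRTManager API: GenerationGate and its method names are hypothetical. It does not settle how a RAM-backed directory in front of HDFS should behave.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical gate: writers get a generation number, readers wait until it is visible. */
final class GenerationGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition advanced = lock.newCondition();
    private long lastWrittenGen; // newest generation handed to a writer
    private long visibleGen;     // newest generation that searchers can see

    /** Called after each mutation; returns the generation that contains it. */
    long recordMutation() {
        lock.lock();
        try {
            return ++lastWrittenGen;
        } finally {
            lock.unlock();
        }
    }

    /** Called when a refresh (reader reopen) completes; publishes all written generations. */
    void refresh() {
        lock.lock();
        try {
            visibleGen = lastWrittenGen;
            advanced.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Blocks until targetGen is visible or the timeout elapses; returns whether it is visible. */
    boolean waitForGeneration(long targetGen, long timeout, TimeUnit unit) {
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (visibleGen < targetGen) {
                if (nanos <= 0) {
                    return false;
                }
                nanos = advanced.awaitNanos(nanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```

In this sketch, a caller that sets the flag would call waitForGeneration with the value returned by recordMutation before returning to the client, while a caller that does not set it would return immediately.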

> Support for humongous Rows
> --------------------------
>                 Key: BLUR-220
>                 URL:
>             Project: Apache Blur
>          Issue Type: Improvement
>          Components: Blur
>    Affects Versions: 0.3.0
>            Reporter: Aaron McCurry
>             Fix For: 0.3.0
>         Attachments: Blur_Query_Perf_Chart1.pdf,,,,,,,,,,,
> One of the limitations of Blur is the size of the Rows stored, specifically the number of
> Records. Currently, updates are performed in Lucene by deleting the document and re-adding
> it to the index. Unfortunately, when any update is performed on a Row in Blur, the entire
> Row has to be re-read (if the RowMutationType is UPDATE_ROW), whatever modifications are
> needed are made, and then it is reindexed in its entirety.
> Due to all of this overhead, there is a realistic limit on the size of a given Row. It may
> vary based on the kind of hardware being used, but as the Row grows in size, the indexing
> (mutations) against that Row will slow.
> This issue is being created to discuss techniques on how to deal with this problem.

This message was sent by Atlassian JIRA
