incubator-blur-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aaron McCurry (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BLUR-220) Support for humongous Rows
Date Sat, 19 Oct 2013 14:15:42 GMT

    [ https://issues.apache.org/jira/browse/BLUR-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799908#comment-13799908
] 

Aaron McCurry commented on BLUR-220:
------------------------------------

Yep, I agree this is likely the next logical evolution of the Blur NRT system.  I would love
to make things more like a database and remove the variable length delay before changes to
an index is visible.

This will likely turn into a large rewrite or new implementation of the BlurIndex class (abstract).
 But the nice thing is that this implementation can be isolated to the BlurIndex.

Btw, glad to have your help on this.  :-)

Aaron

> Support for humongous Rows
> --------------------------
>
>                 Key: BLUR-220
>                 URL: https://issues.apache.org/jira/browse/BLUR-220
>             Project: Apache Blur
>          Issue Type: Improvement
>          Components: Blur
>    Affects Versions: 0.3.0
>            Reporter: Aaron McCurry
>             Fix For: 0.3.0
>
>         Attachments: Blur_Query_Perf_Chart1.pdf, CreateIndex.java, CreateIndex.java,
CreateSortedIndex.java, FullRowReindexing.java, MyEarlyTerminatingCollector.java, test_results.txt,
TestSearch.java, TestSearch.java
>
>
> One of the limitations of Blur is size of Rows stored, specifically the number of Records.
 The current updates are performed on Lucene is by deleting the document and re-adding to
the index.  Unfortunately when any update is perform on a Row in Blur, the entire Row has
to be re-read (if the RowMutationType is UPDATE_ROW) and then whatever modification needs
are made then it is reindexed in it's entirety.
> Due to all of this overhead, there is a realistic limit on the size of a given Row. 
It may vary based the kind of hardware that is being used, as the Row grows in size the indexing
(mutations) against that Row will slow.
> This issue is being created to discuss techniques on how to deal with this problem.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Mime
View raw message