incubator-blur-dev mailing list archives

From "Aaron McCurry (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BLUR-220) Support for humongous Rows
Date Wed, 16 Oct 2013 10:12:43 GMT

    [ https://issues.apache.org/jira/browse/BLUR-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796630#comment-13796630 ]

Aaron McCurry commented on BLUR-220:
------------------------------------

Ravi,

I have been taking some time to digest your results.  I believe that in most cases anything
less than 50 ms will be acceptable.  However, to have this feature work at the scale that Blur
can currently operate at, we are going to need a different approach than just the plain mixed
index, obviously.

I think that we are going to need some sort of mixed approach, with one state where the index
is normal and another where it is in this dual-pass mode.  The biggest problem I see to overcome
with this approach is how to get the entire row back together again during merges, when the row
is spread across segments and we don't want to have to do a full optimization (down to 1 segment).
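
To make the read side of that concrete, here is a rough sketch in plain Lucene (not Blur code,
and with a hypothetical "rowid" field on every Record document) showing that a Row's Records can
be gathered back together by term across whatever segments the merges have left them in:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;

    public class RowReadSketch {

      // Gathers every Record document of one Row by its row id term.  Because
      // the DirectoryReader is a composite over all segments, the Records come
      // back even when merges have scattered them across segments, without
      // needing a single-segment optimize just to read the Row.
      public static List<Document> readRow(Directory dir, String rowId)
          throws IOException {
        List<Document> records = new ArrayList<Document>();
        DirectoryReader reader = DirectoryReader.open(dir);
        try {
          IndexSearcher searcher = new IndexSearcher(reader);
          // Bounded for the sketch; assumes no Row exceeds 10,000 Records here.
          TopDocs hits = searcher.search(
              new TermQuery(new Term("rowid", rowId)), 10000);
          for (ScoreDoc hit : hits.scoreDocs) {
            records.add(searcher.doc(hit.doc)); // stored fields only
          }
        } finally {
          reader.close();
        }
        return records;
      }
    }

That only covers reading the Row back; it does not address re-grouping the Row into a contiguous
block at merge time, which is the part that still needs thought.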

I will do some more thinking on this one.

Aaron

> Support for humongous Rows
> --------------------------
>
>                 Key: BLUR-220
>                 URL: https://issues.apache.org/jira/browse/BLUR-220
>             Project: Apache Blur
>          Issue Type: Improvement
>          Components: Blur
>    Affects Versions: 0.3.0
>            Reporter: Aaron McCurry
>             Fix For: 0.3.0
>
>         Attachments: Blur_Query_Perf_Chart1.pdf, CreateIndex.java, CreateIndex.java,
CreateSortedIndex.java, MyEarlyTerminatingCollector.java, test_results.txt, TestSearch.java,
TestSearch.java
>
>
> One of the limitations of Blur is the size of the Rows stored, specifically the number of Records.
> Updates in Lucene are currently performed by deleting the documents and re-adding them to the
> index.  Unfortunately, when any update is performed on a Row in Blur, the entire Row has to be
> re-read (if the RowMutationType is UPDATE_ROW), whatever modifications are needed are made,
> and then the Row is reindexed in its entirety.
> Due to all of this overhead, there is a realistic limit on the size of a given Row.  The limit may
> vary based on the kind of hardware being used, but as the Row grows in size, indexing (mutations)
> against that Row will slow.
> This issue is being created to discuss techniques for dealing with this problem.
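
For reference, a rough sketch of the UPDATE_ROW pattern described above, written against plain
Lucene with a hypothetical "rowid" field rather than Blur's actual schema; it shows why the cost
of a mutation grows with the number of Records in the Row:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;

    public class RowUpdateSketch {

      // UPDATE_ROW style mutation: every Record document of the Row is re-read,
      // the change is applied, and the whole Row is written back as one block.
      // The work done here is proportional to the number of Records in the Row,
      // which is why large Rows slow down as they grow.
      public static void updateRow(IndexWriter writer, IndexSearcher searcher,
          String rowId, Document newRecord) throws IOException {
        Term rowTerm = new Term("rowid", rowId);

        // 1. Re-read the existing Row (bounded here for the sketch; stored
        //    fields only, which is a simplification of a real re-index).
        List<Document> rowDocs = new ArrayList<Document>();
        TopDocs hits = searcher.search(new TermQuery(rowTerm), 10000);
        for (ScoreDoc hit : hits.scoreDocs) {
          rowDocs.add(searcher.doc(hit.doc));
        }

        // 2. Apply the mutation (simplified: just append the new Record, which
        //    must itself carry the rowid field).
        rowDocs.add(newRecord);

        // 3. Delete every existing document of the Row and re-add the full set
        //    atomically as one contiguous block.
        writer.updateDocuments(rowTerm, rowDocs);
        writer.commit();
      }
    }

The sketch re-adds retrieved stored fields, which simplifies a real re-index, but the shape of the
work (re-reading and rewriting every Record in the Row for a single change) is the point.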



--
This message was sent by Atlassian JIRA
(v6.1#6144)
