incubator-blur-dev mailing list archives

From "Ravikumar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BLUR-290) NRT Updates using RAMDirectory & Swap
Date Mon, 11 Nov 2013 12:49:18 GMT

    [ https://issues.apache.org/jira/browse/BLUR-290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818923#comment-13818923 ]

Ravikumar commented on BLUR-290:
--------------------------------

On the read path, I saw that in the current code SuperQuery.java scores per row according to
the PrimeDocCache BitSet, and the results are wrapped with a TopScoreDocCollector in
IterablePaging.java. Please correct me if I am wrong.
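To make the per-row scoring concrete, here is a hypothetical Java sketch; it is not Blur's
actual SuperQuery code. It assumes each row's records occupy contiguous doc ids and that a
BitSet like the one PrimeDocCache provides has a bit set at the first doc id of every row.
The class PrimeDocRowScorer, the float[] of per-record scores, and summing record scores
into a row score are all illustrative assumptions.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not Blur's SuperQuery implementation.
final class PrimeDocRowScorer {

    /**
     * Sums record scores per row, assuming each row's records occupy contiguous
     * doc ids and primeDocs has a bit set at the first doc id of every row.
     * recordScores[doc] is 0 for records that do not match the query.
     * Returns prime doc id -> summed score, only for rows with at least one match.
     */
    static Map<Integer, Float> scoreRows(BitSet primeDocs, float[] recordScores) {
        Map<Integer, Float> rowScores = new LinkedHashMap<>();
        int maxDoc = recordScores.length;
        int rowStart = primeDocs.nextSetBit(0);
        while (rowStart >= 0 && rowStart < maxDoc) {
            // The next prime doc (or maxDoc if there is none) is where this row ends.
            int next = primeDocs.nextSetBit(rowStart + 1);
            int rowEnd = (next < 0 || next > maxDoc) ? maxDoc : next;
            float sum = 0f;
            boolean matched = false;
            for (int doc = rowStart; doc < rowEnd; doc++) {
                if (recordScores[doc] > 0f) {
                    sum += recordScores[doc];
                    matched = true;
                }
            }
            if (matched) {
                rowScores.put(rowStart, sum);
            }
            rowStart = next;
        }
        return rowScores;
    }
}
```

Because a row ends exactly where the next prime doc begins, one forward scan is enough to
score every row. That only holds while a row's records are contiguous.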

The situation is slightly different here.

1. In the disk index, the same row is spread across segments, with its records stored contiguously.

2. The same row can also be scattered non-contiguously across segments in the "N" RAM indexes.

If there is a query such as "docs.content=hello AND rowid:123", the implementation will be
straightforward.
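A minimal sketch of why the row-id case is easy, assuming each index (the disk index plus
each of the N RAM indexes) has already produced its matching records as a list of hits.
The class SingleRowScorer, its Hit type, and summation as the row score are hypothetical.
Because the query pins a single row, its hits can be gathered from every index and combined
without ranking rows against each other.

```java
import java.util.List;

// Hypothetical illustration: scoring one row whose records may live in several indexes.
final class SingleRowScorer {

    /** A matching record: the row it belongs to and its score for the query. */
    static final class Hit {
        final String rowId;
        final float score;

        Hit(String rowId, float score) {
            this.rowId = rowId;
            this.score = score;
        }
    }

    /**
     * Sums the scores of all hits for the given row across every index
     * (disk index first, then the RAM indexes, in whatever order they are passed).
     * Summation is an assumed scoring rule.
     */
    static float scoreRow(String rowId, List<List<Hit>> hitsPerIndex) {
        float total = 0f;
        for (List<Hit> indexHits : hitsPerIndex) {
            for (Hit hit : indexHits) {
                if (hit.rowId.equals(rowId)) {
                    total += hit.score;
                }
            }
        }
        return total;
    }
}
```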

But if a query contains only "docs.content=hello", it is going to be very difficult to
aggregate all records of a given row across segments and compute a correct score.
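Without a row id, every matching record has to be keyed by its row before any row can be
scored, because a row's records may sit in the disk index and in any of the N RAM indexes.
The hypothetical sketch below does this with a HashMap over in-memory hit lists; the class
CrossIndexRowAggregator, its Hit type, and summation as the row score are assumptions, not
existing Blur code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: ranking rows whose records are scattered across indexes.
final class CrossIndexRowAggregator {

    /** A matching record: the row it belongs to and its score for the query. */
    static final class Hit {
        final String rowId;
        final float score;

        Hit(String rowId, float score) {
            this.rowId = rowId;
            this.score = score;
        }
    }

    /** Sums hit scores per row across all indexes and returns the topN row ids, best first. */
    static List<String> topRows(List<List<Hit>> hitsPerIndex, int topN) {
        // Every matching row needs an entry here before any row can be ranked.
        Map<String, Float> rowScores = new HashMap<>();
        for (List<Hit> indexHits : hitsPerIndex) {
            for (Hit hit : indexHits) {
                rowScores.merge(hit.rowId, hit.score, Float::sum);
            }
        }
        List<Map.Entry<String, Float>> entries = new ArrayList<>(rowScores.entrySet());
        entries.sort((a, b) -> Float.compare(b.getValue(), a.getValue()));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(topN, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

The map grows with the number of matching rows, not with the number of results requested,
which is what makes this expensive for broad queries.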

One option is the newly introduced grouping functionality in Lucene, which lets us group by
"rowid", but that is extremely costly:

1. It relies on the FieldCache.
2. It needs two round trips: one to identify the top "N" rows and another to identify the
top "M" records for each of those "N" rows.
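The two round trips can be illustrated with a plain-Java sketch of two-pass grouping over an
in-memory list of hits. This is not Lucene's grouping API, which does the same work with
first-pass and second-pass collectors over the index. The class TwoPassRowGrouping, ranking
rows by their best record score, and storing each hit's rowid on the Hit object (in Lucene it
would come from a per-document lookup such as the FieldCache) are assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of two-pass grouping by rowid.
final class TwoPassRowGrouping {

    /** A matching record with its row id, record id and query score. */
    static final class Hit {
        final String rowId;
        final String recordId;
        final float score;

        Hit(String rowId, String recordId, float score) {
            this.rowId = rowId;
            this.recordId = recordId;
            this.score = score;
        }
    }

    /** Pass 1: rank rows by their best record score and keep the top n row ids. */
    static List<String> topGroups(List<Hit> hits, int n) {
        Map<String, Float> best = new HashMap<>();
        for (Hit hit : hits) {
            best.merge(hit.rowId, hit.score, Float::max);
        }
        List<String> rows = new ArrayList<>(best.keySet());
        rows.sort((a, b) -> Float.compare(best.get(b), best.get(a)));
        return new ArrayList<>(rows.subList(0, Math.min(n, rows.size())));
    }

    /** Pass 2: revisit all hits and keep the top m record ids for each selected row. */
    static Map<String, List<String>> topRecords(List<Hit> hits, List<String> rows, int m) {
        Map<String, List<Hit>> byRow = new LinkedHashMap<>();
        for (String row : rows) {
            byRow.put(row, new ArrayList<>());
        }
        for (Hit hit : hits) {
            List<Hit> target = byRow.get(hit.rowId);
            if (target != null) { // skip rows that did not make the top n
                target.add(hit);
            }
        }
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Hit>> entry : byRow.entrySet()) {
            List<Hit> rowHits = entry.getValue();
            rowHits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < Math.min(m, rowHits.size()); i++) {
                ids.add(rowHits.get(i).recordId);
            }
            result.put(entry.getKey(), ids);
        }
        return result;
    }
}
```

Both passes visit every hit, and the first pass keeps a score for every matching row, which
is where the cost on a large index comes from.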

I need some help here.

Or maybe, for a start, we could choose not to support queries without a row id when using
this real-time system. [Something akin to a key-value store, where a query without a key is
not possible]

> NRT Updates using RAMDirectory & Swap
> -------------------------------------
>
>                 Key: BLUR-290
>                 URL: https://issues.apache.org/jira/browse/BLUR-290
>             Project: Apache Blur
>          Issue Type: New Feature
>    Affects Versions: experimental-dev
>            Reporter: Ravikumar
>         Attachments: BlurFlushingIndexWriter.java, BlurIndexTracker.java, BlurRealTimeIndex.java,
>                      BlurRealTimeIndexWriter.java, BlurRealTimeManager.java, BlurRealTimeManagerReopenThread.java,
>                      RealTimeTransactionRecorder.java, SlabAllocator.java, SlabRAMDirectory.java, SlabRAMFile.java,
>                      SlabRAMInputStream.java, SlabRAMOutputStream.java, SortingMultiReader.java
>
>
> We have been discussing handling humongous rows in Blur (BLUR-220). Explore the
> idea of using a RAMDirectory at the front, backed by a persistent index.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
