lucene-dev mailing list archives

From "Jason Rutherglen (JIRA)" <>
Subject [jira] Commented: (LUCENE-1313) Realtime Search
Date Mon, 04 May 2009 18:31:30 GMT


Jason Rutherglen commented on LUCENE-1313:

{quote}I don't like how "deep" the dichotomy of "RAMDir vs
FSDir" {quote}

Agreed, it's a bit awkward, but I don't see another way to do
this. The good thing is that if IW has written some .fdt files to
the main dir (via FSD) and IW crashes, then when IW is created
again, IFD automatically deletes the extraneous .fdt (and other
extension) files.
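
As a rough illustration of that cleanup, here is a minimal sketch of the idea only. It is not Lucene's actual IndexFileDeleter; the class and method names are hypothetical, and it assumes we already know which file names the current segments reference.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of Lucene.
final class UnreferencedFileSketch {
    /**
     * Returns the file names present in the directory that no live
     * segment references, i.e. the candidates for deletion after a crash.
     */
    static Set<String> findUnreferenced(Set<String> filesInDir, Set<String> referenced) {
        Set<String> extra = new TreeSet<>(filesInDir); // copy so the input is not modified
        extra.removeAll(referenced);                   // keep only what nothing points to
        return extra;
    }
}
```

A leftover .fdt file from a crashed writer would show up in the returned set because no segment refers to it.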

{quote}Why can't we push FSD down to all these places (IFD,
SegmentInfo/s, etc.)?{quote}

{quote}Could we simply make the single CMS instance smart enough
to realize that a single RAM merge is allowed to proceed
regardless of the thread limit?{quote}

Hmm... I think for benchmarking it would be good to allow
options, as we simply don't know. In the latest patch a RAM
merge scheduler can be set on the IndexWriter.

{quote}have to fix FSD to understand CFX must go to the dir{quote}

I think this is fixed in the patch, where compound files are not
created in RAM.

{quote}You're saying we should have IW create the ramdir by default
after getReader is called and remove the IW ramdir constructor?
Right. This should be "under the hood".{quote}

Ok, this will require some reworking of the patch. 

{quote}OK, though I'd like to simply always use FSD, even if
primary & secondary are the same dir. {quote}

How will always using FSD work? Doesn't it assume writing to two
different directories?

{quote}this ram size should be used not only for deciding when
it's time to merge to a disk segment, but also when it's time
for DW to flush a new segment{quote}

In the new patch this is fixed.

{quote}So if budget is 32 MB, and net RAM used (segments + DW)
is say 22, we have a 10 MB "budget", so we are allowed to select
merges that total to < 10 MB.{quote}

One issue is that flushing the RAM buffer doubles the RAM used
(because the segment is flushed as-is to the RAM dir). You're
saying we should roughly estimate the RAM size of the result of a
merge and have the merge policy take this into account? This
makes sense; otherwise we will consistently (if temporarily)
exceed the RAM buffer size. The algorithm is fairly simple? Find
segments whose total size is lower than whatever we have left of
the max RAM buffer size? I have new code, but will rework it a
bit to include this discussion.
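
A minimal sketch of that selection, under stated assumptions: the class and method names are hypothetical, the size of the merged result is estimated as the sum of its inputs, and the smallest-first greedy order is only one possible choice. This is not the code in the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch for illustration; not the merge policy from the patch.
final class RamBudgetMergeSketch {
    /**
     * Picks RAM segment sizes (in MB) to merge so that their total stays
     * strictly below what is left of the RAM budget.
     */
    static List<Long> selectForMerge(List<Long> segmentSizes, long budget, long netRamUsed) {
        long remaining = budget - netRamUsed;       // e.g. 32 - 22 = 10
        List<Long> sorted = new ArrayList<>(segmentSizes);
        Collections.sort(sorted);                   // assumption: try smallest segments first
        List<Long> selected = new ArrayList<>();
        long total = 0;                             // estimate: merged size = sum of inputs
        for (long size : sorted) {
            if (total + size >= remaining) {
                break;                              // adding this one would use up the budget
            }
            selected.add(size);
            total += size;
        }
        // A merge needs at least two segments; otherwise select nothing.
        return selected.size() >= 2 ? selected : Collections.<Long>emptyList();
    }
}
```

With a 32 MB budget, 22 MB in use, and RAM segments of 6, 2, 4 and 3 MB, this selects the 2, 3 and 4 MB segments (9 MB total) and leaves the 6 MB segment alone. It does not model the temporary doubling during the RAM buffer flush mentioned above.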

> Realtime Search
> ---------------
>                 Key: LUCENE-1313
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>    Affects Versions: 2.4.1
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>         Attachments: LUCENE-1313.jar, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch,
>                      LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch,
>                      lucene-1313.patch, lucene-1313.patch, lucene-1313.patch, lucene-1313.patch
> Realtime search with transactional semantics.  
> Possible future directions:
>   * Optimistic concurrency
>   * Replication
> Encoding each transaction into a set of bytes by writing to a RAMDirectory enables replication.
> It is difficult to replicate using other methods because while the document may easily be
> serialized, the analyzer cannot.
> I think this issue can hold realtime benchmarks which include indexing and searching

