lucene-dev mailing list archives

From "Jason Rutherglen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer
Date Mon, 15 Mar 2010 16:57:27 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845404#action_12845404 ]

Jason Rutherglen commented on LUCENE-2312:
------------------------------------------

Apologies in advance for permanently damaging (well, I guess it
can be deleted) the look and feel of this issue with a thwack of
code. However, I don't want to post the messy patch, and I'm
guessing there's some small reason why the postings iteration
on the freq byte slice reader isn't happening correctly
(i.e., it's returning 0).

{code}
public class DWTermDocs implements TermDocs {
    final FreqProxTermsWriterPerField field;
    final int numPostings;
    final CharBlockPool charPool;
    FreqProxTermsWriter.PostingList posting;
    char[] text;
    int textOffset;
    private int postingUpto = -1;
    final ByteSliceReader freq = new ByteSliceReader();
    final ByteSliceReader prox = new ByteSliceReader();

    int docID;
    int termFreq;
    
    DWTermDocs(FreqProxTermsWriterPerField field, FreqProxTermsWriter.PostingList posting) throws IOException {
      this.field = field;
      this.charPool = field.perThread.termsHashPerThread.charPool;
      //this.numPostings = field.termsHashPerField.numPostings;
      this.numPostings = 1;
      this.posting = posting;
      // nextTerm is called only once to 
      // set the term docs pointer at the 
      // correct position
      nextTerm();
    }
    
    boolean nextTerm() throws IOException {
      postingUpto++;
      if (postingUpto == numPostings)
        return false;

      docID = 0;

      text = charPool.buffers[posting.textStart >> DocumentsWriter.CHAR_BLOCK_SHIFT];
      textOffset = posting.textStart & DocumentsWriter.CHAR_BLOCK_MASK;

      field.termsHashPerField.initReader(freq, posting, 0);
      if (!field.fieldInfo.omitTermFreqAndPositions)
        field.termsHashPerField.initReader(prox, posting, 1);

      // Should always be true
      boolean result = nextDoc();
      assert result;

      return true;
    }
    
    public boolean nextDoc() throws IOException {
      if (freq.eof()) {
        if (posting.lastDocCode != -1) {
          // Return last doc
          docID = posting.lastDocID;
          if (!field.omitTermFreqAndPositions)
            termFreq = posting.docFreq;
          posting.lastDocCode = -1;
          return true;
        } else
          // EOF
          return false;
      }
      final int code = freq.readVInt();
      if (field.omitTermFreqAndPositions)
        docID += code;
      else {
        docID += code >>> 1;
        if ((code & 1) != 0)
          termFreq = 1;
        else
          termFreq = freq.readVInt();
      }
      assert docID != posting.lastDocID;
      return true;
    }
{code}
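
For anyone following along, here is a minimal, self-contained sketch
of the freq stream layout that nextDoc() above decodes. It is not
Lucene code: FreqStreamSketch, addOccurrence, readAll and flushedBytes
are hypothetical names, and a plain ByteArrayOutputStream stands in for
the byte slices. It assumes, as the eof branch of nextDoc() suggests,
that the most recent doc for a term is held in
lastDocID/lastDocCode/docFreq and is only written to the stream once a
later doc arrives.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one term's posting; not the Lucene classes.
final class FreqStreamSketch {
    private final ByteArrayOutputStream stream = new ByteArrayOutputStream();
    private int lastDocID = 0;
    private int lastDocCode = -1; // -1 means no doc is pending
    private int docFreq = 0;

    // Records one occurrence of the term in docID (docIDs must not decrease).
    void addOccurrence(int docID) {
        if (lastDocCode != -1 && docID == lastDocID) {
            docFreq++; // same doc again: only the pending freq changes
            return;
        }
        if (lastDocCode != -1) {
            // A new doc arrived, so the pending doc can now be written.
            if (docFreq == 1) {
                writeVInt(lastDocCode | 1); // low bit set: freq is implicitly 1
            } else {
                writeVInt(lastDocCode);     // low bit clear: freq follows
                writeVInt(docFreq);
            }
        }
        lastDocCode = (docID - lastDocID) << 1; // doc delta, shifted left by one
        lastDocID = docID;
        docFreq = 1;
    }

    // Returns {docID, termFreq} pairs: flushed docs first, then the pending one.
    // Unlike the snippet above, this does not reset lastDocCode, so it can be
    // called repeatedly.
    List<int[]> readAll() {
        byte[] bytes = stream.toByteArray();
        List<int[]> docs = new ArrayList<>();
        int[] pos = {0};
        int docID = 0;
        while (pos[0] < bytes.length) {
            int code = readVInt(bytes, pos);
            docID += code >>> 1;
            int termFreq = (code & 1) != 0 ? 1 : readVInt(bytes, pos);
            docs.add(new int[] {docID, termFreq});
        }
        if (lastDocCode != -1) {
            docs.add(new int[] {lastDocID, docFreq});
        }
        return docs;
    }

    // Number of bytes written to the stream so far (excludes the pending doc).
    int flushedBytes() {
        return stream.size();
    }

    // Variable-length int: 7 bits per byte, high bit marks continuation.
    private void writeVInt(int value) {
        while ((value & ~0x7F) != 0) {
            stream.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        stream.write(value);
    }

    private static int readVInt(byte[] bytes, int[] pos) {
        byte b = bytes[pos[0]++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = bytes[pos[0]++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    public static void main(String[] args) {
        FreqStreamSketch posting = new FreqStreamSketch();
        for (int docID : new int[] {3, 3, 7, 200}) {
            posting.addOccurrence(docID);
        }
        for (int[] doc : posting.readAll()) {
            System.out.println("doc=" + doc[0] + " freq=" + doc[1]);
        }
    }
}
```

Under these assumptions, a term that occurs in only one doc leaves the
stream empty, so a reader hits eof on its first call and gets that doc
from the pending fields alone.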

> Search on IndexWriter's RAM Buffer
> ----------------------------------
>
>                 Key: LUCENE-2312
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2312
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 3.0.1
>            Reporter: Jason Rutherglen
>            Assignee: Michael Busch
>             Fix For: 3.1
>
>
> In order to offer users near realtime search, without incurring
> an indexing performance penalty, we can implement search on
> IndexWriter's RAM buffer. This is the buffer that is filled in
> RAM as documents are indexed. Currently the RAM buffer is
> flushed to the underlying directory (usually disk) before being
> made searchable. 
> Today's Lucene-based NRT systems must incur the cost of merging
> segments, which can slow indexing. 
> Michael Busch has good suggestions regarding how to handle deletes using max doc ids.
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
> The area that isn't fully fleshed out is the terms dictionary,
> which needs to be sorted prior to queries executing. Currently
> IW implements a specialized hash table. Michael B has a
> suggestion here: 
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915
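
The sorting requirement in the description can be illustrated with a
naive sketch, assuming a plain HashMap in place of IW's specialized
hash table. This is not the suggestion from the linked comment, just
the simplest way to get a sorted view: snapshot the keys and sort them
on the first read after a write. SortedTermsSketch, add, sortedTerms
and seekCeil are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not how IndexWriter stores terms.
final class SortedTermsSketch {
    private final Map<String, Integer> termCounts = new HashMap<>();
    private String[] sorted; // cached sorted snapshot, dropped on every write

    // Indexing path: O(1) hash insert, no ordering maintained.
    void add(String term) {
        termCounts.merge(term, 1, Integer::sum);
        sorted = null;
    }

    // Query path: sort lazily, and only if something changed since last time.
    String[] sortedTerms() {
        if (sorted == null) {
            sorted = termCounts.keySet().toArray(new String[0]);
            Arrays.sort(sorted);
        }
        return sorted;
    }

    // Returns the first term >= target, or null if there is none.
    String seekCeil(String target) {
        String[] terms = sortedTerms();
        int idx = Arrays.binarySearch(terms, target);
        if (idx < 0) {
            idx = -idx - 1; // binarySearch encodes the insertion point
        }
        return idx < terms.length ? terms[idx] : null;
    }

    public static void main(String[] args) {
        SortedTermsSketch dict = new SortedTermsSketch();
        for (String t : new String[] {"lucene", "apache", "search", "apache"}) {
            dict.add(t);
        }
        System.out.println(Arrays.toString(dict.sortedTerms()));
        System.out.println(dict.seekCeil("b"));
    }
}
```

The trade-off is that every write invalidates the snapshot, so a
workload that interleaves indexing and searching pays for a full sort
on each search.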


