lucene-dev mailing list archives

From "Jason Rutherglen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1278) Add optional storing of document numbers in term dictionary
Date Mon, 05 May 2008 13:03:55 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594231#action_12594231 ]

Jason Rutherglen commented on LUCENE-1278:
------------------------------------------

Storing the docs is off by default, so the index grows only if the user opts in.  The docs
are written as a byte blob, which lets the reader skip them when loaddocs is false.  Field
cache and range query loading are currently very slow because each term needs two seeks (one
for the TermEnum, then one for the TermDocs).  If the docs were kept in a separate file instead,
the terms would have to be stored redundantly.
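
To illustrate the skip described above, here is a minimal, self-contained sketch.  It is not
the patch's actual file format: it assumes a hypothetical layout in which each term entry
carries a length-prefixed blob of doc numbers, so a reader that does not want the docs can
jump past the blob.  The names DocBlobSketch, writeEntry and readEntry are made up for this
illustration and are not Lucene APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical layout (an assumption, NOT the patch's real format):
//   term text (UTF) | blob length in bytes (int) | doc numbers (int each)
class DocBlobSketch {

  // Appends one term entry followed by its length-prefixed doc blob.
  static void writeEntry(DataOutputStream out, String text, int[] docs) throws IOException {
    out.writeUTF(text);
    out.writeInt(docs.length * 4); // blob size in bytes
    for (int doc : docs) {
      out.writeInt(doc);
    }
  }

  // Reads one entry. Returns its docs, or null if loadDocs is false,
  // in which case the blob is skipped without being decoded.
  static int[] readEntry(DataInputStream in, boolean loadDocs) throws IOException {
    in.readUTF(); // term text, ignored in this sketch
    int blobBytes = in.readInt();
    if (!loadDocs) {
      int remaining = blobBytes;
      while (remaining > 0) {
        int skipped = in.skipBytes(remaining);
        if (skipped <= 0) throw new EOFException("truncated doc blob");
        remaining -= skipped;
      }
      return null;
    }
    int[] docs = new int[blobBytes / 4];
    for (int i = 0; i < docs.length; i++) {
      docs[i] = in.readInt();
    }
    return docs;
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    writeEntry(out, "cat", new int[] {1, 4});
    writeEntry(out, "dog", new int[] {0, 2, 7});
    out.flush();

    DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
    int[] cat = readEntry(in, false); // blob skipped
    int[] dog = readEntry(in, true);  // blob decoded
    System.out.println(cat == null);          // prints true
    System.out.println(Arrays.toString(dog)); // prints [0, 2, 7]
  }
}
```

Because the blob length sits in front of the doc numbers, skipping costs one length read plus
a forward jump, and the next term entry is still found at the right offset.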

A field cache example:

protected Object createValue(IndexReader reader, Object entryKey)
    throws IOException {
  Entry entry = (Entry) entryKey;
  String field = entry.field;
  IntParser parser = (IntParser) entry.custom;
  final int[] retArray = new int[reader.maxDoc()];
  // Old approach: a separate TermDocs that is seeked once per term.
  // TermDocs termDocs = reader.termDocs();
  // TermEnum termEnum = reader.terms(new Term(field, ""));
  // With the patch: ask the TermEnum to load the docs along with each term.
  TermEnum termEnum = reader.terms(new Term(field, ""), true);
  try {
    do {
      Term term = termEnum.term();
      // Field names are interned, so a reference comparison is sufficient.
      if (term == null || term.field() != field) break;
      int termval = parser.parseInt(term.text());
      int[] docs = termEnum.docs();
      for (int x = 0; x < docs.length; x++) {
        retArray[docs[x]] = termval;
      }
      // Replaced by the loop above:
      // termDocs.seek(termEnum);
      // while (termDocs.next()) {
      //   retArray[termDocs.doc()] = termval;
      // }
    } while (termEnum.next());
  } finally {
    // termDocs.close();
    termEnum.close();
  }
  return retArray;
}
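
Range loading could follow the same pattern as the field cache example: walk the terms in the
range and set a bit for every doc stored with each term, with no second seek.  The sketch
below only illustrates that idea.  It uses a plain TreeMap as a stand-in for a term dictionary
that stores docs, and RangeFilterSketch and rangeBits are hypothetical names, not part of
Lucene or of the patch.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

// Stand-in for a term dictionary that keeps doc numbers next to each term.
// This is an assumption for illustration only; it does not use Lucene.
class RangeFilterSketch {

  // Sets a bit for every document that contains a term in [lower, upper], inclusive.
  static BitSet rangeBits(TreeMap<String, int[]> termDocs, String lower, String upper, int maxDoc) {
    BitSet bits = new BitSet(maxDoc);
    for (Map.Entry<String, int[]> entry : termDocs.subMap(lower, true, upper, true).entrySet()) {
      for (int doc : entry.getValue()) {
        bits.set(doc);
      }
    }
    return bits;
  }

  public static void main(String[] args) {
    TreeMap<String, int[]> termDocs = new TreeMap<>();
    termDocs.put("ant", new int[] {3});
    termDocs.put("cat", new int[] {1, 4});
    termDocs.put("dog", new int[] {0, 2});
    termDocs.put("eel", new int[] {5});
    System.out.println(rangeBits(termDocs, "cat", "dog", 6)); // prints {0, 1, 2, 4}
  }
}
```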

> Add optional storing of document numbers in term dictionary
> -----------------------------------------------------------
>
>                 Key: LUCENE-1278
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1278
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>    Affects Versions: 2.3.1
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: lucene.1278.5.4.2008.patch, lucene.1278.5.5.2008.2.patch, lucene.1278.5.5.2008.patch
>
>
> Add optional storing of document numbers in term dictionary.  String index field cache and range filter creation will be faster.
> Example read code:
> {noformat}
> TermEnum termEnum = indexReader.terms(TermEnum.LOAD_DOCS);
> do {
>   Term term = termEnum.term();
>   if (term == null || term.field() != field) break;
>   int[] docs = termEnum.docs();
> } while (termEnum.next());
> {noformat}
> Example write code:
> {noformat}
> Document document = new Document();
> document.add(new Field("tag", "dog", Field.Store.YES, Field.Index.UN_TOKENIZED, Field.Term.STORE_DOCS));
> indexWriter.addDocument(document);
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


