lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Eks Dev (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1278) Add optional storing of document numbers in term dictionary
Date Sun, 20 Jul 2008 11:02:31 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615077#action_12615077
] 

Eks Dev commented on LUCENE-1278:
---------------------------------

in light of Mike's comments hier (Michael McCandless - 05/May/08 05:33 AM), I think it is
worth mentioning that I am working on LUCENE-1340, that is storing postings without additional
frq info. 

correct me if I am wrong, the only difference is that this approach with *.frq needs one seek
more... at the same time, this could potentially increase term dict size, so we loose some
locality.

Your your last proposal sounds interesting,  "inline short postings" into term dict , so for
short postings (about the size of offset pointer into *.frq) with tf==1 (that is the always
the case if you use omitTf(true) from LUCENE-1340)  we spare one seek()... this could be a
lot. Also, there is no need to store postings into *frq  (this complicates maintenance I guess)
 

> Add optional storing of document numbers in term dictionary
> -----------------------------------------------------------
>
>                 Key: LUCENE-1278
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1278
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>    Affects Versions: 2.3.1
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: lucene.1278.5.4.2008.patch, lucene.1278.5.5.2008.2.patch, lucene.1278.5.5.2008.patch,
lucene.1278.5.7.2008.patch, lucene.1278.5.7.2008.test.patch, TestTermEnumDocs.java
>
>
> Add optional storing of document numbers in term dictionary.  String index field cache
and range filter creation will be faster.  
> Example read code:
> {noformat}
> TermEnum termEnum = indexReader.terms(TermEnum.LOAD_DOCS);
> do {
>   Term term = termEnum.term();
>   if (term == null || term.field() != field) break;
>   int[] docs = termEnum.docs();
> } while (termEnum.next());
> {noformat}
> Example write code:
> {noformat}
> Document document = new Document();
> document.add(new Field("tag", "dog", Field.Store.YES, Field.Index.UN_TOKENIZED, Field.Term.STORE_DOCS));
> indexWriter.addDocument(document);
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message