lucene-dev mailing list archives

From "Paul Elschot (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1278) Add optional storing of document numbers in term dictionary
Date Wed, 14 May 2008 13:45:58 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596762#action_12596762 ]

Paul Elschot commented on LUCENE-1278:
--------------------------------------

Some comments on the 5.7.2008 patch:

The test showing a 7.6 times speedup for terms with very few documents makes me wonder why
this never showed up as a performance problem before. It certainly shows an advantage of flexible
indexing in the case where within-document term frequencies are not needed (for example,
primary/foreign keys, which normally end up in a keyword field).

The patch uses DocIdSetIterator in the org.apache.lucene.index package, so it would be a good
idea to move it from o.a.l.search to o.a.l.index or o.a.l.util, to avoid a circular dependency
between the index and search packages. Since DocIdSetIterator has not been released yet, this
move should cause no problems.

The DocIdSetReader class in the patch has so much code in common with SortedVIntList that
it might be better to merge the two into a single class and try to refactor the common code
into new methods there.
That would also be an easy way to get rid of the unsupported skipTo() operation.
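SortedVIntList keeps a sorted list of document numbers in RAM as variable-byte (VInt) encoded gaps, so a merged class could give its iterator a skipTo() that simply advances with next() until the target is reached. The following is a hypothetical, self-contained sketch of that idea only: the names SortedVIntDocs and Cursor are made up for illustration, this is not the code from the patch or from Lucene's SortedVIntList, and a real version would implement DocIdSetIterator rather than its own cursor type.

```java
import java.io.ByteArrayOutputStream;

/**
 * Hypothetical sketch (not Lucene's SortedVIntList): stores sorted, distinct
 * document numbers as VInt-encoded gaps and offers a DocIdSetIterator-like
 * cursor with next(), doc() and a linear-scan skipTo().
 */
public class SortedVIntDocs {
    private final byte[] bytes;
    private final int size;

    public SortedVIntDocs(int[] sortedDocs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int last = 0;
        for (int i = 0; i < sortedDocs.length; i++) {
            int doc = sortedDocs[i];
            if (doc < last || (i > 0 && doc == last)) {
                throw new IllegalArgumentException("docs must be non-negative, sorted and distinct");
            }
            int gap = doc - last;
            // VInt: 7 data bits per byte, high bit set on every byte except the last.
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80);
                gap >>>= 7;
            }
            out.write(gap);
            last = doc;
        }
        bytes = out.toByteArray();
        size = sortedDocs.length;
    }

    public int size() { return size; }

    public int byteSize() { return bytes.length; }

    public Cursor cursor() { return new Cursor(); }

    public final class Cursor {
        private int pos = 0;   // next byte to decode
        private int last = 0;  // previously decoded doc, base for the next gap
        private int doc = -1;  // current doc, -1 before the first next()

        public int doc() { return doc; }

        /** Decodes the next gap; returns false when the list is exhausted. */
        public boolean next() {
            if (pos >= bytes.length) return false;
            int gap = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            last += gap;
            doc = last;
            return true;
        }

        /** Advances at least once, to the first doc >= target (linear scan). */
        public boolean skipTo(int target) {
            do {
                if (!next()) return false;
            } while (doc < target);
            return true;
        }
    }
}
```

Because the gaps are decoded sequentially, skipTo() here is linear in the number of entries passed over; that is usually acceptable for the short per-term lists this issue targets.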



> Add optional storing of document numbers in term dictionary
> -----------------------------------------------------------
>
>                 Key: LUCENE-1278
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1278
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>    Affects Versions: 2.3.1
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: lucene.1278.5.4.2008.patch, lucene.1278.5.5.2008.2.patch, lucene.1278.5.5.2008.patch, lucene.1278.5.7.2008.patch, lucene.1278.5.7.2008.test.patch, TestTermEnumDocs.java
>
>
> Add optional storing of document numbers in term dictionary. String index field cache and range filter creation will be faster.
> Example read code:
> {noformat}
> TermEnum termEnum = indexReader.terms(TermEnum.LOAD_DOCS);
> do {
>   Term term = termEnum.term();
>   if (term == null || term.field() != field) break;
>   int[] docs = termEnum.docs();
> } while (termEnum.next());
> {noformat}
> Example write code:
> {noformat}
> Document document = new Document();
> document.add(new Field("tag", "dog", Field.Store.YES, Field.Index.UN_TOKENIZED, Field.Term.STORE_DOCS));
> indexWriter.addDocument(document);
> {noformat}
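To illustrate why per-term document arrays would help string-index field cache and range filter creation, here is a hypothetical sketch. TermEnum.LOAD_DOCS and TermEnum.docs() are the API proposed in this issue, not an existing Lucene API, so the sketch stands in a TreeMap from term text (in sorted order) to that term's document numbers. The StringIndex class below is only loosely modeled on the order/lookup layout of Lucene's FieldCache.StringIndex; all names here are assumptions for illustration.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical consumers of per-term doc arrays (a TreeMap stands in for the term dictionary). */
public class TermDocsConsumers {

    /** order[doc] is the 1-based ordinal of the doc's term (0 = no term); lookup[ord] is the term text. */
    public static final class StringIndex {
        public final int[] order;
        public final String[] lookup;

        StringIndex(int[] order, String[] lookup) {
            this.order = order;
            this.lookup = lookup;
        }
    }

    /** Walks the terms in sorted order and assigns each term's ordinal to all of its docs. */
    public static StringIndex buildStringIndex(TreeMap<String, int[]> termDocs, int maxDoc) {
        int[] order = new int[maxDoc];
        String[] lookup = new String[termDocs.size() + 1]; // slot 0 stays null: "no term"
        int ord = 0;
        for (Map.Entry<String, int[]> e : termDocs.entrySet()) {
            ord++;
            lookup[ord] = e.getKey();
            for (int doc : e.getValue()) {
                order[doc] = ord; // no per-document frequency decoding needed
            }
        }
        return new StringIndex(order, lookup);
    }

    /** Sets a bit for every doc whose term lies in [lower, upper], both bounds inclusive. */
    public static BitSet rangeFilter(TreeMap<String, int[]> termDocs, String lower, String upper, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int[] docs : termDocs.subMap(lower, true, upper, true).values()) {
            for (int doc : docs) {
                bits.set(doc);
            }
        }
        return bits;
    }
}
```

In both methods the inner loop only copies document numbers out of a ready-made array, which is the work the proposed docs() call would make cheap compared with iterating TermDocs per term.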

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



