lucene-dev mailing list archives

From "Jason Rutherglen (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-1278) Add optional storing of document numbers in term dictionary
Date Mon, 05 May 2008 17:31:56 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1278:
-------------------------------------

    Attachment: TestTermEnumDocs.java

I was going to write a proper Lucene test case, but I need an existing one to use as an example and svn is down.

The attached example test is a poor benchmark because the term and field saturation is nil.
Normal documents will have far more terms, so the term docs data will be larger and the file
cache will hold less of it.  However, the test does illustrate the speedup.  Please suggest other tests.

Laptop (Windows XP SP2, Java 6, Core 2 Duo); results were about the same on 3 separate runs:
3360 millis termenum loaddocs
25641 millis termdocs
7.6 times speedup
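
The speedup figure is just the ratio of the two timings above. A minimal Java check of that arithmetic follows; the class and method names are made up for illustration and are not part of the attached test.

```java
// Illustrative only: recomputes the speedup from the two timings reported above.
class SpeedupCheck {
    /** Ratio of the baseline time to the improved time. */
    static double speedup(long baselineMillis, long improvedMillis) {
        return (double) baselineMillis / improvedMillis;
    }

    public static void main(String[] args) {
        long termDocsMillis = 25641; // "termdocs" run
        long loadDocsMillis = 3360;  // "termenum loaddocs" run
        System.out.println(speedup(termDocsMillis, loadDocsMillis));
    }
}
```

The ratio comes out slightly above 7.6, which matches the rounded figure quoted.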

There have been previous discussions of this speed issue:
http://www.gossamer-threads.com/lists/lucene/java-dev/53786
The conclusion there was to use payloads, which do not speed up string index or range queries.


> Add optional storing of document numbers in term dictionary
> -----------------------------------------------------------
>
>                 Key: LUCENE-1278
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1278
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>    Affects Versions: 2.3.1
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: lucene.1278.5.4.2008.patch, lucene.1278.5.5.2008.2.patch, lucene.1278.5.5.2008.patch, TestTermEnumDocs.java
>
>
> Add optional storing of document numbers in term dictionary.  String index field cache and range filter creation will be faster.
> Example read code:
> {noformat}
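> // LOAD_DOCS is the flag proposed by this patch; it is not in stock Lucene 2.3.1
> TermEnum termEnum = indexReader.terms(TermEnum.LOAD_DOCS);
> do {
>   Term term = termEnum.term();
>   // Term interns field names, so != works provided 'field' is interned too
>   if (term == null || term.field() != field) break;
>   int[] docs = termEnum.docs();  // doc numbers stored with the term (proposed API)
> } while (termEnum.next());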
> {noformat}
> Example write code:
> {noformat}
> Document document = new Document();
> document.add(new Field("tag", "dog", Field.Store.YES, Field.Index.UN_TOKENIZED, Field.Term.STORE_DOCS));
> indexWriter.addDocument(document);
> {noformat}
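
To make the idea in the description concrete, here is a toy in-memory sketch of a term dictionary that keeps document numbers next to each term, so a range scan can collect matching documents in one pass over the sorted terms. This is not Lucene code and does not reflect the patch's on-disk format; `ToyTermDictionary`, `add`, and `docsInRange` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model, NOT Lucene code: every name here is hypothetical.
class ToyTermDictionary {
    // Sorted term text -> doc numbers stored alongside the term.
    private final TreeMap<String, List<Integer>> docsByTerm = new TreeMap<>();

    /** Records that the given document contains the given term. */
    void add(String term, int docId) {
        docsByTerm.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
    }

    /** Collects doc numbers for all terms in [lower, upper], in term order. */
    List<Integer> docsInRange(String lower, String upper) {
        List<Integer> result = new ArrayList<>();
        // One pass over the sorted dictionary; no separate postings lookup per term.
        for (List<Integer> docs : docsByTerm.subMap(lower, true, upper, true).values()) {
            result.addAll(docs);
        }
        return result;
    }

    public static void main(String[] args) {
        ToyTermDictionary dict = new ToyTermDictionary();
        dict.add("cat", 0);
        dict.add("dog", 1);
        dict.add("emu", 2);
        dict.add("dog", 3);
        System.out.println(dict.docsInRange("cat", "dog")); // prints [0, 1, 3]
    }
}
```

The sketch only models the access pattern: the doc numbers arrive together with the term during enumeration, which is the property the read example above relies on.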

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

