lucene-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: [jira] Commented: (LUCENE-1278) Add optional storing of document numbers in term dictionary
Date Mon, 21 Jul 2008 22:11:43 GMT
This also reminds me of the "pulsing" technique described in:

http://citeseer.ist.psu.edu/cutting90optimizations.html

Doug

eks dev wrote:
> It seems someone else had the same idea to "inline" very short postings into the term dictionary
> (even for an in-memory index) and save one pointer (and a seek, in a disk setup)... nice reading
> 
> http://www.siam.org/proceedings/alenex/2008/alx08_01transierf.pdf
> 
> 
> 
> 
> ----- Original Message ----
>> From: Eks Dev (JIRA) <jira@apache.org>
>> To: java-dev@lucene.apache.org
>> Sent: Sunday, 20 July, 2008 1:02:31 PM
>> Subject: [jira] Commented: (LUCENE-1278) Add optional storing of document numbers in term dictionary
>>
>>
>>     [ 
>> https://issues.apache.org/jira/browse/LUCENE-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615077#action_12615077
>> ] 
>>
>> Eks Dev commented on LUCENE-1278:
>> ---------------------------------
>>
>> In light of Mike's comments here (Michael McCandless - 05/May/08 05:33 AM), I 
>> think it is worth mentioning that I am working on LUCENE-1340, which stores 
>> postings without additional frq info. 
>>
>> Correct me if I am wrong, but the only difference is that this approach with *.frq 
>> needs one more seek... At the same time, this could potentially increase term 
>> dict size, so we lose some locality.
>>
>> Your last proposal sounds interesting: "inline short postings" into the term 
>> dict, so for short postings (about the size of the offset pointer into *.frq) with 
>> tf==1 (that is always the case if you use omitTf(true) from LUCENE-1340) we 
>> spare one seek()... this could be a lot. Also, there is no need to store 
>> postings in *.frq (this complicates maintenance, I guess).
>>
>>> Add optional storing of document numbers in term dictionary
>>> -----------------------------------------------------------
>>>
>>>                 Key: LUCENE-1278
>>>                 URL: https://issues.apache.org/jira/browse/LUCENE-1278
>>>             Project: Lucene - Java
>>>          Issue Type: New Feature
>>>          Components: Index
>>>    Affects Versions: 2.3.1
>>>            Reporter: Jason Rutherglen
>>>            Priority: Minor
>>>         Attachments: lucene.1278.5.4.2008.patch, lucene.1278.5.5.2008.2.patch,
>>>                      lucene.1278.5.5.2008.patch, lucene.1278.5.7.2008.patch,
>>>                      lucene.1278.5.7.2008.test.patch, TestTermEnumDocs.java
>>>
>>> Add optional storing of document numbers in term dictionary.  String index 
>>> field cache and range filter creation will be faster.  
>>> Example read code:
>>> {noformat}
>>> TermEnum termEnum = indexReader.terms(TermEnum.LOAD_DOCS);
>>> do {
>>>   Term term = termEnum.term();
>>>   if (term == null || term.field() != field) break;
>>>   int[] docs = termEnum.docs();
>>> } while (termEnum.next());
>>> {noformat}
>>> Example write code:
>>> {noformat}
>>> Document document = new Document();
>>> document.add(new Field("tag", "dog", Field.Store.YES,
>>>     Field.Index.UN_TOKENIZED, Field.Term.STORE_DOCS));
>>> indexWriter.addDocument(document);
>>> {noformat}
>> -- 
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
> 
> 
> 
> 
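
Editorial sketch of the idea discussed in this thread: very short postings are
stored directly in the term dictionary entry, and only longer postings go to a
separate postings store (standing in for the *.frq file), which costs one extra
lookup. This is a minimal in-memory illustration, not Lucene code. Every name in
it (InlinePostingsSketch, Entry, inlineThreshold, postingsLookups) is hypothetical
and is not part of the LUCENE-1278 patch or of any Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names are Lucene APIs.
class InlinePostingsSketch {

    // One term dictionary entry: either the doc numbers themselves (inlined)
    // or an offset into the separate postings store.
    static final class Entry {
        final int[] inlinedDocs;   // non-null when the postings are inlined
        final int postingsOffset;  // meaningful only when inlinedDocs == null

        Entry(int[] inlinedDocs, int postingsOffset) {
            this.inlinedDocs = inlinedDocs;
            this.postingsOffset = postingsOffset;
        }
    }

    private final int inlineThreshold; // max number of docs stored inline (assumed knob)
    private final Map<String, Entry> termDict = new HashMap<>();
    private final List<int[]> postingsStore = new ArrayList<>(); // stands in for *.frq
    private int postingsLookups = 0; // counts simulated seeks into the postings store

    InlinePostingsSketch(int inlineThreshold) {
        this.inlineThreshold = inlineThreshold;
    }

    // Index a term: short postings go into the dictionary, long ones into the store.
    void addTerm(String term, int[] docs) {
        if (docs.length <= inlineThreshold) {
            termDict.put(term, new Entry(docs.clone(), -1));
        } else {
            postingsStore.add(docs.clone());
            termDict.put(term, new Entry(null, postingsStore.size() - 1));
        }
    }

    // Look up a term: inlined postings need no access to the postings store.
    int[] docs(String term) {
        Entry e = termDict.get(term);
        if (e == null) {
            return new int[0];
        }
        if (e.inlinedDocs != null) {
            return e.inlinedDocs.clone();
        }
        postingsLookups++; // the "one more seek" mentioned in the thread
        return postingsStore.get(e.postingsOffset).clone();
    }

    int postingsLookups() {
        return postingsLookups;
    }

    public static void main(String[] args) {
        InlinePostingsSketch index = new InlinePostingsSketch(1);
        index.addTerm("dog", new int[] {7});        // inlined
        index.addTerm("cat", new int[] {1, 4, 9});  // stored separately
        System.out.println(Arrays.toString(index.docs("dog")));
        System.out.println(Arrays.toString(index.docs("cat")));
        System.out.println("postings lookups: " + index.postingsLookups());
    }
}
```

The trade-off raised in the thread shows up in the threshold: a larger value
avoids more lookups but makes dictionary entries bigger, which may hurt locality.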
