lucene-general mailing list archives

From "Smiley, David W." <dsmi...@mitre.org>
Subject Re: A fast way to get real docID from large indexes?
Date Wed, 12 Dec 2012 22:19:28 GMT
I suggest you load your unique key field into memory via the FieldCache,
then look each hit's key up from there instead of loading the stored
document.  See LUCENE-4541 for a "ValueSourceAccessor" proposal.  There
are also FieldCache-based ValueSources.
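
A rough sketch of why an in-memory key lookup is cheap, written in plain
Java rather than against the Lucene API.  The class name KeyCacheSketch and
the method keyFor are made up for illustration.  It assumes the keys are
cached per segment (as far as I know the Lucene 4.x FieldCache also works
per segment reader), that every segment holds at least one document, and
that the doc ID passed in is within range.

```java
import java.util.Arrays;

/** Plain-Java model of a per-segment key cache; hypothetical, not the Lucene API. */
final class KeyCacheSketch {
    // docBases[i] is the first global doc ID of segment i, in ascending order.
    private final int[] docBases;
    // keys[i][localDoc] is the unique key of that document, loaded once up front.
    private final String[][] keys;

    KeyCacheSketch(String[][] keysPerSegment) {
        keys = keysPerSegment;
        docBases = new int[keysPerSegment.length];
        int base = 0;
        for (int i = 0; i < keysPerSegment.length; i++) {
            docBases[i] = base;
            base += keysPerSegment[i].length; // assumes no empty segments
        }
    }

    /** Maps a global doc ID to its key: one binary search plus one array read. */
    String keyFor(int globalDoc) {
        int idx = Arrays.binarySearch(docBases, globalDoc);
        if (idx < 0) {
            idx = -idx - 2; // not an exact base: use the segment that starts before it
        }
        return keys[idx][globalDoc - docBases[idx]];
    }
}
```

With the cache built once, resolving 1500 hits per query costs 1500 array
lookups and no stored-field reads.  The price is the memory needed to hold
one key per document.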

~ David Smiley

On 12/12/12 5:00 PM, "Bissan AUDEH" <audeh@emse.fr> wrote:

> Thank you Carsten,
>What I mean by a document's real name is any stored field in the index that
>represents the document (e.g. document title, document file name in the
>file system, document location, ...), or anything else that you stored as a
>field at index time and wish to present to the user as a search
>result, because presenting the Lucene docID means nothing to the user.
>
>What I'm actually doing is something like this:
>
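>import org.apache.lucene.document.Document;
>import org.apache.lucene.search.*;
>
>IndexSearcher searcher;  // opened elsewhere; query and numTotalHits also set elsewhere
>TopDocs results = searcher.search(query, numTotalHits);
>ScoreDoc[] hits = results.scoreDocs;
>// scoreDocs may hold fewer than numTotalHits entries, so bound the loop by its length
>for (int i = 0; i < hits.length; i++)
>{
>   Document doc = searcher.doc(hits[i].doc);  // loads the stored fields for this hit
>   System.out.println(hits[i].doc + " : " + hits[i].score);
>}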
>
>Unless I'm doing it wrong, the call "searcher.doc(hits[i].doc);"
>seems to be time-consuming for large indexes.
>
>I'll take a look at the AllDocCollector that you mentioned in your mail,
>hoping it will resolve my problem.
> 
>On Wednesday, 12 December 2012 13:30 CET, Carsten Schnober
><schnober@ids-mannheim.de> wrote:
> 
>> On 07.12.2012 15:12, Bissan Audeh wrote:
>> 
>> > I'm doing some experiments with Lucene where I run many queries and
>> > keep the top 1500 results of each query. I recently switched to Lucene 4.0,
>> > but in all cases I find that it takes a lot of time to get the REAL
>> > document id using ScoreDoc and IndexSearcher, especially since I have very
>> > large indexes.
>> > Does anyone know a faster way?
>> > It would be more efficient to have the document's real name as an
>> > attribute of the class ScoreDoc in addition to its Lucene ID and its
>> > score, because this information is always needed to show
>> > retrieved documents.
>> 
>> 
>> By "real" name, do you mean something like the input document title as
>> opposed to the id assigned by Lucene during indexing? I've resolved this
>> by storing the document name in a dedicated field so that I can use it in a
>> query or filter.
>> If you refer to the Lucene index ids, you might be interested in using a
>> Collector; the example "AllDocCollector" given in the textbook "Lucene
>> in Action" (McCandless, Hatcher, Gospodnetić, 2nd ed., ch. 6) is
>> probably helpful.
>> Best,
>> Carsten
>> 
>> -- 
>> Institut für Deutsche Sprache | http://www.ids-mannheim.de
>> Projekt KorAP                 | http://korap.ids-mannheim.de
>> Tel. +49-(0)621-43740789      | schnober@ids-mannheim.de
>> Korpusanalyseplattform der nächsten Generation
>> Next Generation Corpus Analysis Platform

