lucene-general mailing list archives

From Carsten Schnober <schno...@ids-mannheim.de>
Subject Re: A fast way to get real docID from large indexes?
Date Wed, 12 Dec 2012 12:30:05 GMT
On 07.12.2012 15:12, Bissan Audeh wrote:

> I'm doing some experiments with Lucene where I run many queries and I keep top 1500
> results of each query. I recently switched to Lucene4.0, but in all cases I find that it takes
> a lot of time to get the REAL document id using ScoreDoc and IndexSearcher especially that
> I have very large indexes.
> Does anyone know a faster way?
> It would be more efficient to have the document real name as an attribute of the class
> ScoreDoc in addition to its luceneID and its score, because in all cases this information
> is always needed to show retrieved documents.


By "real" name, do you mean something like the input document's title,
as opposed to the id that Lucene assigns during indexing? I solved this
by storing the document name in a dedicated field, so that I can use it
in a query or filter.
If you mean the Lucene index ids instead, you might be interested in
using a Collector; the "AllDocCollector" example in the textbook "Lucene
in Action" (McCandless, Hatcher, Gospodnetić, 2nd ed., ch. 6) is
probably helpful.
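
To illustrate the Collector idea, here is a rough plain-Java sketch. It
is not the book's code and deliberately does not depend on Lucene; the
classes Segment and NameCollector are hypothetical stand-ins I made up
for illustration. It only models the mechanics as I understand them for
Lucene 4.0: a Collector is told which segment comes next
(setNextReader, where the context carries a docBase), then receives
segment-relative ids in collect(int), so the index-wide id is
docBase + doc. The per-segment name array is an assumption standing in
for the dedicated name field mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

public class AllDocIdsSketch {

    /** Hypothetical stand-in for one index segment (not a Lucene class). */
    static final class Segment {
        final int docBase;    // index-wide id of this segment's first document
        final String[] names; // assumed "name" field value per segment-local id

        Segment(int docBase, String... names) {
            this.docBase = docBase;
            this.names = names;
        }
    }

    /** Models the call pattern of a Collector: one setNextReader per segment, one collect per hit. */
    static final class NameCollector {
        private final List<String> hits = new ArrayList<>();
        private Segment current;

        void setNextReader(Segment segment) {
            current = segment;
        }

        void collect(int localDoc) {
            // Segment-relative id plus docBase gives the index-wide id.
            int globalDoc = current.docBase + localDoc;
            hits.add(globalDoc + ":" + current.names[localDoc]);
        }

        List<String> hits() {
            return hits;
        }
    }

    /** Simulates a search that hits one document in the first segment and two in the second. */
    static List<String> run() {
        Segment first = new Segment(0, "a.txt", "b.txt", "c.txt");
        Segment second = new Segment(3, "d.txt", "e.txt");

        NameCollector collector = new NameCollector();
        collector.setNextReader(first);
        collector.collect(1);
        collector.setNextReader(second);
        collector.collect(0);
        collector.collect(1);
        return collector.hits();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [1:b.txt, 3:d.txt, 4:e.txt]
    }
}
```

In a real index the names would of course have to be read from the
index rather than from a hard-coded array; the sketch only shows how
the id and the name can be resolved together while collecting.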
Best,
Carsten

-- 
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP                 | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789      | schnober@ids-mannheim.de
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform
