lucene-general mailing list archives

From "Bissan AUDEH" <au...@emse.fr>
Subject Re: A fast way to get real docID from large indexes?
Date Wed, 12 Dec 2012 22:00:06 GMT
Thank you, Carsten.
By "document real name" I mean any stored field in the index that identifies the document
(e.g. the document title, its file name in the file system, its location, ...), or anything else
that was stored as a field at index time and that you wish to present to the user as a search result,
because the Lucene docID by itself means nothing to the user.

What I'm currently doing is something like this:

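IndexSearcher searcher;  // assumed to be opened elsewhere
TopDocs results = searcher.search(query, numTotalHits);
ScoreDoc[] hits = results.scoreDocs;
// Iterate over hits.length: fewer than numTotalHits documents may match.
for (int i = 0; i < hits.length; i++)
{
   // Loads the stored fields of the hit; this is the call that seems slow.
   Document doc = searcher.doc(hits[i].doc);
   System.out.println(hits[i].doc + " : " + hits[i].score);
}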

Unless I'm doing it wrong, the call "searcher.doc(hits[i].doc);" seems to be time-consuming
for large indexes.

I'll take a look at the AllDocCollector that you mentioned in your mail, hoping it will resolve
my problem.
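
Since many queries are run against the same index and the top 1500 hits of each are kept,
the same documents probably come back for several queries. One possible mitigation, given
here only as a sketch and not measured on any real index, is to remember the name of each
docID after its first lookup. DocNameCache and its loader parameter are hypothetical names
for illustration; with Lucene, the loader would wrap a stored-field lookup such as the
searcher.doc(...) call above. The cache is only valid while the same index reader stays
open, because docIDs can change when the index is reopened.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

/**
 * Hypothetical helper (not part of Lucene): memoizes docID -> display name
 * so that each document's stored fields are loaded at most once across queries.
 */
class DocNameCache {
    private final Map<Integer, String> names = new HashMap<>();
    private final IntFunction<String> loader; // the expensive lookup, supplied by the caller
    private int loads = 0;                    // how many times the loader was actually called

    DocNameCache(IntFunction<String> loader) {
        this.loader = loader;
    }

    /** Returns the cached name for docId, calling the loader only on the first request. */
    String nameOf(int docId) {
        if (!names.containsKey(docId)) {
            names.put(docId, loader.apply(docId));
            loads++;
        }
        return names.get(docId);
    }

    /** Number of loader calls so far, useful for checking how much work was saved. */
    int loadCount() {
        return loads;
    }
}
```

Whether this helps depends on how much the result lists of different queries overlap;
if every query returns different documents, the cache saves nothing.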
 
On Wednesday, 12 December 2012 13:30 CET, Carsten Schnober <schnober@ids-mannheim.de>
wrote:
 
> On 07.12.2012 15:12, Bissan Audeh wrote:
> 
> > I'm doing some experiments with Lucene where I run many queries and I keep top 1500
> > results of each query. I recently switched to Lucene4.0, but in all cases I find that it
> > takes a lot of time to get the REAL document id using ScoreDoc and IndexSearcher especially
> > that I have very large indexes.
> > Does anyone know a faster way?
> > It would be more efficient to have the document real name as an attribute of the
> > class ScoreDoc in addition to its luceneID and its score, because in all cases this information
> > is always needed to show retrieved documents.
> 
> 
> By "real" name, do you mean something like the input document title as
> opposed to the id assigned by Lucene during indexing? I've resolved this
> by storing document name in a dedicated field so that I can use it in a
> query or filter.
> If you refer to the Lucene index ids, you might be interested in using a
> Collector; the example "AllDocCollector" given in the textbook "Lucene
> in Action" (McCandless, Hatcher, Gospodnetić, 2nd ed., ch. 6) is
> probably helpful.
> Best,
> Carsten
> 
> -- 
> Institut für Deutsche Sprache | http://www.ids-mannheim.de
> Projekt KorAP                 | http://korap.ids-mannheim.de
> Tel. +49-(0)621-43740789      | schnober@ids-mannheim.de
> Korpusanalyseplattform der nächsten Generation
> Next Generation Corpus Analysis Platform
 
 
 
 

