lucene-general mailing list archives

From "Bissan AUDEH" <au...@emse.fr>
Subject Re: A fast way to get real docID from large indexes?
Date Wed, 12 Dec 2012 22:45:48 GMT
Thanks, David, I'll definitely give that a try, because this retrieval-time issue is driving me crazy.
There is no point in searching the index very fast if it then takes a long time to present
what you've found in a meaningful way!
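
David's FieldCache suggestion amounts to paying the cost of reading the key field once per index reader and then resolving each hit with an array lookup, instead of one `searcher.doc(...)` call per hit. The following is a plain-Java model of that idea, not Lucene code. The names `KeyCacheSketch`, `Segment` and `key` and the sample keys are hypothetical, and the model assumes non-empty segments ordered by `docBase`. Lucene's FieldCache likewise holds per-segment values indexed by segment-local docID, but the exact accessor differs between 4.x releases, so check the Javadoc of the version in use.

```java
import java.util.Arrays;

// Plain-Java model of a FieldCache-style key lookup (hypothetical names, not
// Lucene's API): each segment holds one array of unique keys indexed by
// segment-local docID, loaded once and then reused for every query.
public class KeyCacheSketch {

    // One index segment: its first global docID and the keys of its documents.
    static final class Segment {
        final int docBase;
        final String[] keys; // keys[localDoc] = unique key of that document

        Segment(int docBase, String[] keys) {
            this.docBase = docBase;
            this.keys = keys;
        }
    }

    private final Segment[] segments; // assumed non-empty and ordered by docBase
    private final int[] docBases;

    KeyCacheSketch(Segment... segments) {
        this.segments = segments;
        this.docBases = new int[segments.length];
        for (int i = 0; i < segments.length; i++) {
            docBases[i] = segments[i].docBase;
        }
    }

    // Resolves a global docID (as found in ScoreDoc.doc) to its cached key
    // with a binary search plus an array access; no stored fields are read.
    String key(int globalDoc) {
        int idx = Arrays.binarySearch(docBases, globalDoc);
        if (idx < 0) {
            // Not a segment start: the owning segment is the one just before
            // the insertion point, i.e. (-idx - 1) - 1.
            idx = -idx - 2;
        }
        Segment s = segments[idx];
        return s.keys[globalDoc - s.docBase];
    }

    public static void main(String[] args) {
        // Hypothetical index: segment 0 holds docs 0-2, segment 1 holds docs 3-4.
        KeyCacheSketch cache = new KeyCacheSketch(
                new Segment(0, new String[] {"a.txt", "b.txt", "c.txt"}),
                new Segment(3, new String[] {"d.txt", "e.txt"}));
        int[] hitDocs = {4, 0, 2}; // global docIDs, e.g. from TopDocs.scoreDocs
        for (int doc : hitDocs) {
            System.out.println(doc + " : " + cache.key(doc));
        }
    }
}
```

The binary search only models mapping a top-level docID to its segment. Code that already works per segment, such as a collector, can index the segment's array directly with the local docID.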

On Wednesday, 12 December 2012 23:19 CET, "Smiley, David W." <dsmiley@mitre.org> wrote:

 
> I suggest you load your unique key field into memory via the FieldCache
> and then look it up from there.  See LUCENE-4541 for a "ValueSourceAccessor"
> proposal.  There are FieldCache-based ValueSources.
> 
> ~ David Smiley
> 
> On 12/12/12 5:00 PM, "Bissan AUDEH" <audeh@emse.fr> wrote:
> 
> > Thank you, Carsten,
> >By a document's real name I mean any stored field in the index that
> >represents the document (e.g. document title, file name in the
> >file system, document location, ...), or anything else you stored as a
> >field at index time and wish to present to the user as a search
> >result, because presenting the Lucene docID means nothing to the user.
> >
> >What I'm actually doing is something like this:
> >
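> >import org.apache.lucene.document.Document;
> >import org.apache.lucene.search.IndexSearcher;
> >import org.apache.lucene.search.ScoreDoc;
> >import org.apache.lucene.search.TopDocs;
> >
> >IndexSearcher searcher;  // opened elsewhere on the index
> >TopDocs results = searcher.search(query, numTotalHits);
> >ScoreDoc[] hits = results.scoreDocs;
> >// scoreDocs can hold fewer than numTotalHits entries
> >for (int i = 0; i < hits.length; i++)
> >{
> >   // loads all stored fields of this hit
> >   Document doc = searcher.doc(hits[i].doc);
> >   System.out.println(hits[i].doc + " : " + hits[i].score);
> >}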
> >
> >Unless I'm doing it wrong, the call "searcher.doc(hits[i].doc)"
> >seems to be time-consuming for large indexes.
> >
> >I'll take a look at AllDocCollector that you mentioned in your mail
> >hoping it will resolve my problem.
> > 
> >On Wednesday, 12 December 2012 13:30 CET, Carsten Schnober
> ><schnober@ids-mannheim.de> wrote:
> > 
> >> On 07.12.2012 15:12, Bissan Audeh wrote:
> >> 
> >> > I'm doing some experiments with Lucene in which I run many queries and
> >> > keep the top 1500 results of each query. I recently switched to Lucene 4.0,
> >> > but in all cases I find that it takes a lot of time to get the REAL
> >> > document ID using ScoreDoc and IndexSearcher, especially since I have very
> >> > large indexes.
> >> > Does anyone know a faster way?
> >> > It would be more efficient to have the document's real name as an
> >> > attribute of the ScoreDoc class, in addition to its Lucene ID and its
> >> > score, because this information is always needed to display
> >> > retrieved documents.
> >> 
> >> 
> >> By "real" name, do you mean something like the input document title, as
> >> opposed to the ID assigned by Lucene during indexing? I've solved this
> >> by storing the document name in a dedicated field so that I can use it in a
> >> query or filter.
> >> If you are referring to the Lucene index IDs, you might be interested in
> >> using a Collector; the "AllDocCollector" example in the textbook "Lucene
> >> in Action" (McCandless, Hatcher, Gospodnetić, 2nd ed., ch. 6) is
> >> probably helpful.
> >> Best,
> >> Carsten
> >> 
> >> -- 
> >> Institut für Deutsche Sprache | http://www.ids-mannheim.de
> >> Projekt KorAP                 | http://korap.ids-mannheim.de
> >> Tel. +49-(0)621-43740789      | schnober@ids-mannheim.de
> >> Next Generation Corpus Analysis Platform
> > 
> 
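
Carsten's Collector suggestion rests on the fact that Lucene 4 drives a Collector one segment at a time and passes it segment-relative docIDs. A collector that keeps hits, like the AllDocCollector he mentions, therefore adds the segment's docBase to recover the global docID. The sketch below models that callback pattern in plain Java. Combining it with David's suggestion, it also reads each hit's key from a per-segment array while collecting. It is not Lucene's API: `HitCollector`, `setNextSegment`, `AllHitsCollector`, `Hit` and the sample data are hypothetical. A real Collector also receives a Scorer and must declare whether it accepts out-of-order docs.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the per-segment callback pattern used by Lucene
// Collectors (hypothetical names, not Lucene's API).
public class CollectorSketch {

    // Mirrors the shape of a Collector: told about each segment, then fed
    // that segment's matching docs with segment-relative docIDs.
    interface HitCollector {
        void setNextSegment(int docBase, String[] segmentKeys);

        void collect(int localDoc, float score);
    }

    // One collected hit: global docID, score and the document's key.
    static final class Hit {
        final int doc;
        final float score;
        final String key;

        Hit(int doc, float score, String key) {
            this.doc = doc;
            this.score = score;
            this.key = key;
        }
    }

    // Keeps every hit, AllDocCollector-style, and reads each key from the
    // current segment's array instead of loading stored fields.
    static final class AllHitsCollector implements HitCollector {
        final List<Hit> hits = new ArrayList<>();
        private int docBase;
        private String[] keys;

        @Override
        public void setNextSegment(int docBase, String[] segmentKeys) {
            this.docBase = docBase;
            this.keys = segmentKeys;
        }

        @Override
        public void collect(int localDoc, float score) {
            // localDoc is relative to the segment; docBase makes it global.
            hits.add(new Hit(docBase + localDoc, score, keys[localDoc]));
        }
    }

    public static void main(String[] args) {
        // Hypothetical index of two segments, with the matching local docs
        // and scores a query might produce in each of them.
        String[][] segmentKeys = {{"a.txt", "b.txt", "c.txt"}, {"d.txt", "e.txt"}};
        int[][] matches = {{0, 2}, {1}};
        float[][] scores = {{1.5f, 0.5f}, {2.0f}};

        AllHitsCollector collector = new AllHitsCollector();
        int docBase = 0;
        for (int s = 0; s < segmentKeys.length; s++) {
            collector.setNextSegment(docBase, segmentKeys[s]);
            for (int i = 0; i < matches[s].length; i++) {
                collector.collect(matches[s][i], scores[s][i]);
            }
            docBase += segmentKeys[s].length; // next segment starts after this one
        }
        for (Hit h : collector.hits) {
            System.out.println(h.doc + " : " + h.score + " : " + h.key);
        }
    }
}
```

Because the key is read while the segment's array is current, no global-to-segment mapping is needed. A collector that keeps every hit does not rank them, though, so sorting and cutting to the top 1500 would still be up to the caller.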
 
 
 
 

