lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: How to get document effectively. or FieldCache example
Date Fri, 21 Apr 2017 12:02:47 GMT
Hi,

For full-text search, Lucene is the right tool. However, inverted indexes and the software
built on top of them (like Lucene) are optimized to return the best-ranking results very fast.
This is what users normally want, e.g. when they search Google: you get a page with 10 or 20
results displayed. Lucene can collect those 20 documents quickly, and retrieving their
values from stored fields is cheap.
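
To illustrate why the first page is cheap, here is a minimal plain-Java sketch (not Lucene
code) of top-N collection with a bounded min-heap. The class name TopHits, the method topN,
and the array of scores indexed by document number are assumptions made for this example;
Lucene's own collectors are more involved.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopHits {
    /**
     * Returns the document numbers of the n best scores, best first.
     * Ties are broken in favour of the lower document number.
     * scores[doc] is a hypothetical stand-in for the score of a matching document.
     */
    public static List<Integer> topN(float[] scores, int n) {
        List<Integer> result = new ArrayList<>();
        if (n <= 0 || scores.length == 0) {
            return result;
        }
        // Orders hits from weakest to strongest, so the heap head is the weakest kept hit.
        Comparator<Integer> weakestFirst = (a, b) -> {
            int byScore = Float.compare(scores[a], scores[b]);
            return byScore != 0 ? byScore : Integer.compare(b, a);
        };
        PriorityQueue<Integer> heap =
                new PriorityQueue<>(Math.min(n, scores.length), weakestFirst);
        for (int doc = 0; doc < scores.length; doc++) {
            if (heap.size() < n) {
                heap.add(doc);
            } else if (weakestFirst.compare(doc, heap.peek()) > 0) {
                // The new hit beats the weakest kept hit: replace it.
                heap.poll();
                heap.add(doc);
            }
        }
        result.addAll(heap);
        result.sort(weakestFirst.reversed());
        return result;
    }
}
```

Calling TopHits.topN(scores, 20) never holds more than 20 entries, however many documents
match, so stored fields only have to be loaded for those 20 hits.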

The problem starts when you want to get all results. In most cases this is not really what
you want anyway: the lower-ranking results at the end are rarely interesting, so there is
no need to fetch them from the index. Retrieving the first 10 or 20 is fast.

FYI, try it out on Google: you can page through the results, but at some point it will not
let you dig deeper (it is impossible to show results after offset 200 / page 20).
Lucene is similar. Fetching all results is discouraged, as it gets slower and slower
the deeper you dive.

Lucene has some workarounds like "searchAfter", but they do not solve the problem that fetching
the stored values for that many documents is slow.
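
As a rough illustration of the idea behind "searchAfter" (IndexSearcher.searchAfter takes the
last ScoreDoc of the previous page), here is a plain-Java sketch of cursor-style paging. It
does not use the Lucene API: the class CursorPaging, the Hit type, and the score array indexed
by document number are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class CursorPaging {
    /** One hit: a document number and its score (hypothetical stand-in for a ScoreDoc). */
    public static final class Hit {
        public final int doc;
        public final float score;

        public Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** True if a ranks strictly before b: higher score first, then lower document number. */
    static boolean ranksBefore(Hit a, Hit b) {
        if (a.score != b.score) {
            return a.score > b.score;
        }
        return a.doc < b.doc;
    }

    /**
     * Returns up to n hits that rank strictly after the cursor, best first.
     * Pass null as the cursor to get the first page.
     */
    public static List<Hit> pageAfter(float[] scores, Hit after, int n) {
        List<Hit> page = new ArrayList<>();
        for (int doc = 0; doc < scores.length; doc++) {
            Hit hit = new Hit(doc, scores[doc]);
            if (after != null && !ranksBefore(after, hit)) {
                continue; // ranked at or before the cursor, so already returned
            }
            // Find the insert position that keeps the page sorted best-first.
            int pos = page.size();
            while (pos > 0 && ranksBefore(hit, page.get(pos - 1))) {
                pos--;
            }
            if (pos < n) {
                page.add(pos, hit);
                if (page.size() > n) {
                    page.remove(page.size() - 1); // drop the weakest hit
                }
            }
        }
        return page;
    }
}
```

Each page only ever holds n hits, whatever the depth, but every call in this sketch still
walks all matches, and the values for each returned hit still have to be loaded separately.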

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: neeraj shah [mailto:neerajshah84@gmail.com]
> Sent: Friday, April 21, 2017 1:22 PM
> To: java-user@lucene.apache.org
> Subject: Re: How to get document effectively. or FieldCache example
> 
> Then which one is the right tool for text searching in files? Could you
> please suggest one?
> 
> 
> On Fri, Apr 21, 2017 at 2:01 PM, Adrien Grand <jpountz@gmail.com> wrote:
> 
> > Lucene is not designed for retrieving that many results. What are you doing
> > with those 5 lakh (500,000) documents? I suspect this is too much to display,
> > so you probably perform some computations on them? If so, maybe you could
> > move them to Lucene using e.g. facets? If that does not work, I'm afraid that
> > Lucene is not the right tool for your problem.
> >
> > On Fri, Apr 21, 2017 at 08:56, neeraj shah <neerajshah84@gmail.com> wrote:
> >
> > > Yes, I am fetching around 5 lakh (500,000) results from the index searcher.
> > > Also, I am indexing each line of each file, because while searching I need
> > > all the lines of a file that has a matched term.
> > > Please tell me whether I am doing it right.
> > > {code}
> > >
> > > // Index every line of the file as its own document.
> > > InputStream is = new BufferedInputStream(new FileInputStream(file));
> > > BufferedReader bufr = new BufferedReader(new InputStreamReader(is));
> > > String inputLine = "";
> > >
> > > while ((inputLine = bufr.readLine()) != null) {
> > >     Document doc = new Document();
> > >     doc.add(new Field("contents", inputLine, Field.Store.YES,
> > >             Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
> > >     doc.add(new Field("title", section, Field.Store.YES, Field.Index.NOT_ANALYZED));
> > >     String newRem = new String(rem);
> > >
> > >     doc.add(new Field("fieldsort", newRem, Field.Store.YES, Field.Index.ANALYZED));
> > >     doc.add(new Field("fieldsort2",
> > >             rem.toLowerCase().replaceAll("-", "").replaceAll(" ", ""),
> > >             Field.Store.YES, Field.Index.ANALYZED));
> > >
> > >     doc.add(new Field("field1", Author, Field.Store.YES, Field.Index.NOT_ANALYZED));
> > >     doc.add(new Field("field2", Book, Field.Store.YES, Field.Index.NOT_ANALYZED));
> > >     doc.add(new Field("field3", sec, Field.Store.YES, Field.Index.NOT_ANALYZED));
> > >
> > >     writer.addDocument(doc);
> > > }
> > > is.close();
> > >
> > > {/code}
> > >
> > > On Thu, Apr 20, 2017 at 5:57 PM, Adrien Grand <jpountz@gmail.com> wrote:
> > >
> > > > IndexSearcher.doc is the right way to retrieve documents. If this is
> > > > slowing things down for you, I wonder whether you might be fetching
> > > > too many results?
> > > >
> > > > On Thu, Apr 20, 2017 at 14:16, neeraj shah <neerajshah84@gmail.com> wrote:
> > > >
> > > > > Hello Everyone,
> > > > >
> > > > > I am using Lucene 3.6. I have to index around 60k documents. After
> > > > > performing the search, when I try to retrieve documents from the searcher
> > > > > using searcher.doc(docid), it slows down the search.
> > > > > Is there any other way to get the document?
> > > > >
> > > > > Also, could anyone give me an end-to-end example of working with FieldCache?
> > > > > While implementing the cache I have:
> > > > >
> > > > > int[] fieldIds = FieldCache.DEFAULT.getInts(indexMultiReader, "id");
> > > > >
> > > > > Now I don't know how to use fieldIds further to improve the search.
> > > > > Please give me an end-to-end example.
> > > > >
> > > > > Thanks
> > > > > Neeraj
> > > > >
> > > >
> > >
> >



