lucene-java-user mailing list archives

From "Beady Geraghty" <beadygerag...@gmail.com>
Subject Re: out-of-memory when searching, paging does not work.
Date Sun, 04 Jun 2006 18:30:21 GMT
I finally got back to doing my project.  HitCollector solved my problem.
Thank you for all the help.
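
For the archives, the fix looked roughly like this -- a sketch against the
Lucene 1.4.x API (untested as written; it reuses the "ir" reader, "is"
searcher, and "query" from the code quoted below).  Instead of going through
Hits, which caches Documents as you iterate, collect only the int doc ids
with a HitCollector and then fetch the stored field one document at a time:

    // Sketch, Lucene 1.4.x: stream doc ids, then fetch stored fields.
    final List docIds = new ArrayList();
    is.search(query, new HitCollector() {
        public void collect(int doc, float score) {
            // one Integer per hit; no Document is cached here
            docIds.add(new Integer(doc));
        }
    });
    for (int i = 0; i < docIds.size(); i++) {
        Document d = ir.document(((Integer) docIds.get(i)).intValue());
        String name = d.get("filename");  // process the name, don't keep it
    }

Memory use is then one Integer per hit plus one Document at a time, rather
than the per-hit caching Hits does.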


On 5/14/06, Beady Geraghty <beadygeraghty@gmail.com> wrote:
>
>  Thank you for the links.  I will go through them, and hopefully solve my
> problem.
>
>
>
> On 5/14/06, Chris Hostetter <hossman_lucene@fucit.org> wrote:
> >
> >
> > Please review the advice in these archived messages; I think you'll find
> > them very applicable to your problem...
> >
> >
> > http://www.nabble.com/eliminating-scoring-for-the-sake-of-efficiency-t1603827.html#a4351614
> >
> > http://www.nabble.com/Exact-date-search-doesn%27t-work-with-1.9.1--t1418643.html#a3833741
> >
> >
> >
> > : Date: Sun, 14 May 2006 15:34:08 -0400
> > : From: Beady Geraghty <beadygeraghty@gmail.com >
> > : Reply-To: java-user@lucene.apache.org
> > : To: java-user@lucene.apache.org
> > : Subject: Re: out-of-memory when searching, paging does not work.
> > :
> > : Here is the gist of the code:
> > :
> > :     Query query = new TermQuery( new Term("contents", q.toLowerCase()));
> > :
> > :
> > :     long start = new Date().getTime();
> > :     Hits hits = is.search(query);
> > :     long end = new Date().getTime();
> > :
> > :     System.err.println("Found " + hits.length() +
> > :       " document(s) (in " + (end - start) +
> > :       " milliseconds) that matched query '" +
> > :         q  + "'");
> > :
> > :
> > :     int ct = hits.length() ;
> > :     int ct2 = 400000;
> > :     int step = 10000;
> > :     int startct;
> > :     while (ct2 < ct ) {
> > :      startct = ct2;
> > :      for (int i = startct; i < startct+step; i++ ) {
> > :       if (ct2 >= ct ) {
> > :        break;
> > :       }
> > :       Document doc = hits.doc(ct2);
> > :       doc.get("filename");
> > :       ct2++;
> > :      }
> > :      System.out.println( "ct2 is " + ct2 );
> > :      ir.close();
> > :      is.close();
> > :      fsDir.close();
> > :      ir = null;
> > :      is = null;
> > :      fsDir = null;
> > :      fsDir = FSDirectory.getDirectory(indexDir, false);
> > :      ir = IndexReader.open (fsDir);
> > :      is = new IndexSearcher(ir);
> > :      hits = is.search(query);
> > :
> > :
> > :     }
> > :
> > : If ct2 is set to 40,000 as opposed to 400,000, I see some output before
> > : I get the out-of-memory error.  If not, I get the out-of-memory error
> > : almost instantly, without any output.
> > :
> > : Is there a method call to clear the cache ?
> > :
> > : Thank you for your response.
> > :
> > :
> > : On 5/14/06, Erik Hatcher <erik@ehatchersolutions.com > wrote:
> > : >
> > : > Could you share at least some pseudo-code of what you're doing in the
> > : > loop of retrieving the "name" of each document?   Are you storing all
> > : > of those names as you iterate?
> > : >
> > : > Have you profiled your application to see exactly where the memory is
> > : > going?  It is surely being eaten by your own code and not Lucene.
> > : >
> > : >        Erik
> > : >
> > : >
> > : > On May 14, 2006, at 12:07 PM, Beady Geraghty wrote:
> > : >
> > : > > I have an out-of-memory error when returning many hits.
> > : > >
> > : > > I am still on Lucene 1.4.3
> > : > >
> > : > > I have a simple term query.  It returned 899810 documents.
> > : > > I try to retrieve the name of each document and nothing else
> > : > > and I ran out of memory.
> > : > >
> > : > > Instead of getting the names all at once, I tried to query again
> > : > > after every 10,000 documents.
> > : > > I close the index reader, index searcher, and the fsDir and
> > : > > re-query for every 10,000 documents.  This still doesn't work.
> > : > >
> > : > > From another entry in the forum, it appears that the information
> > : > > about the hits that I have skipped over is still kept even though I
> > : > > don't access them.  Am I understanding it correctly that if I start
> > : > > accessing from the 400000th document onwards, some information about
> > : > > the 0-399999 documents is still cached even though I have skipped
> > : > > over those?
> > : > > Is there a way to get the file name (and perhaps other information)
> > : > > of the remaining documents?
> > : > >
> > : > > (I tried a different term query that returned a hit size of 400000,
> > : > > and I was able to get the names of them all without re-querying.)
> > : > >
> > : > > I think I saw someone mention clearing the hit cache,
> > : > > though I don't know how this is done.
> > : > >
> > : > > Thank you in advance for any hints on dealing with this.
> > : >
> > : >
> > : >
> > ---------------------------------------------------------------------
> > : > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > : > For additional commands, e-mail: java-user-help@lucene.apache.org
> > : >
> > : >
> > :
> >
> >
> >
> > -Hoss
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
