lucene-java-user mailing list archives

From "Beady Geraghty" <beadygerag...@gmail.com>
Subject Re: out-of-memory when searching, paging does not work.
Date Sun, 14 May 2006 23:15:19 GMT
Thank you for the links.  I will go through them, and hopefully solve my
problem.



On 5/14/06, Chris Hostetter <hossman_lucene@fucit.org> wrote:
>
>
> Please review the advice in these archived messages; I think you'll find
> them very applicable to your problem...
>
>
> http://www.nabble.com/eliminating-scoring-for-the-sake-of-efficiency-t1603827.html#a4351614
>
> http://www.nabble.com/Exact-date-search-doesn%27t-work-with-1.9.1--t1418643.html#a3833741
>
>
>
> : Date: Sun, 14 May 2006 15:34:08 -0400
> : From: Beady Geraghty <beadygeraghty@gmail.com>
> : Reply-To: java-user@lucene.apache.org
> : To: java-user@lucene.apache.org
> : Subject: Re: out-of-memory when searching, paging does not work.
> :
> : Here is the gist of the code:
> :
> :     Query query = new TermQuery( new Term("contents", q.toLowerCase()));
> :
> :
> :     long start = new Date().getTime();
> :     Hits hits = is.search(query);
> :     long end = new Date().getTime();
> :
> :     System.err.println("Found " + hits.length() +
> :       " document(s) (in " + (end - start) +
> :       " milliseconds) that matched query '" +
> :         q  + "'");
> :
> :
> :     int ct = hits.length() ;
> :     int ct2 = 400000;
> :     int step = 10000;
> :     int startct;
> :     while (ct2 < ct ) {
> :      startct = ct2;
> :      for (int i = startct; i < startct+step; i++ ) {
> :       if (ct2 >= ct ) {
> :        break;
> :       }
> :       Document doc = hits.doc(ct2);
> :       doc.get("filename");
> :       ct2++;
> :      }
> :      System.out.println( "ct2 is " + ct2 );
> :      ir.close();
> :      is.close();
> :      fsDir.close();
> :      ir = null;
> :      is = null;
> :      fsDir = null;
> :      fsDir = FSDirectory.getDirectory(indexDir, false);
> :      ir = IndexReader.open(fsDir);
> :      is = new IndexSearcher(ir);
> :      hits = is.search(query);
> :
> :
> :     }
> :
> : If ct2 is set to 40,000 as opposed to 400,000, I see some output before I
> : get the out-of-memory error.  If not, I get the out-of-memory error almost
> : instantly, without any output.
> :
> : Is there a method call to clear the cache?
> :
> : Thank you for your response.
> :
> :
> : On 5/14/06, Erik Hatcher <erik@ehatchersolutions.com> wrote:
> : >
> : > Could you share at least some pseudo-code of what you're doing in the
> : > loop of retrieving the "name" of each document?   Are you storing all
> : > of those names as you iterate?
> : >
> : > Have you profiled your application to see exactly where the memory is
> : > going?  It is surely being eaten by your own code and not Lucene.
> : >
> : >        Erik
> : >
> : >
> : > On May 14, 2006, at 12:07 PM, Beady Geraghty wrote:
> : >
> : > > I get an out-of-memory error when returning many hits.
> : > >
> : > > I am still on Lucene 1.4.3
> : > >
> : > > I have a simple term query.  It returned 899810 documents.
> : > > I tried to retrieve the name of each document and nothing else,
> : > > and I ran out of memory.
> : > >
> : > > Instead of getting the names all at once, I tried to query again after
> : > > every 10,000 documents.
> : > > I close the index reader, index searcher, and the fsDir and re-query
> : > > for every 10,000 documents.  This still doesn't work.
> : > >
> : > > From another entry in the forum, it appears that information about
> : > > the hits that I have skipped over is still kept even though I don't
> : > > access them.  Am I understanding correctly that if I start accessing
> : > > from the 400000th document onwards, some information about documents
> : > > 0-399999 is still cached even though I have skipped over those?
> : > > Is there a way to get the file name (and perhaps other information)
> : > > of the remaining documents?
> : > >
> : > > (I tried a different term query that returned a hit size of 400000, and I
> : > > was able to get the names of them all without re-querying.)
> : > >
> : > > I think I saw someone mention clearing the hit cache,
> : > > though I don't know how this is done.
> : > >
> : > > Thank you in advance for any hints on dealing with this.
> : >
> : >
> : > ---------------------------------------------------------------------
> : > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> : > For additional commands, e-mail: java-user-help@lucene.apache.org
> : >
> : >
> :
>
>
>
> -Hoss
>
>
>
>
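
A note on the underlying question of reading every match without running out of memory. The thread suggests that Hits keeps information for every hit up to the highest index requested, which would explain why re-querying and skipping ahead to hit 400000 did not help. One commonly suggested alternative in Lucene 1.4.x is to pass a HitCollector to Searcher.search(Query, HitCollector), record each matching document number in a java.util.BitSet, and then load documents in batches with IndexReader.document(int). The sketch below shows only that bookkeeping in plain Java so it runs without Lucene. DocCollector, collectIds and nextBatch are hypothetical names, and the simulated search in main is an assumption standing in for a real query.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for Lucene's HitCollector callback,
// whose method is collect(int doc, float score).
interface DocCollector {
    void collect(int doc, float score);
}

final class BitSetPaging {

    /** Runs the given search and records each matching doc id as one bit; scores are ignored. */
    static BitSet collectIds(int maxDoc, Consumer<DocCollector> search) {
        final BitSet ids = new BitSet(maxDoc);
        search.accept((doc, score) -> ids.set(doc));
        return ids;
    }

    /** Returns up to 'step' matching doc ids at or after 'from', in ascending order. */
    static List<Integer> nextBatch(BitSet ids, int from, int step) {
        List<Integer> batch = new ArrayList<>();
        for (int doc = ids.nextSetBit(from);
             doc >= 0 && batch.size() < step;
             doc = ids.nextSetBit(doc + 1)) {
            batch.add(doc);
        }
        return batch;
    }

    public static void main(String[] args) {
        // Assumption: a simulated search in which every third document matches.
        final int maxDoc = 900_000;
        BitSet ids = collectIds(maxDoc, c -> {
            for (int d = 0; d < maxDoc; d += 3) {
                c.collect(d, 1.0f);
            }
        });

        int from = 0;
        int seen = 0;
        while (true) {
            List<Integer> batch = nextBatch(ids, from, 10_000);
            if (batch.isEmpty()) {
                break;
            }
            // With Lucene, each id in the batch would be loaded here,
            // e.g. reader.document(doc).get("filename"), and then discarded.
            seen += batch.size();
            from = batch.get(batch.size() - 1) + 1;
        }
        System.out.println("visited " + seen + " matches");
    }
}
```

The BitSet costs about one bit per document in the index, regardless of how many documents match, and nothing else is retained between batches. Document loading is kept out of the collect callback because that callback runs once per match inside the search loop.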
