lucene-java-user mailing list archives

From "Beady Geraghty" <beadygerag...@gmail.com>
Subject Re: out-of-memory when searching, paging does not work.
Date Sun, 14 May 2006 19:34:08 GMT
Here is the gist of the code:

    Query query = new TermQuery(new Term("contents", q.toLowerCase()));

    long start = new Date().getTime();
    Hits hits = is.search(query);
    long end = new Date().getTime();

    System.err.println("Found " + hits.length() +
        " document(s) (in " + (end - start) +
        " milliseconds) that matched query '" + q + "'");

    int ct = hits.length();
    int ct2 = 400000;   // index of the first hit to retrieve
    int step = 10000;   // hits to retrieve per window
    while (ct2 < ct) {
        int startct = ct2;
        for (int i = startct; i < startct + step; i++) {
            if (ct2 >= ct) {
                break;
            }
            Document doc = hits.doc(ct2);
            doc.get("filename");
            ct2++;
        }
        System.out.println("ct2 is " + ct2);

        // Close and reopen everything after each window, hoping to
        // release whatever Hits has cached, then re-run the query.
        ir.close();
        is.close();
        fsDir.close();
        fsDir = FSDirectory.getDirectory(indexDir, false);
        ir = IndexReader.open(fsDir);
        is = new IndexSearcher(ir);
        hits = is.search(query);
    }
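[Editor's note: in Lucene 1.4.x, the usual way to walk a very large result set without paying for the Hits cache is the low-level HitCollector API, which hands you raw document ids one at a time and retains nothing between calls. A minimal sketch, assuming the same `is`, `ir`, and `query` as in the code above (in 1.4-era Java, `ir` would need to be a final local for the anonymous class to see it):]

```java
// Sketch: bypass Hits entirely.  Nothing accumulates, so memory
// stays flat no matter how many documents match.
is.search(query, new HitCollector() {
    public void collect(int doc, float score) {
        try {
            // Fetch and use one stored document at a time; do NOT
            // add the names to a list if you only need to stream them.
            String name = ir.document(doc).get("filename");
            // ... process name here ...
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
});
```

Fetching the stored document inside collect() is slow for hundreds of thousands of hits, but unlike Hits it keeps no per-hit state behind your back.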

If ct2 is initialized to 40,000 instead of 400,000, I see some output before
I get the out-of-memory error.  With 400,000, I get the out-of-memory error
almost instantly, without any output.
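[Editor's note: the window arithmetic itself can be checked in isolation. This is a self-contained stand-in for the loop above, with made-up numbers and no Lucene involved, just to show the intended traversal:]

```java
public class PagingDemo {
    public static void main(String[] args) {
        int total = 95;    // stand-in for hits.length()
        int start = 40;    // stand-in for the initial ct2
        int step = 25;     // window size

        int visited = 0;
        for (int windowStart = start; windowStart < total; windowStart += step) {
            int windowEnd = Math.min(windowStart + step, total);
            // In the real code, the searcher would be reopened here.
            for (int i = windowStart; i < windowEnd; i++) {
                visited++; // stand-in for hits.doc(i).get("filename")
            }
            System.out.println("processed through " + windowEnd);
        }
        System.out.println("visited " + visited); // 95 - 40 = 55
    }
}
```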

Is there a method call to clear the cache?

Thank you for your response.


On 5/14/06, Erik Hatcher <erik@ehatchersolutions.com> wrote:
>
> Could you share at least some pseudo-code of what you're doing in the
> loop of retrieving the "name" of each document?   Are you storing all
> of those names as you iterate?
>
> Have you profiled your application to see exactly where the memory is
> going?  It is surely being eaten by your own code and not Lucene.
>
>        Erik
>
>
> On May 14, 2006, at 12:07 PM, Beady Geraghty wrote:
>
> > I have an out-of-memory error when returning many hits.
> >
> > I am still on Lucene 1.4.3
> >
> > I have a simple term query.  It returned 899810 documents.
> > I try to retrieve the name of each document and nothing else
> > and I ran out of memory.
> >
> > Instead of getting the names all at once, I tried to query again after
> > every 10,000 documents.  I close the index reader, index searcher, and
> > the fsDir and re-query after every 10,000 documents.  This still
> > doesn't work.
> >
> > From another entry in the forum, it appears that information about
> > the hits that I have skipped over is still kept even though I don't
> > access them.  Am I understanding correctly that if I start accessing
> > from the 400,000th document onwards, some information about documents
> > 0-399,999 is still cached even though I have skipped over them?
> > Is there a way to get the file name (and perhaps other information)
> > of the remaining documents?
> >
> > (I tried a different term query that returned 400,000 hits, and I was
> > able to get the names of them all without re-querying.)
> >
> > I think I saw someone mention clearing the hit cache, though I don't
> > know how this is done.
> >
> > Thank you in advance for any hints on dealing with this.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
