lucene-java-user mailing list archives

From "Aleksander M. Stensby"
Subject Question regarding sorting and memory consumption in lucene
Date Fri, 10 Oct 2008 12:09:36 GMT
Hello, I've read a lot of threads on memory consumption and sorting, and
I think I have a pretty good understanding of how things work, but I
could still use some input here.

We currently have a system consisting of 6 different Lucene indexes (all
have the same structure, so you could say it is a form of sharding). We
use this approach because we want to be able to give users access to
different indexes (but not necessarily all of them).

(We are planning to move to a Solr-based system, but for now we would like
to solve this issue with our current Lucene-based system.)

The thing is, the indexes are rather big (ranging from 5 GB to 20 GB per
index, with 10 to 30 million entries per index).
We keep one searcher object open per index, and when the index is changed  
(new documents added in batches several times a day), we update the  
searcher objects.
In the warmup procedure we ran a couple of searches and things worked fine,
but I realized that our application returns hits sorted by date by
default, while our warmup procedure ran unsorted queries. So the first
searches a user ran after an update were still slow (obviously).

To cope, I changed the warmup procedure to include a sorted search, and
now users no longer notice slow queries. Good!
But the problem at hand is that we are running into memory problems (and
I understand that sorting does consume a lot of memory). Is there a
"best practice" way to deal with this? The field we sort on is an
un_indexed text field representing the date, typically in the form
"2008-10-10". I am aware that string field sorting consumes a lot of
memory, so should we change this field to something different? Would
this help us with the memory problems?
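
To make "something different" concrete, here is a minimal sketch in plain
Java (no Lucene dependency) of one possible alternative: packing the date
string into an int that sorts in the same order as the original text. The
class and method names (DateSortKey, encode, intArrayBytes) are hypothetical
and only for illustration. The size estimate assumes a sort cache that holds
one 4-byte int per document. That is an assumption about the cache layout,
not something confirmed in this message, so whether it actually saves memory
over the string field would need to be measured.

```java
import java.time.LocalDate;

public class DateSortKey {

    /**
     * Packs an ISO date such as "2008-10-10" into an int such as 20081010.
     * Numeric order of the result matches chronological order.
     * Throws DateTimeParseException if the input is not a valid ISO date.
     */
    public static int encode(String isoDate) {
        LocalDate d = LocalDate.parse(isoDate);
        return d.getYear() * 10000 + d.getMonthValue() * 100 + d.getDayOfMonth();
    }

    /**
     * Rough size in bytes of one int per document.
     * ASSUMPTION: the sort cache stores exactly one 4-byte int per document.
     */
    public static long intArrayBytes(long docCount) {
        return docCount * Integer.BYTES;
    }

    public static void main(String[] args) {
        System.out.println(encode("2008-10-10"));                        // 20081010
        System.out.println(encode("2008-10-09") < encode("2008-10-10")); // true
        // Largest index mentioned above: about 30 million entries.
        System.out.println(intArrayBytes(30_000_000L) / 1_000_000 + " MB"); // 120 MB
    }
}
```

Under that assumption, the largest index (about 30 million entries) would
need roughly 120 MB for the sort values alone, and the smallest (about 10
million) roughly 40 MB, so six such indexes add up quickly.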

As a side note / curiosity question: does it matter whether we use the
search method returning Hits or the search method returning TopFieldDocs?
(We are not iterating over them in any way when this memory issue occurs.)

Thanks in advance for any guidance we may get.

Best regards,
  Aleksander M. Stensby

Aleksander M. Stensby
Senior Software Developer
Integrasco A/S
