lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From John Z <zjavie...@yahoo.com>
Subject Re: Running OutOfMemory while optimizing and searching
Date Thu, 16 Sep 2004 17:19:42 GMT
Hi
 
We are trying to get the memory footprint on our searchers.
 
We have indexes of around 1 million docs and around 25 searchable fields.
We noticed that without any searches performed on the indexes, on startup, the memory taken
up by the searcher is roughly 7 times the .tii file size. The .tii file is read into memory
as per the code. Our .tii files are around 8-10 MB in size and our startup memory foot print
is around 60-70 MB.
 
Then when we start doing our searches, the memory goes up, depending on the fields we search
on. We are noticing that if we start searching on new fields, the memory kind of goes up.

 
Doug, 
 
Your calculation below on what is taken up by the searcher, does it take into account the
.tii file being read into memory  or am I not making any sense ? 
 
1 byte * Number of searchable fields in your index * Number of docs in 
your index
plus
1k bytes * number of terms in query
plus
1k bytes * number of phrase terms in query


Thank you
ZJ

Doug Cutting <cutting@apache.org> wrote:
> What do your queries look like? The memory required
> for a query can be computed by the following equation:
>
> 1 Byte * Number of fields in your query * Number of
> docs in your index
>
> So if your query searches on all 50 fields of your 3.5
> Million document index then each search would take
> about 175MB. If your 3-4 searches run concurrently
> then that's about 525MB to 700MB chewed up at once.

That's not quite right. If you use the same IndexSearcher (or 
IndexReader) for all of the searches, then only 175MB are used. The 
arrays in question (the norms) are read-only and can be shared by all 
searches.

In general, the amount of memory required is:

1 byte * Number of searchable fields in your index * Number of docs in 
your index

plus

1k bytes * number of terms in query

plus

1k bytes * number of phrase terms in query

The latter are for i/o buffers. There are a few other things, but these 
are the major ones.

Doug


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 
Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message