lucene-java-user mailing list archives

From Samarendra Pratap <samarz...@gmail.com>
Subject Re: Right memory for search application
Date Wed, 28 Apr 2010 06:24:34 GMT
I have got a lot of valuable information in this thread so far.
Thanks to all.

In my last mail I mentioned only two fields because the others' usage was
negligible and I thought they were not important. But now that *Toke* has explained
the formulae, I think sorting on those fields would also be consuming a large
part of memory. There are 2 other sorting fields, one of which is used in
both ascending and descending sorts.
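
To make the arithmetic concrete, here is a minimal sketch of the estimate from Toke's mail quoted below: 18M documents, roughly 40 bytes of per-String overhead, and 2 bytes per character. The class and method names are my own, the 40-byte overhead is Toke's figure and depends on the JVM, and the 8-bytes-per-document figure for a numeric sort is an assumption that one primitive long is cached per document.

```java
// Back-of-the-envelope estimate of sort-cache memory, following the formula
// quoted in this thread. All names here are hypothetical helpers.
class SortCacheEstimate {
    // Assumed per-String overhead in bytes (figure from the thread, JVM dependent).
    static final long STRING_OVERHEAD_BYTES = 40;
    // Java chars are 2 bytes each.
    static final long BYTES_PER_CHAR = 2;

    /** Bytes needed to cache one String term of the given length per document. */
    static long stringSortBytes(long docCount, int termLength) {
        return docCount * (STRING_OVERHEAD_BYTES + BYTES_PER_CHAR * termLength);
    }

    /** Bytes needed to cache one primitive long per document (assumption). */
    static long longSortBytes(long docCount) {
        return docCount * Long.BYTES;
    }

    public static void main(String[] args) {
        long docs = 18_000_000L;
        int len = "yyyymmddHHMMSS".length(); // 14 characters
        System.out.printf("String sort cache: %d MB%n",
                stringSortBytes(docs, len) / 1_000_000);
        System.out.printf("long sort cache:   %d MB%n",
                longSortBytes(docs) / 1_000_000);
    }
}
```

With these inputs the String cache comes to about 1.2 GB and the long cache to about 144 MB, which matches the "shave 1 GB off" remark. Each additional String sort field would add its own term of the same shape.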

Within the next couple of days (or maybe a week) I'll be

1. profiling my application,
2. analyzing and tuning GC options.



However, I have a few more questions -

1. Tom wrote:

*Have you checked that your machine is correctly identified as a server*
*and has optimized GC settings?*

*I did not understand the meaning of "correctly identified as a server". Can
you please help me understand?*

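One way to look into this, assuming the remark refers to HotSpot choosing between its client and server VM at startup, is to print what the running JVM reports about itself. The sketch below uses only standard `java.lang.management` calls. The class name and the "name contains Server" heuristic are my own assumptions, not an official test.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Prints the VM name, max heap, and active collectors of the current JVM.
class JvmModeCheck {
    /** Heuristic (assumption): server VMs report a name containing "Server". */
    static boolean looksLikeServerVm(String vmName) {
        return vmName != null && vmName.contains("Server");
    }

    public static void main(String[] args) {
        String vmName = System.getProperty("java.vm.name");
        System.out.println("VM name: " + vmName);
        System.out.println("Looks like server VM: " + looksLikeServerVm(vmName));
        System.out.println("Max heap (MB): "
                + Runtime.getRuntime().maxMemory() / (1024 * 1024));
        // Lists the garbage collectors the JVM is actually using.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("Collector: " + gc.getName());
        }
    }
}
```

Running this with the same JVM options as the search application would show whether the `-server` flag needs to be passed explicitly.
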
2. *Should I change the type of fields?*
As I said in my first mail, I have 56 fields in my index. Most of
them contain a numeric value or one of a set of system-defined values (e.g. the
gender field can contain only "male", "female", or "unknown"). There are only 7
fields which are indexed with user-defined values.
All the fields are created with *Field*
(String name, String value, Field.Store store, Field.Index index),
so they are all created as normal string fields. Is it
*always* a good idea to use specific classes (NumericField, DateTime
etc.)? We do not
have a space problem, if that matters.
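
As an illustration of what switching the timestamp field would involve, here is a minimal sketch that turns the yyyymmddHHMMSS string into a long. Because the string is fixed-width and all digits, the numeric order matches the existing string order. The class and method names are hypothetical, and the Lucene side (for example passing the value to `NumericField.setLongValue`) is deliberately not shown.

```java
// Converts a fixed-width, all-digit timestamp string into a primitive long.
class TimestampAsLong {
    /** Parses a 14-digit yyyymmddHHMMSS string, e.g. "20100428062434". */
    static long toLong(String timestamp) {
        if (timestamp == null || !timestamp.matches("\\d{14}")) {
            throw new IllegalArgumentException("expected 14 digits: " + timestamp);
        }
        return Long.parseLong(timestamp);
    }

    public static void main(String[] args) {
        String earlier = "20100427130200";
        String later = "20100428062434";
        // Lexicographic and numeric comparisons agree for fixed-width digit strings.
        System.out.println("String order: " + (earlier.compareTo(later) < 0));
        System.out.println("long order:   " + (toLong(earlier) < toLong(later)));
    }
}
```

The validation step matters here: a value of a different length would silently sort in the wrong place once treated as a number.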

3. *Is there any advice on the number of fields?*
*Somewhere on the net I read that instead of keeping different types of
values in different fields (e.g. field1:value1, field2:value2, ...), one
should keep the different values in a single field (e.g.
field:field1_value1, field:field2_value2, ...). But I could not confirm it
anywhere else. Any comments?*

4. Ian wrote:

*Sorting by score down to the second will use a lot of memory.  Can you*
*make it less granular?*

Is it less painful to sort on two fields, first on yymmdd and then on
yymmddHHMMSS, than to sort just on the latter? (Naturally it should use the
second field only where required, but technically ...?)
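
To check what the two-field sort would buy in terms of ordering, here is a minimal sketch in plain Java (no Lucene involved). It assumes the 14-digit yyyymmddHHMMSS format with the day taken as the first 8 characters, and the class and method names are hypothetical. Since the day is a prefix of the full timestamp, both approaches give the same order. As far as I understand, Lucene caches values per sort field, so two fields would likely mean two caches, but that part is an assumption this sketch does not measure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Compares sorting on the full timestamp with sorting on day, then timestamp.
class TwoLevelSort {
    /** Day part (first 8 characters) of a yyyymmddHHMMSS timestamp. */
    static String day(String ts) {
        return ts.substring(0, 8);
    }

    static List<String> sortByFullTimestamp(List<String> in) {
        List<String> out = new ArrayList<>(in);
        out.sort(Comparator.<String>naturalOrder());
        return out;
    }

    static List<String> sortByDayThenTimestamp(List<String> in) {
        Comparator<String> byDay = Comparator.comparing(TwoLevelSort::day);
        Comparator<String> byDayThenFull =
                byDay.thenComparing(Comparator.<String>naturalOrder());
        List<String> out = new ArrayList<>(in);
        out.sort(byDayThenFull);
        return out;
    }

    public static void main(String[] args) {
        List<String> stamps = Arrays.asList(
                "20100428062434", "20100427130200",
                "20100428031200", "20100427130159");
        System.out.println(sortByFullTimestamp(stamps));
        System.out.println(sortByDayThenTimestamp(stamps));
    }
}
```

Both lists come out in the same order, so the coarse field changes nothing about the result and only the memory and speed side of the question remains open.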


Thanks again for the invaluable support I am getting from here.

- Samar

On Wed, Apr 28, 2010 at 9:12 AM, Lance Norskog <goksron@gmail.com> wrote:

> Solr's timestamp representation (TrieDateField) is tuned for space and
> speed. It has a compressed representation, and sorts with far less
> space than Strings.
>
> Also you get something called a date facet, which lets you bucketize
> facet searches by time block.
>
> On Tue, Apr 27, 2010 at 1:02 PM, Toke Eskildsen <te@statsbiblioteket.dk>
> wrote:
> > Samarendra Pratap [samarzone@gmail.com] wrote:
> >> 1. Our default option is sort by score, however almost 8% of searches use
> >> sorting on a field (yyyymmddHHMMSS). This field is indexed as string (not as
> >> NumericField or DateField).
> >
> > Guessing that the timestamp is practically unique for each document,
> > sorting by String takes up a bit more than
> > 18M * (40 bytes + 2 * "yyyymmddHHMMSS".length() bytes) ~= 1.2 GB of RAM
> > as the Strings are cached. Coupled with the normal overhead of just opening
> > an index of your size (500MB by your measurements?), I would have guessed
> > that 3600MB would definitely be enough to open the index and do sorted
> > searches.
> >
> > I realize that fiddling with production servers is dangerous, but
> > connecting with JConsole and forcing a garbage collection might be
> > acceptable? That should enable you to determine whether you're leaking
> > memory or if it's just the JVM being greedy. I'd guess you leaking though,
> > as HotSpot does not normally allocate up to the limit if it does not need
> > to.
> >
> > Anyway, changing to one of the optimized fields for sorting dates should
> > shave 1 GB off the memory requirement, so I'll recommend doing that no
> > matter what the main cause of your memory problems is.
> >
> > Regards,
> > Toke Eskildsen
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
>
>
> --
> Lance Norskog
> goksron@gmail.com


-- 
Regards,
Samar
