lucene-java-user mailing list archives

From Jason Toy <jason...@gmail.com>
Subject date issues
Date Thu, 23 Feb 2012 06:25:00 GMT
I have a Solr instance with about 400M docs. Text searches work perfectly fine. But when
I run searches that calculate the number of times a word appeared in the doc set for every
day of a month, Solr usually crashes with out-of-memory errors.
I calculate this by running ~30 queries, one for each day, to get the count for that day.
Is there a better way I could do this?
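
For reference, the per-day approach described above could be sketched roughly as follows. This is only an illustration: the field names `text` and `date` and the helper `daily_count_params` are assumptions, since the actual query syntax and schema field names are not given here.

```python
import calendar
from datetime import date


def daily_count_params(word, year, month, text_field="text", date_field="date"):
    """Build one Solr parameter dict per day of the given month.

    Hypothetical helper: the field names are placeholders, not taken
    from the real schema.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    queries = []
    for day in range(1, days_in_month + 1):
        d = date(year, month, day)
        start = f"{d.isoformat()}T00:00:00Z"
        end = f"{d.isoformat()}T23:59:59Z"
        queries.append({
            "q": f"{text_field}:{word}",
            # Inclusive range covering a single calendar day (UTC).
            "fq": f"{date_field}:[{start} TO {end}]",
            # Only the hit count (numFound) is needed, so fetch no rows.
            "rows": 0,
        })
    return queries
```

Each dict would be sent as a separate request, which is where the ~30 queries per month come from.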

Currently the date fields are stored as:
<fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>

and the timestamps are stored in the format of:
2012-02-22T21:11:14Z

We have no need to store anything beyond the date. Would simply setting the time portion to
zeros make things faster? For example:
2012-02-22T00:00:00Z
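
To make the idea concrete, zeroing the time portion before indexing might look like the sketch below. The helper name `truncate_to_day` is hypothetical, and whether this actually speeds anything up is exactly the open question, not something the snippet demonstrates.

```python
from datetime import datetime

# Timestamp layout shown above, e.g. 2012-02-22T21:11:14Z
SOLR_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def truncate_to_day(timestamp):
    """Return the same timestamp with hour, minute and second set to zero."""
    parsed = datetime.strptime(timestamp, SOLR_FORMAT)
    midnight = parsed.replace(hour=0, minute=0, second=0)
    return midnight.strftime(SOLR_FORMAT)
```

With this, `truncate_to_day("2012-02-22T21:11:14Z")` gives `2012-02-22T00:00:00Z`, so all documents from one day share a single indexed value.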

I thought that, to optimize this, there would be an actual date type that doesn't store the
time component, but looking through the Solr docs, I don't see anything specifically for a
date as opposed to a timestamp. Would it be faster for me to store dates in an sint format?
What is the optimal format I should use? If the answer is to continue using TrieDateField,
is it not a waste to store the hours/minutes/seconds even though they are not being used?
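
As an illustration of the integer alternative, one possible encoding is a YYYYMMDD number per document. This is only an assumed encoding (days since an epoch would be another option), and the helper `date_to_int` is hypothetical; it does not show that an integer field would be faster than TrieDateField.

```python
from datetime import datetime


def date_to_int(timestamp):
    """Encode the date part of a timestamp as a sortable YYYYMMDD integer."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.year * 10000 + parsed.month * 100 + parsed.day
```

For the sample timestamp, `date_to_int("2012-02-22T21:11:14Z")` gives `20120222`. The values sort in date order, but date math such as adding one day would have to be handled outside Solr.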

Is there anything else I can do to make this more efficient?

I have looked around on the mailing list and on Google and am not sure what to use. I would appreciate
any pointers. Thanks.

Jason