lucene-java-user mailing list archives

From Danil ŢORIN <torin...@gmail.com>
Subject Re: date issues
Date Thu, 23 Feb 2012 08:39:15 GMT
Range queries on String fields are painfully slow.

Format the dates as YYYYMMDD integers and index them in a field type declared with
class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"
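
A minimal sketch of the YYYYMMDD conversion, assuming Java 8 or later
(java.time) on the indexing client. The class name DateKey, its helper
methods, and the field name "day" are hypothetical and only illustrate the
idea; they are not part of Solr or Lucene.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.ZoneOffset;

public class DateKey {
    /** Turns an ISO-8601 UTC timestamp into an int of the form YYYYMMDD. */
    static int toDateKey(String isoTimestamp) {
        LocalDate d = Instant.parse(isoTimestamp).atZone(ZoneOffset.UTC).toLocalDate();
        return d.getYear() * 10000 + d.getMonthValue() * 100 + d.getDayOfMonth();
    }

    /** Builds a range query string covering one calendar month (field name is an assumption). */
    static String monthRangeQuery(String field, int year, int month) {
        int first = year * 10000 + month * 100 + 1;
        int last = year * 10000 + month * 100 + YearMonth.of(year, month).lengthOfMonth();
        return String.format("%s:[%d TO %d]", field, first, last);
    }

    public static void main(String[] args) {
        System.out.println(toDateKey("2012-02-22T21:11:14Z"));  // 20120222
        System.out.println(monthRangeQuery("day", 2012, 2));    // day:[20120201 TO 20120229]
    }
}
```

The time of day is dropped before indexing, so every document from the same
day shares one integer value, and a query for a single day becomes an exact
match on that value rather than a range.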

On Thu, Feb 23, 2012 at 10:19, findbestopensource
<findbestopensource@gmail.com> wrote:
> Yes. If you store it as a String, you should still be able to do range
> searches. I am not sure which is better, storing as String or as Integer.
>
>  Regards
>  Aditya
>  www.findbestopensource.com
>
>
> On Thu, Feb 23, 2012 at 1:25 PM, Jason Toy <jasontoy@gmail.com> wrote:
>
>> Can I still do range searches on a string? It seems like it would be more
>> efficient to store as an integer.
>> > Hi,
>> >
>> > You could consider storing the date field as a String in "YYYYMMDD" format.
>> > This will save space and it will perform better.
>> >
>> > Regards
>> > Aditya
>> > www.findbestopensource.com
>> >
>> >
>> > On Thu, Feb 23, 2012 at 11:55 AM, Jason Toy <jasontoy@gmail.com> wrote:
>> >
>> >> I have a Solr instance with about 400m docs. For text searches it is
>> >> perfectly fine. When I do searches that calculate the number of times a
>> >> word appeared in the doc set for every day of a month, it usually causes
>> >> Solr to crash with out of memory errors.
>> >> I calculate this by running ~30 queries, one for each day, to see the
>> >> count for that day.
>> >> Is there a better way I could do this?
>> >>
>> >> Currently the date fields are stored as:
>> >> <fieldType name="date" class="solr.TrieDateField" omitNorms="true"
>> >> precisionStep="0" positionIncrementGap="0"/>
>> >>
>> >> and the timestamps are stored in the format of:
>> >> 2012-02-22T21:11:14Z
>> >>
>> >> We have no need to store anything beyond the date. Will just changing the
>> >> time portion to zeros make things faster:
>> >> 2012-02-22T00:00:00Z
>> >>
>> >> I thought that to optimize this, there would be an actual date type that
>> >> doesn't store the time component, but looking through the Solr docs, I don't
>> >> see anything specifically for a date as opposed to a timestamp. Would it
>> >> be faster for me to store dates in an sint format? What is the optimal
>> >> format I should use? If the format is to continue to use TrieDateField, is
>> >> it not a waste to store the hour/minute/seconds even if they are not being
>> >> used?
>> >>
>> >> Is there anything else I can do to make this more efficient?
>> >>
>> >> I have looked around on the mailing list and on Google and am not sure
>> >> what to use. I would appreciate any pointers. Thanks.
>> >>
>> >> Jason
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >>
>> >>
>>
>>
>>
>>
