lucene-dev mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Possible improvement: TrieDate without time of day
Date Wed, 19 Dec 2012 13:46:30 GMT
"Have you tried to set your dates' hours, minutes, seconds and milliseconds 
to 0 before indexing them?"

If only it were that easy!

And maybe that's the point: we need an attribute on "date/DateField" fields
to express those semantics, that is, throw away the time of day when
indexing values for this field/type. Maybe an attribute such as
indexTime="false".

Also, I am wondering: if I use day dates and they are in a range like 1990
to 2012, that's a relatively small number of unique values, roughly 8,400.
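
To put a rough number on that, here is a minimal Java sketch. The class and
method names are made up for illustration, and nothing here is Solr or Lucene
API. It counts the calendar days from January 1 of the first year through
December 31 of the last year.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, for illustration only (not part of Solr or Lucene).
final class DayCardinality {

    // Number of distinct calendar days from Jan 1 of firstYear
    // through Dec 31 of lastYear, inclusive.
    static long distinctDays(int firstYear, int lastYear) {
        LocalDate start = LocalDate.of(firstYear, 1, 1);
        LocalDate endExclusive = LocalDate.of(lastYear + 1, 1, 1);
        return ChronoUnit.DAYS.between(start, endExclusive);
    }

    public static void main(String[] args) {
        // 23 years, 6 of them leap years: 23 * 365 + 6 = 8401
        System.out.println(distinctDays(1990, 2012));
    }
}
```

For 1990 through 2012 this comes to 8,401 distinct days, so a day-granularity
field over that range would have at most that many unique values.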

And also allow a source format that has only the day date, so that the
source text can be more compact.

And maybe support other date/day formats as well, including RFC format. 
SolrCell has some support for RFC date format, I think.

But the real point of this thread was whether suppressing the time of day
matters at all.

That said, your comment seemed to imply that the new 4.1 postings format
would store day-style dates more efficiently. Could you summarize what
effects we could see?

-- Jack Krupansky

-----Original Message----- 
From: Adrien Grand
Sent: Wednesday, December 19, 2012 5:05 AM
To: dev@lucene.apache.org
Subject: Re: Possible improvement: TrieDate without time of day

Hi Jack,

On Sat, Dec 15, 2012 at 4:36 PM, Jack Krupansky <jack@basetechnology.com> 
wrote:
> I have seen a few inquiries concerned with the overhead of storing time of
> day for simple dates. The concerns are both storage and performance. So, the
> question/proposal is whether a variant of TrieDate with no time of day
> component, call it TrieDay or TrieDateTimeless or TrieDateNoTime (or
> incompatibly rename TrieDate to TrieDateTime and use TrieDate for the new
> format), could be stored with, say, 40% more storage efficiency and maybe a
> comparable or at least significant performance improvement for queries.

Storing only the day in a 32-bit integer could save space, but I'm
not sure Solr should provide a type for every granularity of dates.
Have you tried to set your dates' hours, minutes, seconds and
milliseconds to 0 before indexing them? This should help postings
lists share terms and improve storage efficiency (especially with the
new Lucene41PostingsFormat).
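
A minimal sketch of both ideas in plain Java, done on the client side before
the value is handed to the indexer. It assumes day boundaries are taken in
UTC. The class and method names are hypothetical, and nothing here is a Solr
or Lucene call. The first method zeroes the time of day in an epoch-millisecond
timestamp; the second reduces the timestamp to a day number that fits in a
32-bit integer.

```java
import java.time.Instant;

// Hypothetical client-side helper, for illustration only.
final class DayTruncation {

    static final long MILLIS_PER_DAY = 86_400_000L;

    // Zero out hours, minutes, seconds and milliseconds (UTC).
    // floorDiv rounds toward negative infinity, so dates before 1970
    // also land on the start of their own day.
    static long truncateToUtcDay(long epochMillis) {
        return Math.floorDiv(epochMillis, MILLIS_PER_DAY) * MILLIS_PER_DAY;
    }

    // Days since 1970-01-01 (UTC) as a 32-bit int.
    // toIntExact throws ArithmeticException if the value does not fit.
    static int toEpochDay(long epochMillis) {
        return Math.toIntExact(Math.floorDiv(epochMillis, MILLIS_PER_DAY));
    }

    public static void main(String[] args) {
        long sent = Instant.parse("2012-12-19T13:46:30Z").toEpochMilli();
        // prints 2012-12-19T00:00:00Z
        System.out.println(Instant.ofEpochMilli(truncateToUtcDay(sent)));
        // prints 15693
        System.out.println(toEpochDay(sent));
    }
}
```

With either form, every document dated on the same day carries the same
value, which is what lets those documents share indexed terms.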

-- 
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

