lucene-solr-user mailing list archives

From Brian Johnson <brianmjohn...@yahoo.com>
Subject Re: SOLR-470 & default value in schema with NOW (update)
Date Thu, 24 Apr 2008 16:37:37 GMT
Ok, I thought the quickest thing to try out would be (B) so now all of my feeds have the same
format and I have removed the default value "NOW" from my schema.xml file.

    <field name="timestamp_created" type="date" indexed="true" stored="true" required="true" multiValued="false" />
... and ...
    <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>

I rebuilt my index with consistent date formats in all my files but my exception remains unchanged.

Apr 24, 2008 9:11:26 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: java.text.ParseException: Unparseable date: "2008-04-24T09:03:33Z"
        at org.apache.solr.schema.DateField.toObject(DateField.java:173)
        at org.apache.solr.schema.DateField.toObject(DateField.java:83)
        at org.apache.solr.update.DocumentBuilder.loadStoredFields(DocumentBuilder.java:285)
...
Caused by: java.text.ParseException: Unparseable date: "2008-04-24T09:03:33Z"
        at java.text.DateFormat.parse(Unknown Source)
        at org.apache.solr.schema.DateField.toObject(DateField.java:170)
        ... 27 more

Here's the output from Luke for the only date in my schema... timestamp_created
Field: timestamp_created
Field Type: date
Properties:  Indexed, Stored, Omit Norms, Sort Missing Last
Schema:  Indexed, Stored, Omit Norms, Sort Missing Last
Index:  Indexed, Stored, Omit Norms
Index Analyzer: org.apache.solr.schema.FieldType$DefaultAnalyzer 
Query Analyzer: org.apache.solr.schema.FieldType$DefaultAnalyzer 
Docs:  146704
Distinct:  33


term                    frequency
2008-04-24T09:03:53Z    11076
2008-04-24T09:03:55Z    10036
2008-04-24T09:03:52Z    9400
2008-04-24T09:03:51Z    8855
2008-04-24T09:03:54Z    8763
2008-04-24T09:03:33Z    6577
2008-04-24T09:03:34Z    5783
2008-04-24T09:03:36Z    5665
2008-04-24T09:03:35Z    5507
2008-04-24T09:03:38Z    5498
2008-04-24T09:03:39Z    5496
2008-04-24T09:03:37Z    5407
2008-04-24T09:04:02Z    4509
2008-04-24T09:03:46Z    4366
2008-04-24T09:04:07Z    4317
2008-04-24T09:04:19Z    4131
2008-04-24T09:04:17Z    4021
2008-04-24T09:04:05Z    4020
2008-04-24T09:04:15Z    3898
2008-04-24T09:04:18Z    3894
2008-04-24T09:04:16Z    3497
2008-04-24T09:04:06Z    3482
2008-04-24T09:04:08Z    3442
2008-04-24T09:04:01Z    3196
2008-04-24T09:04:03Z    3182
2008-04-24T09:04:04Z    3179
2008-04-24T09:03:45Z    2936
2008-04-24T09:03:32Z    2412
2008-04-24T09:03:56Z    1870
2008-04-24T09:04:00Z    433
2008-04-24T09:04:14Z    430
2008-04-24T09:04:20Z    369
2008-04-24T09:03:40Z    353

... and for the field ...
Field Type: date
Fields: timestamp_created  <-- only date field in the schema
Tokenized:  false
Class Name:  org.apache.solr.schema.DateField
Index Analyzer: org.apache.solr.schema.FieldType$DefaultAnalyzer 
Query Analyzer: org.apache.solr.schema.FieldType$DefaultAnalyzer 

This now seems to be something different from SOLR-470 and SOLR-544, since the format is
accepted at indexing and is consistent in the index, but is still not accepted at query
time.
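
For reference, here is a minimal sketch of one way a ParseException with this message can arise from java.text.SimpleDateFormat. It assumes the parser is built with a pattern that requires milliseconds; I have not confirmed which pattern DateField actually uses at line 170, so the pattern strings and the class name UnparseableDateDemo are illustrative only.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

public class UnparseableDateDemo {
    // Assumption: a pattern that demands a millisecond part.
    // This is NOT taken from the DateField source.
    static final String WITH_MILLIS = "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'";
    // The form my feeds use, with no milliseconds.
    static final String NO_MILLIS = "yyyy-MM-dd'T'HH:mm:ss'Z'";

    /** Returns null when the value parses, otherwise the ParseException message. */
    static String tryParse(String pattern, String value) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            fmt.parse(value);
            return null;
        } catch (ParseException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String stored = "2008-04-24T09:03:33Z";
        // The millis pattern expects '.' after the seconds and finds 'Z' instead.
        System.out.println("with millis pattern: " + tryParse(WITH_MILLIS, stored));
        // The no-millis pattern accepts the same string.
        System.out.println("no millis pattern:   " + tryParse(NO_MILLIS, stored));
    }
}
```

If something like this is what happens at query time, the value in the index is fine and only the parser used when reading stored fields is stricter than the one used at indexing.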

Anyone have a suggestion?

Thanks,

Brian Johnson

----- Original Message ----
From: Brian Johnson <brianmjohnson@yahoo.com>
To: solr-user@lucene.apache.org
Sent: Wednesday, April 23, 2008 11:23:54 AM
Subject: SOLR-470 & default value in schema with NOW

So I just ran into this bug:
    https://issues.apache.org/jira/browse/SOLR-470

and read about this related one:
    https://issues.apache.org/jira/browse/SOLR-544

Here is the relevant trace:

Apr 22, 2008 10:59:01 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: java.text.ParseException: Unparseable date: "2008-04-03T22:42:13Z"
        at org.apache.solr.schema.DateField.toObject(DateField.java:173)
        at org.apache.solr.schema.DateField.toObject(DateField.java:83)
        at org.apache.solr.update.DocumentBuilder.loadStoredFields(DocumentBuilder.java:285)
...
Caused by: java.text.ParseException: Unparseable date: "2008-04-03T22:42:1
        at java.text.DateFormat.parse(Unknown Source)

The root cause (I believe, am going to confirm tonight) is that I have multiple index files
I'm uploading into this column in the schema:
   <field name="timestamp_created" type="date" indexed="true" stored="true" required="true" multiValued="false" default="NOW" />

Here is my typedef for 'date':
    <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>


What I came to realize is that my index files contain this column value consistently specified,
but one of my files does not contain the column at all. Due to my indication of a default
value, I am reliant on the SOLR default for NOW being in the same format (no millis, .0, .00,
.000, etc) as I have passed in my feed. As you can see from the exception, my feed does not
contain any millis which is a valid format according to 544 and the documentation I've read.


Now finally, my problem. The format for NOW doesn't seem to be documented so I have no idea
what I need to 'match' (or even that matching is necessary from the documentation outside
these 2 bugs) in order to take advantage of the default value feature and mix that with data
from my streams. I can see from here that it isn't the 'no millis' form since a discrepancy
is triggering this bug. 

Solutions?

A) Should I create a format normalizer and configure that into my typedef for 'date' so that
I am agnostic of these differences in terms of input and ensure the indexed format is consistent?
I believe this would be a <analyzer type="index"><filter .../></analyzer>.
I'm not concerned about the presence or absence of millis on the output. Would this approach
work? Based on the presence of the filter in the fieldType, it feels like a hack.

B) Should I remove the default value and just ensure all my streams have this value specified
consistently and not trigger the bug? It seems to me that SOLR should be robust in this respect,
but reading SOLR-544 I can see that this isn't an opinion that is held by all.

C) Should I apply one of the existing SOLR-470 patch files and move on?

D) Should I take a stab at https://issues.apache.org/jira/browse/SOLR-440 as an alternative
'class' for my 'date' type?
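
To make (A) and (B) concrete, here is a rough sketch of the kind of normalizer I have in mind, written as a plain Java helper that would run on the feed side before anything is posted to Solr. It is not a Solr filter and uses no Solr API; the class name FeedDateNormalizer and the choice of the no-millis form as the canonical output are my own assumptions. Fractional seconds are simply discarded, since I don't care about millis on the output.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class FeedDateNormalizer {
    // Assumption: the no-millis form is the one every feed should end up with.
    private static final String CANONICAL = "yyyy-MM-dd'T'HH:mm:ss'Z'";
    // Input forms to accept: with a fractional part (.0, .00, .000) or without.
    private static final String[] ACCEPTED = {
        "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'",
        CANONICAL
    };

    // SimpleDateFormat is not thread-safe, so build a fresh one per call.
    private static SimpleDateFormat utc(String pattern) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        fmt.setLenient(false);
        return fmt;
    }

    /** Rewrites an accepted timestamp into the canonical no-millis form. */
    static String normalize(String raw) {
        String value = raw.trim();
        for (String pattern : ACCEPTED) {
            try {
                Date parsed = utc(pattern).parse(value);
                // Formatting with the canonical pattern drops any fractional seconds.
                return utc(CANONICAL).format(parsed);
            } catch (ParseException e) {
                // Not this form; try the next accepted pattern.
            }
        }
        throw new IllegalArgumentException("Unrecognized timestamp: " + raw);
    }

    public static void main(String[] args) {
        System.out.println(normalize("2008-04-03T22:42:13Z"));
        System.out.println(normalize("2008-04-03T22:42:13.000Z"));
    }
}
```

Running every feed through something like this would give (B) without hand-editing the files. It does nothing for documents that rely on default="NOW", since that value is generated inside Solr.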

Thanks,

Brian






