lucene-dev mailing list archives

From "Denis Shishlyannikoc (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-6385) Strange behavior on indexing document with wrong date format
Date Sun, 24 Aug 2014 06:11:11 GMT

    [ https://issues.apache.org/jira/browse/SOLR-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108281#comment-14108281 ]

Denis Shishlyannikoc commented on SOLR-6385:
--------------------------------------------

 Erick Erickson, can you be more specific when talking about other JIRAs? 
Thanks.

> Strange behavior on indexing document with wrong date format
> ------------------------------------------------------------
>
>                 Key: SOLR-6385
>                 URL: https://issues.apache.org/jira/browse/SOLR-6385
>             Project: Solr
>          Issue Type: Bug
>          Components: clients - java
>    Affects Versions: 4.7.2
>         Environment: Solr server in Windows 7, solrj
>            Reporter: Denis Shishlyannikoc
>            Priority: Critical
>
> Hello.
> I have only recently started working with Solr and do not have much experience with it yet, so some of the problems I describe here may be due to lack of knowledge.
> Excuse me for that.
> Problems that I saw:
> 1) I use SolrJ to index a collection of SolrInputDocuments.
> To do this I call the add(Collection) method of a CloudSolrServer object.
> Just for fun I tried to index one of the documents with an incorrect date:
> I took the valid Solr date value of one of these SolrInputDocuments and changed the "T" symbol in it to "K".
> (This date field is defined in schema.xml as
> <field name="mydate" type="tdate" indexed="true" stored="true" multiValued="false" />)
> Solr failed to index the collection and returned a SolrServerException.
> In addition, some documents of this SolrInputDocuments collection were indexed correctly, while the document with the problematic date failed to be indexed, together with several valid (from all points of view) SolrInputDocuments of this collection.
> It looks like Solr went through the documents in the collection, indexing them one by one, threw an exception on the document with the problematic date, and then did not index any of the valid documents that came after it.
> 2) After the failure described in 1), Solr kept the problematic document in some queue and tried to reindex it again (one attempt every 3-5 minutes or so; I did not measure the exact time), showing the same (failed to parse date) exception in the logs! After a Solr server restart the issue is gone: no more attempts to reindex the problematic document.
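
One way to keep a malformed date from reaching the server at all is to check it on the client before building the document. The sketch below is only an approximation: it uses the JDK's `Instant.parse` (Java 8 or later), which accepts ISO-8601 UTC instants of the form `2014-08-24T06:11:11Z`. It does not reproduce Solr's own date parser, and the class and method names (`SolrDateCheck`, `isValidUtcInstant`) are hypothetical and not part of SolrJ.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;

// Hypothetical helper, not part of SolrJ: a client-side sanity check for date strings.
class SolrDateCheck {

    /**
     * Returns true if the value parses as an ISO-8601 UTC instant,
     * for example 2014-08-24T06:11:11Z. This approximates, but does not
     * replicate, the format Solr expects for date fields.
     */
    static boolean isValidUtcInstant(String value) {
        if (value == null) {
            return false;
        }
        try {
            Instant.parse(value);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Well-formed value: prints true
        System.out.println(isValidUtcInstant("2014-08-24T06:11:11Z"));
        // Same value with "T" replaced by "K", as in the report: prints false
        System.out.println(isValidUtcInstant("2014-08-24K06:11:11Z"));
    }
}
```

A check like this only filters obviously malformed values on the client; it does not change what the server does when a bad value arrives by another route.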
> Questions to be answered:
> 1) What is the default behavior of Solr when indexing fields with problematic values?
> For example, for a date field: I expect Solr to index a null date (instead of not indexing the whole document), write a warning to the logs, and return some indication of the problem in the UpdateResponse.
> Maybe Solr's behavior on invalid field values should be configurable (defined in some XML element in the schema).
> 2) While indexing a collection of documents, should Solr index all valid documents (and not return on the first problem, as happens now)?
> If I index a collection of documents, I expect Solr to index all valid (from all points of view) documents and return an indexing status in the UpdateResponse for all problematic documents that were not indexed.
> 3) Why does Solr try to reindex the problematic document? This looks like a bug that can create useless load on the server.
> If this behavior is by design, then how can I force Solr to stop reindexing such problem documents (without restarting the Solr server)?
> Where can I read about it?
> Thank you.
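
The behavior asked for in question 2 (index every valid document and report the ones that failed) can be approximated on the client by sending documents one at a time and collecting the failures. The sketch below is a generic illustration, not Solr's or SolrJ's behavior: `PerDocumentIndexer`, `DocSink`, and `addEach` are hypothetical names, and the sink in `main` merely validates a date string with `Instant.parse` as a stand-in for a real single-document add call.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side helper, not part of SolrJ.
class PerDocumentIndexer {

    /** Stand-in for a single-document add call; may throw on a bad document. */
    interface DocSink<D> {
        void add(D doc) throws Exception;
    }

    /**
     * Sends each document separately, so one failure does not stop the rest.
     * Returns the failed documents with their exceptions, in input order.
     */
    static <D> Map<D, Exception> addEach(List<D> docs, DocSink<D> sink) {
        Map<D, Exception> failures = new LinkedHashMap<>();
        for (D doc : docs) {
            try {
                sink.add(doc);
            } catch (Exception e) {
                failures.put(doc, e);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<String> indexed = new ArrayList<>();
        List<String> docs = Arrays.asList(
                "2014-08-24T06:11:11Z",   // valid
                "2014-08-24K06:11:11Z",   // "T" replaced by "K", as in the report
                "2014-08-25T00:00:00Z");  // valid, comes after the bad one

        // Assumption: the sink only simulates indexing by validating the date.
        Map<String, Exception> failures = addEach(docs, d -> {
            Instant.parse(d);
            indexed.add(d);
        });

        System.out.println("indexed: " + indexed.size());          // prints indexed: 2
        System.out.println("failed:  " + failures.keySet());       // prints failed:  [2014-08-24K06:11:11Z]
    }
}
```

Sending one document per call trades throughput for isolation, so a variant that tries the whole batch first and falls back to per-document adds only on failure may suit larger collections better.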



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

