lucene-dev mailing list archives

From "James Dyer (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3366) Restart of Solr during data import causes an empty index to be generated on restart
Date Tue, 17 Apr 2012 18:59:18 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255827#comment-13255827 ]

James Dyer commented on SOLR-3366:
----------------------------------

I don't see how this would be related to DIH.  Even if you had "clean=true", it doesn't commit
the deletes until the entire update is complete.  So, like you say, we should expect to only
lose the changes from the current import, not the entire index.

I wonder if this is a side-effect of using replication.  Sometimes, replication copies an
entire new index to the slaves in a new directory, then records that directory's name in "index.properties".
On restart, Solr reads "index.properties" to find the appropriate index directory.  If
this file had been modified or removed, possibly Solr restarted, didn't find the correct directory,
and created a new, empty index?  Of course, this would have affected the slaves only.
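
To check this theory before a restart, something like the following could be run against
the data directory. It is only a rough sketch, not Solr's actual code: I am assuming that
"index.properties" is a plain key=value properties file, that the key naming the directory
is "index", and that the fallback is a directory called "index" under the data dir. The
function names are made up for illustration.

```python
from pathlib import Path
from typing import Optional


def resolve_index_dir(data_dir: Path) -> Path:
    """Guess which index directory a restart would open.

    Assumption: index.properties holds a line such as `index=index.20120417`,
    and a plain `index` directory is used when the file or key is absent.
    """
    default = data_dir / "index"
    props_file = data_dir / "index.properties"
    if not props_file.is_file():
        return default
    # Java properties files are traditionally ISO-8859-1 encoded.
    for raw in props_file.read_text(encoding="latin-1").splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "index" and value.strip():
            return data_dir / value.strip()
    return default


def describe_problem(data_dir: Path) -> Optional[str]:
    """Return a warning if the resolved index directory is missing, else None."""
    index_dir = resolve_index_dir(data_dir)
    if not index_dir.is_dir():
        return f"resolved index directory does not exist: {index_dir}"
    return None
```

If describe_problem() returns a message, the file and the directories on disk disagree,
which is the situation I am speculating about above.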

I vaguely remember there being a bug some releases back where index corruption could occur
if the system is ungracefully shut down, and I see you're on 3.4.  But then again, maybe my
memory is failing me because I didn't see this in the release notes.
                
> Restart of Solr during data import causes an empty index to be generated on restart
> -----------------------------------------------------------------------------------
>
>                 Key: SOLR-3366
>                 URL: https://issues.apache.org/jira/browse/SOLR-3366
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler, replication (java)
>    Affects Versions: 3.4
>            Reporter: Kevin Osborn
>
> We use the DataImportHandler and Java replication in a fairly simple setup of a single
> master and 4 slaves. We had an operating index of about 16,000 documents. The DataImportHandler
> is pulled periodically by an external service using the "command=full-import&clean=false"
> command for a delta import.
>
> While processing one of these commands, we did a deployment which required us to restart
> the application server (Tomcat 7). So, the import was interrupted. Prior to this deployment,
> the full index of 16,000 documents had been replicated to all slaves and was working correctly.
>
> Upon restart, the master restarted with an empty index and then this empty index was
> replicated across all slaves. So, our search index was now empty.
>
> My expected behavior was to lose any changes in the delta import (basically prior to
> the commit). However, I was not expecting to lose all data. Perhaps this is due to the fact
> that I am using the full-import method, even though it is really a delta, for performance
> reasons? Or does the data import just put the index in some sort of invalid state?

