lucene-dev mailing list archives

From "Shawn Heisey (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3954) Option to have updateHandler and DIH skip updateLog
Date Tue, 16 Oct 2012 23:23:04 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477462#comment-13477462 ]

Shawn Heisey commented on SOLR-3954:
------------------------------------

bq. In any case, I don't think we would add an option to skip the update log - you can remove
it if the performance is unacceptable.

When I revamp my SolrJ application, I plan to use soft commit on a very short interval (maybe
10 seconds) but only do a hard commit every five minutes, possibly even less often.

If I understand the updateLog functionality correctly (and I don't claim that I do), my SolrJ
code would not need to track separately which updates had only been soft-committed and which
had also been hard-committed.  If the server went down four minutes and 55 seconds after the
last hard commit, I would have a reasonable expectation that when it came back up, all of the
updates covered by those soft commits would be replayed from the update log and properly
applied to my index.
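A toy Java model of the expectation above, as a sketch only: `CommitModel` and its methods are hypothetical and are not Solr or SolrJ APIs. It assumes one update per second, the 10-second soft commit and 5-minute hard commit intervals mentioned above, and that update-log entries actually reach disk before the crash. It shows the gap between what is searchable and what is durable, and what log replay would recover.

```java
// Hypothetical toy model, not Solr code: tracks which of a stream of
// one-per-second updates are visible and which survive a crash.
public class CommitModel {
    // Intervals taken from the comment; both are illustrative assumptions.
    static final int SOFT_INTERVAL = 10;   // soft commit every 10 s
    static final int HARD_INTERVAL = 300;  // hard commit every 5 min

    // Time (seconds) of the most recent hard commit at or before t.
    static int lastHardCommit(int t) {
        return (t / HARD_INTERVAL) * HARD_INTERVAL;
    }

    // Time (seconds) of the most recent soft commit at or before t.
    static int lastSoftCommit(int t) {
        return (t / SOFT_INTERVAL) * SOFT_INTERVAL;
    }

    // Updates at times 1..crashTime; those covered by a commit were searchable.
    // (HARD_INTERVAL is a multiple of SOFT_INTERVAL, so the last soft commit
    // time also covers the last hard commit.)
    static int visibleBeforeCrash(int crashTime) {
        return lastSoftCommit(crashTime);
    }

    // Updates present in the index after restart.
    static int survivingUpdates(int crashTime, boolean updateLogEnabled) {
        if (updateLogEnabled) {
            // Replaying the log recovers everything written since the last
            // hard commit (assuming the log entries were on disk).
            return crashTime;
        }
        // Without the log, only hard-committed updates were written to the index on disk.
        return lastHardCommit(crashTime);
    }

    public static void main(String[] args) {
        // Crash 4 min 55 s after the hard commit at t = 300 s.
        int crash = HARD_INTERVAL + 4 * 60 + 55;
        System.out.println("visible before crash:   " + visibleBeforeCrash(crash));
        System.out.println("recovered without log:  " + survivingUpdates(crash, false));
        System.out.println("recovered with log:     " + survivingUpdates(crash, true));
    }
}
```

In this model, a crash at t = 595 s leaves 590 updates searchable beforehand, but only 300 survive without the log, while replay restores all 595. That difference is the bookkeeping the client would otherwise have to do itself.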

Assuming my understanding above is correct, I want the updateLog for my incremental updates.
It makes the bulk import take at least twice as long, though, and I do not need it there,
because if a bulk import fails, I will just start it over.  To get the benefit of updateLog
for incremental updates without paying that cost, I need to be able to turn it off for bulk indexing.

Is there a way to create a second updateHandler that does not have updateLog enabled and tell
DIH to use that handler?

                
> Option to have updateHandler and DIH skip updateLog
> ---------------------------------------------------
>
>                 Key: SOLR-3954
>                 URL: https://issues.apache.org/jira/browse/SOLR-3954
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>    Affects Versions: 4.0
>            Reporter: Shawn Heisey
>             Fix For: 4.1
>
>
> The updateLog feature makes updates take longer, likely because of the I/O time required
to write the additional information to disk.  It may take as much as three times as long for
the indexing portion of the process.  I'm not sure whether it affects the time to commit,
but I would imagine that the difference there is small or zero.  When doing incremental updates/deletes
on an existing index, the time lag is probably very small and unimportant.
> When doing a full reindex (which may happen via DIH), especially if this is done in a
build core that is then swapped with a live core, this performance hit is unacceptable.  It
seems to make the import take about three times as long.
> An option to have an update skip the updateLog would be very useful for these situations.
It should have a method in SolrJ and be exposed in DIH as well.

