lucene-dev mailing list archives

From "Timothy Potter (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-7820) IndexFetcher should delete the current index directory before downloading the new index when isFullCopyNeeded==true
Date Fri, 24 Jul 2015 15:16:06 GMT

    [ https://issues.apache.org/jira/browse/SOLR-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640575#comment-14640575 ]

Timothy Potter commented on SOLR-7820:
--------------------------------------

Thanks for the feedback. This actually came up in a production installation I worked on:
they had 1.4TB of indexes (oversharded on a node) and that node went down. When it came
back, Solr decided all shards had to be fully copied over because they were too far out-of-date
with the leader. The node could never recover because they didn't have another 1.4TB of SSD
allocated on that node. Granted, this is an extreme case. The interesting thing here is that
the node wasn't offline for very long, so I was surprised to see it need a full copy.

Part of this is bad design in that they shouldn't have oversharded the nodes as much given
their space limitations.

I'm wondering if we can compute the space needed for an incoming full index for a shard
and, if that much isn't available, skip the copy. Of course that's harder to do when
oversharding, since several shards compete for the same disk. But to me that's better than
running the disk out of space just to keep failing to recover.
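
A minimal sketch of such a pre-flight space check, assuming the size of the incoming index is already known before the download starts. The class name, method names, and the headroom parameter are hypothetical illustrations, not Solr APIs; only the java.nio.file calls are real.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Solr: decides whether a full index copy
// can fit on the disk that holds the given (existing) directory.
final class FullCopySpaceCheck {

    // Pure arithmetic: does usableBytes cover incomingBytes plus a safety margin?
    // headroomFraction is an assumed knob, e.g. 0.5 means "require 50% extra".
    static boolean fits(long usableBytes, long incomingBytes, double headroomFraction) {
        if (incomingBytes < 0 || headroomFraction < 0) {
            throw new IllegalArgumentException("sizes and headroom must be non-negative");
        }
        double required = incomingBytes * (1.0 + headroomFraction);
        return usableBytes >= required;
    }

    // Asks the file store backing dataDir how much space is usable right now.
    // dataDir must exist; the answer is only a snapshot and can change while
    // other shards on the same node are also downloading.
    static boolean hasRoomForFullCopy(Path dataDir, long incomingBytes, double headroomFraction)
            throws IOException {
        long usable = Files.getFileStore(dataDir).getUsableSpace();
        return fits(usable, incomingBytes, headroomFraction);
    }
}
```

With oversharding, each shard would see the same usable space, so a per-shard check like this one would need some form of reservation across concurrent recoveries to be reliable.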

I also want to put some more energy into trying to avoid a full copy. In my case, the
node that went down wasn't out of sync with the leader by more than a couple thousand docs
per shard, so the fact that Solr wanted to do a full copy of 1.4TB of indexes because a few
thousand docs were missing sounds like the real culprit.

> IndexFetcher should delete the current index directory before downloading the new index when isFullCopyNeeded==true
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-7820
>                 URL: https://issues.apache.org/jira/browse/SOLR-7820
>             Project: Solr
>          Issue Type: Improvement
>          Components: replication (java)
>            Reporter: Timothy Potter
>
> When a replica is trying to recover and its IndexFetcher decides it needs to pull the
> full index from a peer (isFullCopyNeeded == true), then the existing index directory should
> be deleted before the full copy is started to free up disk to pull a fresh index; otherwise
> the server will potentially need 2x the disk space (old + incoming new). Currently, the IndexFetcher
> removes the index directory after the new one is downloaded; however, once the fetcher decides
> a full copy is needed, what is the value of the existing index? It's clearly out-of-date and
> should not serve queries. Since we're deleting data preemptively, maybe this should be an
> advanced configuration property, only to be used by those that are disk-space constrained
> (which I'm seeing more and more with people deploying high-end SSDs - they typically don't
> have 2x the disk capacity required by an index).
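
The two orderings described in the issue can be sketched as follows. This is an illustration under stated assumptions, not Solr's IndexFetcher code: the class, the Downloader interface, and the deleteFirst flag (standing in for the proposed advanced configuration property) are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical sketch: contrasts "download, then delete old" with
// "delete old, then download" for a full index copy.
final class FullCopyPlanner {

    // Stand-in for whatever actually pulls the index files from the peer.
    interface Downloader {
        void downloadInto(Path target) throws IOException;
    }

    // Removes a directory tree bottom-up; a no-op if it does not exist.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path d, IOException exc) throws IOException {
                if (exc != null) {
                    throw exc;
                }
                Files.delete(d);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    // deleteFirst == true: peak disk use is roughly one index, but if the
    // download fails the replica is left with no index at all.
    // deleteFirst == false: the old index survives a failed download, at the
    // cost of needing room for old + new at the same time.
    static void fullCopy(Path oldIndex, Path newIndex, boolean deleteFirst, Downloader downloader)
            throws IOException {
        if (deleteFirst) {
            deleteRecursively(oldIndex);
            downloader.downloadInto(newIndex);
        } else {
            downloader.downloadInto(newIndex);
            deleteRecursively(oldIndex);
        }
    }
}
```

The trade-off in the comments is the reason the issue suggests making the delete-first behavior opt-in rather than the default.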



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

