lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Solrcloud Index corruption
Date Tue, 10 Mar 2015 16:21:20 GMT
Ahhh, ok. When you reloaded the cores, did you do it core-by-core?
I can see how something could get dropped in that case.

However, if you used the Collections API and two cores mysteriously
failed to reload that would be a bug. Assuming the replicas in question
were up and running at the time you reloaded.
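For reference, the two reload routes contrasted above can be sketched as URL builders. This is only an illustration: the base URL, collection name, and core name are placeholders rather than values from this thread, and the helper names are made up for the example. The `action=RELOAD` endpoints themselves are the standard Collections API and CoreAdmin API calls.

```python
from urllib.parse import urlencode


def collection_reload_url(base_url, collection):
    """Build a Collections API RELOAD URL (reloads every replica of a collection)."""
    params = urlencode({"action": "RELOAD", "name": collection, "wt": "json"})
    return f"{base_url.rstrip('/')}/admin/collections?{params}"


def core_reload_url(base_url, core):
    """Build a CoreAdmin RELOAD URL (reloads one core on one node only)."""
    params = urlencode({"action": "RELOAD", "core": core, "wt": "json"})
    return f"{base_url.rstrip('/')}/admin/cores?{params}"


if __name__ == "__main__":
    # Placeholder host and names, not taken from the thread.
    print(collection_reload_url("http://localhost:8983/solr", "mycollection"))
    print(core_reload_url("http://localhost:8983/solr", "mycollection_shard1_replica1"))
```

With the core-by-core route, one such request is needed per core on every node, which is where a core can be missed.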

Thanks for letting us know what's going on.
Erick

On Tue, Mar 10, 2015 at 4:34 AM, Martin de Vries
<martin@downnotifier.com> wrote:
> Hi,
>
>> this _sounds_ like you somehow don't have indexed="true" set for the
>> field in question.
>
>
> We investigated a lot more. The CheckIndex tool didn't find any error. We
> now think the following happened:
> - We changed the schema two months ago: we changed a field to
> indexed="true". We reloaded the cores, but two of them don't seem to have
> been reloaded (maybe we forgot).
> - We reindexed all content. The new field worked fine.
> - We think the leader changed to a server that hadn't reloaded the core.
> - After that, the field stopped working for newly indexed documents.
>
> Thanks for your help.
>
>
> Martin
>
>
>
>
> Erick Erickson wrote on 06.03.2015 17:02:
>
>> bq: You say in our case some docs didn't make it to the node, but
>> that's not really true: the docs can be found on the corrupted nodes
>> when I search on ID. The docs are also complete. The problem is that
>> the docs do not appear when I filter on certain fields
>>
>> this _sounds_ like you somehow don't have indexed="true" set for the
>> field in question. But it also sounds like you're saying that search
>> on that field works on some nodes but not on others, I'm assuming
>> you're adding "&distrib=false" to verify this. It shouldn't be
>> possible to have different schema.xml files on the different nodes,
>> but you might try checking through the admin UI.
>>
>> Network burps shouldn't be related here. If the content is stored,
>> then the info made it to Solr intact, so this issue shouldn't be
>> related to that.
>>
>> Sounds like it may just be the bugs Mark is referencing, sorry I don't
>> have the JIRA numbers right off.
>>
>> Best,
>> Erick
>>
>> On Thu, Mar 5, 2015 at 4:46 PM, Shawn Heisey <apache@elyograg.org> wrote:
>>
>>> On 3/5/2015 3:13 PM, Martin de Vries wrote:
>>>
>>>> I understand there is not a "master" in SolrCloud. In our case we use
>>>> haproxy as a load balancer for every request. So when indexing every
>>>> document will be sent to a different solr server, immediately after
>>>> each other. Maybe SolrCloud is not able to handle that correctly?
>>>
>>> SolrCloud can handle that correctly, but currently sending index
>>> updates to a core that is not the leader of the shard will incur a
>>> significant performance hit, compared to always sending updates to the
>>> correct core. A small performance penalty would be understandable,
>>> because the request must be redirected, but what actually happens is a
>>> much larger penalty than anyone expected. We have an issue in Jira to
>>> investigate that performance issue and make it work as efficiently as
>>> possible.
>>>
>>> Indexing batches of documents is recommended, not sending one
>>> document per update request.
>>>
>>> General performance problems with Solr itself can lead to extremely
>>> odd and unpredictable behavior from SolrCloud. Most often these kinds
>>> of performance problems are related in some way to memory, either the
>>> java heap or available memory in the system.
>>>
>>> http://wiki.apache.org/solr/SolrPerformanceProblems [1]
>>>
>>> Thanks,
>>> Shawn
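
The batching advice in the quoted message can be sketched as follows. This is a minimal illustration, not code from the thread: the helper names and the batch size of 500 are assumptions, and each resulting JSON array would be sent as the body of one request to a Solr update handler instead of sending one document per request.

```python
import json


def batches(docs, size):
    """Yield consecutive slices of docs, each holding at most `size` documents."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def update_bodies(docs, size=500):
    """Serialize each batch as one JSON array, i.e. one update request body per batch."""
    return [json.dumps(batch) for batch in batches(docs, size)]


if __name__ == "__main__":
    # Hypothetical documents; only the "id" field is filled in for the example.
    docs = [{"id": str(i)} for i in range(1200)]
    bodies = update_bodies(docs)
    print(len(bodies), "update requests instead of", len(docs))
```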
>
>
>
>
> Links:
> ------
> [1] http://wiki.apache.org/solr/SolrPerformanceProblems
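
The per-node check suggested earlier in the thread (adding "&distrib=false" so that a core answers only from its own index) can be sketched like this. The core URLs, field name, and value are placeholders, and the helper names are invented for the example. Fetching the URLs is left out, so the sketch only builds the queries and compares the `numFound` counts once they have been collected.

```python
from collections import Counter
from urllib.parse import urlencode


def local_query_url(core_url, field, value):
    """Build a query that one core answers from its own index (distrib=false)."""
    params = urlencode(
        {"q": f"{field}:{value}", "distrib": "false", "rows": 0, "wt": "json"}
    )
    return f"{core_url.rstrip('/')}/select?{params}"


def num_found(response):
    """Extract the hit count from a parsed Solr JSON select response."""
    return response["response"]["numFound"]


def divergent_replicas(counts):
    """Return the replicas whose hit count differs from the most common count."""
    if not counts:
        return []
    expected, _ = Counter(counts.values()).most_common(1)[0]
    return sorted(name for name, count in counts.items() if count != expected)


if __name__ == "__main__":
    # Placeholder core URL and field, not taken from the thread.
    print(local_query_url("http://host1:8983/solr/collection1_shard1_replica1",
                          "status", "active"))
    print(divergent_replicas({"replica1": 120, "replica2": 120, "replica3": 0}))
```

A replica that returns far fewer hits than its peers for the same field query is a candidate for a core that is still running with the old schema.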
