From: Amrit Sarkar
Date: Tue, 7 Nov 2017 21:38:47 +0530
Subject: Re: Long blocking during indexing + deleteByQuery
To: solr-user@lucene.apache.org

Maybe not a relevant fact on this, but: "addAndDelete" is triggered by
*reordering of DBQs*; that means there are non-executed DBQs present in the
updateLog when an add operation is received. Solr makes sure the DBQs are
executed first and then the add operation is executed.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Tue, Nov 7, 2017 at 9:19 PM, Erick Erickson wrote:

> Well, consider what happens here.
>
> Solr gets a DBQ that includes document 132 and 10,000,000 other docs.
> Solr gets an add for document 132.
>
> The DBQ takes time to execute. If Solr were processing the requests in
> parallel, would 132 be in the index after the delete was over? It would
> depend on when the DBQ found the doc relative to the add. With this
> sequence one would expect 132 to be in the index at the end.
>
> And it's worse when it comes to distributed indexes. If the updates were
> sent out in parallel, you could end up in situations where one replica
> contained 132 and another didn't, depending on the vagaries of thread
> execution.
>
> Now I didn't write the DBQ code, but that's what I think is happening.
>
> Best,
> Erick
>
> On Tue, Nov 7, 2017 at 7:40 AM, Chris Troullis wrote:
> > As an update, I have confirmed that it doesn't seem to have anything
> > to do with child documents, or standard deletes, just deleteByQuery.
> > If I do a deleteByQuery on any collection while also adding/updating
> > in separate threads, I am experiencing this blocking behavior on the
> > non-leader replica.
> >
> > Has anyone else experienced this / have any thoughts on what to try?
> >
> > On Sun, Nov 5, 2017 at 2:20 PM, Chris Troullis wrote:
> >
> >> Hi,
> >>
> >> I am experiencing an issue where threads are blocking for an
> >> extremely long time when I am indexing while deleteByQuery is also
> >> running.
> >>
> >> Setup info:
> >> - Solr Cloud 6.6.0
> >> - Simple 2 node, 1 shard, 2 replica setup
> >> - ~12 million docs in the collection in question
> >> - Nodes have 64 GB RAM, 8 CPUs, spinning disks
> >> - Soft commit interval 10 seconds, hard commit (openSearcher=false)
> >>   60 seconds
> >> - Default merge policy settings (which I think is 10/10)
> >>
> >> We have a query-heavy, moderately index-heavy use case. Indexing
> >> runs constantly throughout the day and can be bursty. The indexing
> >> process handles both updates and deletes, can spin up to 15
> >> simultaneous threads, and sends to Solr in batches of 3000 docs
> >> (which seems to be the optimal number per trial and error).
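For anyone trying to reproduce a workload like the one described above, a
rough SolrJ (6.x) sketch follows. The ZK host, collection name, field names,
and thread/batch counts are placeholders, not details taken from this thread:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ConcurrentAddAndDbq {
    public static void main(String[] args) throws Exception {
        // Placeholders: point at your own ZK ensemble and collection.
        CloudSolrClient client = new CloudSolrClient.Builder()
                .withZkHost("localhost:2181")
                .build();
        client.setDefaultCollection("test_collection");

        ExecutorService indexers = Executors.newFixedThreadPool(4);
        for (int t = 0; t < 4; t++) {
            final int threadId = t;
            indexers.submit(() -> {
                // Each thread sends one batch of adds, like the bulk indexer above.
                List<SolrInputDocument> batch = new ArrayList<>();
                for (int i = 0; i < 3000; i++) {
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", "t" + threadId + "_" + i);
                    doc.addField("title_s", "doc " + i);
                    batch.add(doc);
                }
                client.add(batch);
                return null;
            });
        }

        // Meanwhile a deleteByQuery runs against the same collection.
        client.deleteByQuery("title_s:stale*");

        indexers.shutdown();
        indexers.awaitTermination(10, TimeUnit.MINUTES);
        client.commit();
        client.close();
    }
}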
> >> I can build the entire collection from scratch using this method in
> >> under 40 minutes, and indexing is in general super fast (it averages
> >> about 3 seconds to send a batch of 3000 docs to Solr). The issue I
> >> am seeing is that when some threads are adding/updating documents
> >> while other threads are issuing deletes (using deleteByQuery), Solr
> >> seems to get into a state of extreme blocking on the replica, which
> >> results in some threads taking 30+ minutes just to send one batch of
> >> 3000 docs. This collection does use child documents (hence the
> >> deleteByQuery on _root_); not sure if that makes a difference, and I
> >> am trying to duplicate it on a non-child-doc collection. CPU/IO wait
> >> seems minimal on both nodes, so I am not sure what is causing the
> >> blocking.
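For readers unfamiliar with the _root_ field mentioned above: when documents
are indexed as parent/child blocks, every document in the block carries the
parent's _root_ value, so a deleteByQuery on _root_ removes the whole block.
A minimal SolrJ (6.x) sketch, with placeholder ids, field names, ZK host, and
collection name:

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class RootDbqSketch {
    public static void main(String[] args) throws Exception {
        // Placeholders: adjust ZK host and collection name.
        CloudSolrClient client = new CloudSolrClient.Builder()
                .withZkHost("localhost:2181")
                .build();
        client.setDefaultCollection("test_collection");

        // Index a parent with a nested child; both get _root_ = "order_1".
        SolrInputDocument parent = new SolrInputDocument();
        parent.addField("id", "order_1");
        parent.addField("type_s", "order");

        SolrInputDocument child = new SolrInputDocument();
        child.addField("id", "order_1_line_1");
        child.addField("type_s", "order_line");
        parent.addChildDocument(child);

        client.add(parent);
        client.commit();

        // Delete the whole block (parent plus children) in one deleteByQuery.
        client.deleteByQuery("_root_:order_1");
        client.commit();

        client.close();
    }
}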
> >> Here is part of the stack trace on one of the blocked threads on the
> >> replica:
> >>
> >> qtp592179046-576 (576)
> >> java.lang.Object@608fe9b5
> >> org.apache.solr.update.DirectUpdateHandler2.addAndDelete(DirectUpdateHandler2.java:354)
> >> org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:237)
> >> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:194)
> >> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
> >> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
> >> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:979)
> >> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1192)
> >> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:748)
> >> org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:98)
> >> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:180)
> >> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:136)
> >> org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:306)
> >> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
> >> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:122)
> >> org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:271)
> >> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
> >> org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:173)
> >> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:187)
> >> org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:108)
> >> org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:55)
> >> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
> >> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
> >> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
> >> org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
> >> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
> >> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
> >>
> >> A cursory search led me to this JIRA:
> >> https://issues.apache.org/jira/browse/SOLR-7836, not sure if it is
> >> related though.
> >>
> >> Can anyone shed some light on this issue? We don't do deletes very
> >> frequently, but it is bringing Solr to its knees when we do, which is
> >> causing some big problems.
> >>
> >> Thanks,
> >>
> >> Chris
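To make the ordering problem from Erick's document-132 example concrete, here
is a minimal SolrJ (6.x) sketch of the DBQ-then-add sequence; the ZK host,
collection name, field names, and the delete query are placeholders:

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class DbqReorderSketch {
    public static void main(String[] args) throws Exception {
        // Placeholders: adjust ZK host and collection name.
        CloudSolrClient client = new CloudSolrClient.Builder()
                .withZkHost("localhost:2181")
                .build();
        client.setDefaultCollection("test_collection");

        // 1. A deleteByQuery that matches the existing version of document
        //    132 along with many other docs; the query is just a stand-in.
        client.deleteByQuery("batch_s:old_load");

        // 2. An add of document 132 sent right behind it.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "132");
        doc.addField("batch_s", "new_load");
        client.add(doc);

        client.commit();

        // Applied in this order, doc 132 is in the index after the commit.
        // A replica that still has unapplied DBQs in its update log when the
        // add arrives executes them first (the DirectUpdateHandler2.addAndDelete
        // frame in the stack trace above), which is where the long waits
        // show up.
        client.close();
    }
}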