From dev-return-321022-archive-asf-public=cust-asf.ponee.io@lucene.apache.org Fri May 4 03:43:05 2018 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id E1169180675 for ; Fri, 4 May 2018 03:43:04 +0200 (CEST) Received: (qmail 47333 invoked by uid 500); 4 May 2018 01:43:03 -0000 Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@lucene.apache.org Delivered-To: mailing list dev@lucene.apache.org Received: (qmail 47320 invoked by uid 99); 4 May 2018 01:43:03 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 04 May 2018 01:43:03 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 14AB01807F2 for ; Fri, 4 May 2018 01:43:03 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -109.511 X-Spam-Level: X-Spam-Status: No, score=-109.511 tagged_above=-999 required=6.31 tests=[ENV_AND_HDR_SPF_MATCH=-0.5, KAM_ASCII_DIVIDERS=0.8, RCVD_IN_DNSWL_MED=-2.3, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01, USER_IN_DEF_SPF_WL=-7.5, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id 2K2ifaS2YygM for ; Fri, 4 May 2018 01:43:01 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id 5DD9C5F59D for ; Fri, 4 May 2018 01:43:01 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id B6A7BE0F22 for ; Fri, 4 May 2018 01:43:00 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 6E3C42129C for ; Fri, 4 May 2018 01:43:00 +0000 (UTC) Date: Fri, 4 May 2018 01:43:00 +0000 (UTC) From: "Erick Erickson (JIRA)" To: dev@lucene.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (LUCENE-7976) Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of very large segments MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/LUCENE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16463273#comment-16463273 ] Erick Erickson commented on LUCENE-7976: ---------------------------------------- [~mikemccand] Do note that I'm seeing these errors on my latest full test run, I'll chase down what's up with them. That said, I think I'm at a point where I don't want to do another significant change to the approach if I can help it or unless there are issues.... I'll see if I can reproduce these errors reliably and whether I can get them to occur with the unmodified TMP code. Precommit passes though and compiles, can I ship it ;) [junit4] - org.apache.solr.core.TestCodecSupport.testMixedCompressionMode [junit4] - org.apache.solr.search.join.TestScoreJoinQPNoScore.testRandomJoin [junit4] - org.apache.solr.TestRandomFaceting.testRandomFaceting [junit4] - org.apache.solr.TestRandomDVFaceting.testRandomFaceting [junit4] - org.apache.solr.core.TestMergePolicyConfig.testTieredMergePolicyConfig [junit4] - org.apache.solr.TestJoin.testRandomJoin [junit4] - org.apache.solr.TestJoin.testJoin [junit4] - org.apache.solr.core.TestSolrDeletionPolicy1.testKeepOptimizedOnlyCommits > Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of very large segments > ------------------------------------------------------------------------------------------------- > > Key: LUCENE-7976 > URL: https://issues.apache.org/jira/browse/LUCENE-7976 > Project: Lucene - Core > Issue Type: Improvement > Reporter: Erick Erickson > Assignee: Erick Erickson > Priority: Major > Attachments: LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch > > > We're seeing situations "in the wild" where there are very large indexes (on disk) handled quite easily in a single Lucene index. This is particularly true as features like docValues move data into MMapDirectory space. The current TMP algorithm allows on the order of 50% deleted documents as per a dev list conversation with Mike McCandless (and his blog here: https://www.elastic.co/blog/lucenes-handling-of-deleted-documents). > Especially in the current era of very large indexes in aggregate, (think many TB) solutions like "you need to distribute your collection over more shards" become very costly. Additionally, the tempting "optimize" button exacerbates the issue since once you form, say, a 100G segment (by optimizing/forceMerging) it is not eligible for merging until 97.5G of the docs in it are deleted (current default 5G max segment size). > The proposal here would be to add a new parameter to TMP, something like (no, that's not serious name, suggestions welcome) which would default to 100 (or the same behavior we have now). > So if I set this parameter to, say, 20%, and the max segment size stays at 5G, the following would happen when segments were selected for merging: > > any segment with > 20% deleted documents would be merged or rewritten NO MATTER HOW LARGE. There are two cases, > >> the segment has < 5G "live" docs. In that case it would be merged with smaller segments to bring the resulting segment up to 5G. If no smaller segments exist, it would just be rewritten > >> The segment has > 5G "live" docs (the result of a forceMerge or optimize). It would be rewritten into a single segment removing all deleted docs no matter how big it is to start. The 100G example above would be rewritten to an 80G segment for instance. > Of course this would lead to potentially much more I/O which is why the default would be the same behavior we see now. As it stands now, though, there's no way to recover from an optimize/forceMerge except to re-index from scratch. We routinely see 200G-300G Lucene indexes at this point "in the wild" with 10s of shards replicated 3 or more times. And that doesn't even include having these over HDFS. > Alternatives welcome! Something like the above seems minimally invasive. A new merge policy is certainly an alternative. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For additional commands, e-mail: dev-help@lucene.apache.org