Subject: Re: Merging of index in Solr
To: solr-user@lucene.apache.org
From: Shawn Heisey
Message-ID: <9e375b99-3643-a825-bfae-6719790c1a1c@elyograg.org>
Date: Wed, 22 Nov 2017 21:50:56 -0700

On 11/22/2017 6:19 PM, Zheng Lin Edwin Yeo wrote:
> I'm doing the
> merging on the SSD drive, the speed should be ok?

The speed of virtually all modern disks will have almost no influence on the speed of the merge. The bottleneck isn't disk transfer speed, it's the operation of the merge code in Lucene.

As I said earlier in this thread, a merge is **NOT** just a copy. Lucene must completely rebuild the data structures of the index to incorporate all of the segments of the source indexes into a single segment in the target index, while simultaneously *excluding* information from documents that have been deleted.

The best speed I have ever personally seen for a merge is 30 megabytes per second. This is far below the sustained transfer rate of a typical modern SATA disk. SSD is capable of far faster data transfer ... but it will NOT make merges go any faster.

> We need to merge because the data are indexed in two different collections,
> and we need them to be under the same collection, so that we can do things
> like faceting more accurately.
> Will sharding alone achieve this? Or do we have to merge first before we do
> the sharding?

If you want the final index to be sharded, it's typically best to index from scratch into a new empty collection that has the number of shards you want. The merging tool you're using isn't aware of concepts like shards. It combines everything into a single index.

It's not entirely clear what you're asking with the question about sharding alone. Making a guess: I have never heard of facet accuracy being affected by whether or not the index is sharded. If that *is* possible, then I would expect an index that is NOT sharded to have better accuracy.

Thanks,
Shawn
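
For a rough sense of scale, the 30 megabytes per second figure mentioned above can be turned into a time estimate. The following is only a sketch in Python: the function name and the index sizes are hypothetical, and the result is a best-case lower bound, since 30 MB/s was the fastest rate observed rather than a typical one.

```python
# Rough merge-time estimate based on the 30 MB/s figure quoted in this message.
# The index sizes used below are hypothetical; substitute your own.

MERGE_RATE_MB_PER_SEC = 30  # best merge rate reported in this message


def estimated_merge_hours(index_size_gb, rate_mb_per_sec=MERGE_RATE_MB_PER_SEC):
    """Return a best-case estimate, in hours, for merging index_size_gb of data."""
    if rate_mb_per_sec <= 0:
        raise ValueError("rate_mb_per_sec must be positive")
    size_mb = index_size_gb * 1024  # treat 1 GB as 1024 MB
    seconds = size_mb / rate_mb_per_sec
    return seconds / 3600


if __name__ == "__main__":
    for size_gb in (100, 500, 1000):  # hypothetical source index sizes
        hours = estimated_merge_hours(size_gb)
        print(f"{size_gb} GB: at least {hours:.1f} hours")
```

Under these assumptions, 100 GB of source indexes would take close to an hour at best, regardless of whether the disk is SATA or SSD, because the merge rate and not the disk is the limit.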