Date: Sat, 6 Jul 2013 08:27:31 -0400
Subject: Re: 2.1billion+ document
From: Erick Erickson
To: solr-user@lucene.apache.org

uniqueKey is used to enforce there being only a single copy of a doc.
Say a doc changes and you re-index it. If there is a doc in the index
already _with the same uniqueKey_, it'll be deleted and the new one will
be the only one visible.

Which implies that if you do implement the suggestions, be sure you
send any docs you update to the _same_ shard you sent them to originally.

If you have no occasion to update docs that already exist in your index,
you don't care about this much.

Best
Erick

On Sat, Jul 6, 2013 at 12:53 AM, Gora Mohanty wrote:

> On 6 July 2013 09:45, Ali, Saqib wrote:
> > Thanks Jason! That was very helpful.
> >
> > I read on the solr wiki that:
> > "Documents must have a unique key and the unique key must be stored
> > (stored="true" in schema.xml)"
> >
> > What is this unique key? Is this just an id that we define in the
> > schema.xml that is unique to all documents? We have something as follows:
> >
> >
> >
> > Will this suffice?
>
> By default, schema.xml should also have
> <uniqueKey>id</uniqueKey>
> and with these, you should be all set as
> far as the configuration goes.
>
> At index time, you also have to provide
> this unique key to Solr, and for distributed
> search, ensure that it is unique across all
> shards, as the Wiki notes. How you form
> this unique key depends on your use case,
> but for example, you could use the system
> filepath, or an MD5 sum of the file contents.
>
> Regards,
> Gora
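
For what it's worth, here is a minimal Python sketch of the two points in this thread: forming a unique key from the file path or from an MD5 sum of the file contents, and always sending a given key to the same shard. None of this is Solr API. The function names (`key_from_path`, `key_from_contents`, `shard_for`) and the modulo-based shard choice are assumptions made up for illustration, and it presumes you route documents to shards yourself on the client side.

```python
import hashlib


def key_from_path(path: str) -> str:
    """Use the file path as the unique key (hypothetical helper).

    The key stays the same when the file contents change, so re-indexing
    an edited file replaces the old copy.
    """
    return path


def key_from_contents(data: bytes) -> str:
    """Use the MD5 hex digest of the file contents as the unique key.

    The key changes whenever the contents change, so an edited file is
    indexed as a new doc rather than replacing the old one.
    """
    return hashlib.md5(data).hexdigest()


def shard_for(unique_key: str, num_shards: int) -> int:
    """Pick a shard index deterministically from the unique key.

    The same key always maps to the same shard, provided num_shards does
    not change. This is an illustrative scheme, not Solr's own routing.
    """
    digest = hashlib.md5(unique_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    key = key_from_path("/data/docs/report.txt")
    # Indexing and later re-indexing the same key lands on the same shard.
    print(key, "-> shard", shard_for(key, 4))
```

If you do update docs, the path-based key is probably the safer of the two, since a contents-based key changes on every edit and the old copy would stay in the index. Also note that changing the number of shards changes where keys land, so a scheme like this only holds while the shard count is fixed.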