From: Jason Rutherglen
To: solr-user@lucene.apache.org
Date: Fri, 4 Nov 2011 13:32:04 -0400
Subject: Re: overwrite=false support with SolrJ client
In-Reply-To: <8C865897-9AA9-4945-ABEB-65F0F907326E@transpac.com>

It should be supported in SolrJ, I'm surprised it's been lopped out. Bulk indexing is extremely common.

On Fri, Nov 4, 2011 at 1:16 PM, Ken Krugler wrote:
> Hi list,
>
> I'm working on improving the performance of the Solr scheme for Cascading.
>
> This supports generating a Solr index as the output of a Hadoop job. We use SolrJ to write the index locally (via EmbeddedSolrServer).
>
> There are mentions of using overwrite=false with the CSV request handler, as a way of improving performance.
>
> I see that https://issues.apache.org/jira/browse/SOLR-653 removed this support from SolrJ, because it was deemed too dangerous for mere mortals.
>
> My question is whether anyone knows just how much performance boost this really provides.
>
> For Hadoop-based workflows, it's straightforward to ensure that the unique key field is really unique, thus if the performance gain is significant, I might look into figuring out some way (with a trigger lock) of re-enabling this support in SolrJ.
>
> Thanks,
>
> -- Ken
>
> --------------------------
> Ken Krugler
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Mahout & Solr