Subject: Re: Stand-alone Index updating using EmbeddedSolrServer
From: Yonik Seeley
To: dev@lucene.apache.org, kiko@alum.mit.edu
Date: Thu, 21 Apr 2011 19:47:51 -0400

On Thu, Apr 21, 2011 at 7:27 PM, Kiko Aumond wrote:
> Yes, I've seen that page, but I went a bit beyond the material there, as the
> code I wrote is able to set parameters such as separators, encapsulators and
> the index columns, whether to split parameters, auto-commit, as well as the
> ability to do incremental or full index reloads.

Is this a CSV loader? If so, did you know that the CSV loader (and the other
data loaders) can also bypass HTTP and stream directly from a local file (or
another URL)?
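To illustrate the local-file option: a minimal sketch, assuming Solr's CSV handler is mapped at `/update/csv` (as in the stock example configuration) and that remote streaming is enabled with `enableRemoteStreaming="true"` on the `requestParsers` element in solrconfig.xml. The `stream.file` parameter names a file on the Solr server's own filesystem, so the client sends only a short request and never uploads the file body. The class `CsvStreamUrl`, the host/port, the file path, and the `id,title,tags` columns are hypothetical; the loader options (`separator`, `encapsulator`, `fieldnames`, `f.<field>.split`, `commit`) are passed as ordinary request parameters.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a GET URL for Solr's CSV handler that tells the
// server to read the file itself (stream.file) instead of receiving it over HTTP.
// Assumes the handler is at /update/csv and enableRemoteStreaming="true".
final class CsvStreamUrl {
    static String build(String updateCsvUrl, String serverLocalPath, Map<String, String> csvParams) {
        StringBuilder sb = new StringBuilder(updateCsvUrl);
        // The path is resolved on the Solr host, not on the client.
        sb.append("?stream.file=").append(enc(serverLocalPath));
        sb.append("&stream.contentType=").append(enc("text/plain;charset=utf-8"));
        // Loader options (separator, encapsulator, fieldnames, f.<field>.split, commit, ...)
        for (Map.Entry<String, String> e : csvParams.entrySet()) {
            sb.append('&').append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8); // Java 10+ overload
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("separator", "\t");
        params.put("encapsulator", "\"");
        params.put("fieldnames", "id,title,tags"); // hypothetical columns
        params.put("f.tags.split", "true");        // split the hypothetical "tags" field
        params.put("commit", "true");
        System.out.println(build("http://localhost:8983/solr/update/csv", "/data/books.csv", params));
    }
}
```

The resulting URL can be issued with any HTTP client (or curl); the request itself is tiny, and the file is read where Solr runs.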
> Also, from what I've seen in DirectSolrConnection (version 1.4.1), you have
> to supply the document body as a String. We want to avoid having to load
> the entire document into memory, which is why we load the files into
> ContentStream objects and pass them to the embedded Solr server (I am
> assuming ContentStream actually streams the file as its name suggests
> instead of trying to load it into memory). The utility I wrote gets a path,
> a regex for all the files to be loaded, and the parameters mentioned
> above, and it does either a full or an incremental upload of multiple
> files with a single command.
>
> We run a very high load application with SOLR in the back end that requires
> that we use the embedded Solr server to eliminate the network round-trip.
> Even a small incremental gain in performance is important for us.

Eliminating the network round-trip is certainly important for good bulk
indexing performance. Luckily, you don't have to embed Solr to do that.
You can use multiple threads (say, 16 for a 4-core server), which
essentially hides any round-trip latency (use persistent connections,
though, or use SolrJ, which does so by default). Alternatively, you can
use StreamingUpdateSolrServer, which eliminates round-trip network delays
by streaming documents over multiple already-open connections.

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
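The multithreaded approach can be sketched without SolrJ: a bounded queue feeds a fixed pool of worker threads, and each worker owns one connection that it reuses for every document, so while one thread waits on a round trip the others keep sending. This only mirrors the idea behind StreamingUpdateSolrServer; it is not that class's implementation. `DocSender` is a hypothetical stand-in for one persistent (keep-alive) HTTP connection, and `ParallelIndexer`, the queue capacity of 1000, and the end-of-stream sentinel are illustrative choices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

final class ParallelIndexer {
    // Unique sentinel object, compared by reference, so a real document with
    // the same text can never be mistaken for it.
    private static final String END = new String("<end>");

    /** Streams docs to `threads` workers; returns the number of docs sent. */
    static int index(Iterable<String> docs, int threads, Supplier<DocSender> connect) throws Exception {
        // Open every connection up front so a connect failure surfaces before any work starts.
        List<DocSender> conns = new ArrayList<>();
        for (int i = 0; i < threads; i++) conns.add(connect.get());

        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1000); // bounds memory use
        AtomicInteger sent = new AtomicInteger();
        AtomicReference<Exception> firstError = new AtomicReference<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> workers = new ArrayList<>();
            for (DocSender conn : conns) {
                workers.add(pool.submit(() -> {
                    boolean failed = false;
                    // Each worker reuses its single connection for every doc it takes.
                    for (String doc = queue.take(); doc != END; doc = queue.take()) {
                        if (failed) continue; // keep draining so the producer never blocks
                        try {
                            conn.send(doc);
                            sent.incrementAndGet();
                        } catch (Exception e) {
                            firstError.compareAndSet(null, e);
                            failed = true;
                        }
                    }
                    return null;
                }));
            }
            for (String doc : docs) queue.put(doc);            // blocks while the queue is full
            for (int i = 0; i < threads; i++) queue.put(END);  // one stop marker per worker
            for (Future<?> w : workers) w.get();
        } finally {
            pool.shutdown();
            for (DocSender c : conns) c.close();
        }
        Exception e = firstError.get();
        if (e != null) throw e; // report the first send failure, if any
        return sent.get();
    }

    public static void main(String[] args) throws Exception {
        // Demo with a fake sender that only simulates latency; no network involved.
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 100; i++) docs.add("doc" + i);
        int n = index(docs, 16, () -> new DocSender() {
            public void send(String doc) throws Exception { Thread.sleep(5); }
            public void close() { }
        });
        System.out.println(n + " docs sent");
    }
}

// Hypothetical abstraction over ONE persistent (keep-alive) connection to Solr.
// A real implementation might wrap SolrJ or HttpURLConnection; that part is assumed.
interface DocSender extends AutoCloseable {
    void send(String doc) throws Exception;

    @Override
    void close(); // narrowed: closing never throws a checked exception
}
```

The bounded queue keeps memory flat: if Solr is slower than the reader, the producer simply waits instead of buffering the whole input.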