From: Pranav Prakash
Date: Tue, 4 Oct 2011 18:47:34 +0530
Subject: How to achieve Indexing @ 270GiB/hr
To: solr-user@lucene.apache.org

Greetings,

While going through the article "265% indexing speedup with Lucene's concurrent flushing", I was stunned by the many ways in which indexing speed could be increased. I'd like to hear from everyone here how to achieve this kind of speed.

As far as I understand, there are two broad ways of feeding data to Solr:

1. Using DataImportHandler
2. Using HTTP to POST docs to Solr

The indexing speed described in the article seems like too much to expect with the second approach. Or is it possible if multiple instances feed docs to Solr in parallel?

My current setup does the following:

1. Execute SQL queries to create a database of the documents that need to be fed.
2. Go through the columns one by one, create XML for them, and send it to Solr in batches of at most 500 docs.

Even when using DataImportHandler, what are the ways this could be optimized? If I can solve the indexing problem in our current setup, my life will become a lot easier.

*Pranav Prakash*
"temet nosce"
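For reference, here is a minimal sketch of the second approach (HTTP POST) with the batching described above, plus several threads feeding Solr at once, which is one reading of the "multiple instances" idea. It builds Solr XML <add> update messages of at most 500 docs each and sends a single <commit/> at the end. Assumptions not taken from the post: the update URL (a default single-core Solr 3.x install on localhost:8983), four worker threads, and rows arriving as dicts of field name to value. It makes no claim about reaching the article's throughput.

```python
import urllib.request
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Assumption: default single-core Solr 3.x update handler; adjust for your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"
BATCH_SIZE = 500  # matches the "at most 500 docs" batches in the post


def batches(rows, size=BATCH_SIZE):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def to_add_xml(docs):
    """Build one Solr XML <add> message containing every doc in `docs`.

    Each doc is a dict of field name -> value (an assumed input shape);
    values are converted with str() and escaped by ElementTree.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="utf-8")


def post_xml(body, url=SOLR_UPDATE_URL):
    """POST one update message and return the HTTP status code.

    urlopen raises HTTPError for error statuses, so a return means success.
    """
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "text/xml; charset=utf-8"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def feed(rows, workers=4):
    """Post batches from several threads, then commit once at the end.

    Note: Executor.map submits every batch up front, so all XML bodies are
    built in memory; for very large feeds, bound the number of in-flight batches.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = list(pool.map(post_xml, (to_add_xml(b) for b in batches(rows))))
    post_xml(b"<commit/>")
    return statuses
```

Usage would be along the lines of feed(rows), where rows is produced from the SQL query. Committing once at the end rather than after every batch avoids paying Solr's commit cost 500 docs at a time.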