From: erolagnab
To: solr-user@lucene.apache.org
Subject: Indexing Speed
Date: Sun, 16 Sep 2007 20:01:32 -0700 (PDT)

Hi,

Just an FYI. I've seen some posts mentioning that Solr can index 100-150 docs/s, and comparing embedded Solr with HTTP.
I tried indexing 1.7+ million docs. Each doc has 30 fields, of which 10 are indexed and stored and the rest are stored only. The result was pretty impressive: it took approximately 1.4 hours to finish. Note that the docs were sent synchronously, one after the other. The Solr server and client were both running on a Pentium Dual Core 3.2, 2 GB RAM, Ubuntu Feisty.

The only issue I noticed is that Solr uses a fair amount of memory. On the first run, after indexing around 500 thousand docs, it threw an OutOfMemory exception. On the second run, I set -Xms and -Xmx so that the JVM ran with 1 GB of memory, and Solr ran to completion.

Some questions:

1) Is it good practice to let Solr index docs in real time (millions of docs per day)? What I'm worried about is that Solr may eat up memory as it goes.
2) If docs are sent asynchronously, how well can Solr index?

Any comments are highly appreciated.

Trung
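For comparison with the 100-150 docs/s figure, the run above can be turned into an average rate with a quick calculation. This is only a rough sketch: it assumes exactly 1.7 million docs and exactly 1.4 hours, while both numbers in the message are approximate, and the function name is made up for illustration.

```python
def docs_per_second(total_docs: int, hours: float) -> float:
    """Average indexing throughput in documents per second."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return total_docs / (hours * 3600)


# Assumed round figures taken from the message: 1.7 million docs, 1.4 hours.
rate = docs_per_second(1_700_000, 1.4)
print(f"{rate:.0f} docs/s")
```

With these assumed inputs the average works out to roughly 337 docs/s for synchronous, one-at-a-time sending.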