Subject: Re: SOLR Index Speed
From: Lord Khan Han
To: solr-user@lucene.apache.org
Date: Fri, 30 Sep 2011 15:40:20 +0300

Any idea?

On Thu, Sep 29, 2011 at 1:53 PM, Lord Khan Han wrote:

> Hi,
>
> The no-op run completed in 20 minutes. The only line we commented out was
> "solr.addBean(doc)". We tried SUSS as a drop-in replacement for
> CommonsHttpSolrServer, but its behavior was weird: we saw update times in
> the tens of thousands of seconds, and the updates kept running for a very
> long time after we had finished sending everything to Solr. We suspected
> this was because we are indexing POJOs as documents. BTW, SOLR-1565 and
> SOLR-2755 say that SUSS does not support a binary payload.
>
>     CommonsHttpSolrServer solr = new CommonsHttpSolrServer(url);
>     solr.setRequestWriter(new BinaryRequestWriter());
>
>     ...
>
>     // doc is a solrj-annotated POJO
>     solr.addBean(doc);
>
> Any thoughts on what may be taking so long? Before MapReduce we were
> indexing to localhost in 2-3 hours using the same code base.
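>
> For reference, a SUSS-based drop-in of the kind described above might look
> roughly like the sketch below. The URL, queue size, thread count and the
> stand-in WebDoc POJO are illustrative only, not the actual code:
>
>     import java.util.List;
>
>     import org.apache.solr.client.solrj.beans.Field;
>     import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
>
>     public class SussIndexer {
>
>       // Stand-in for the real solrj-annotated POJO (field names made up).
>       public static class WebDoc {
>         @Field public String id;
>         @Field public String content;
>       }
>
>       // Queue size and thread count here are guesses, not measured values.
>       public static void index(List<WebDoc> docs) throws Exception {
>         StreamingUpdateSolrServer suss =
>             new StreamingUpdateSolrServer("http://solrhost:8983/solr", 1000, 4);
>         // Per SOLR-1565 / SOLR-2755 (mentioned above), SUSS sends updates
>         // as plain XML; a BinaryRequestWriter does not give it javabin.
>         for (WebDoc doc : docs) {
>           suss.addBean(doc);        // queued and streamed by background threads
>         }
>         suss.blockUntilFinished();  // wait for the internal queue to drain
>         suss.commit();              // single commit at the end of the run
>       }
>     }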
>
> On Tue, Sep 27, 2011 at 8:55 PM, Otis Gospodnetic
> <otis_gospodnetic@yahoo.com> wrote:
>
>> Hello,
>>
>> By the way, should you need help with Hadoop+Solr, please feel free to
>> get in touch with us at Sematext (see below) - we happen to work with
>> Hadoop and Solr on a daily basis and have successfully implemented
>> parallel indexing into Solr with/from Hadoop.
>>
>> Otis
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
>> Lucene ecosystem search :: http://search-lucene.com/
>>
>> ------------------------------
>> From: Otis Gospodnetic
>> To: solr-user@lucene.apache.org
>> Sent: Tuesday, September 27, 2011 1:37 PM
>> Subject: Re: SOLR Index Speed
>>
>> Hi,
>>
>> No need to use reply-all and CC me directly, I'm on the list :)
>>
>> It sounds like Solr is not the problem, but the Hadoop side. For example,
>> what if you change your reducer to not call Solr and do a no-op instead?
>> Does it go beyond 500-700 docs/minute?
>>
>> Otis
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
>> Lucene ecosystem search :: http://search-lucene.com/
>>
>>> ________________________________
>>> From: Lord Khan Han
>>> To: solr-user@lucene.apache.org; Otis Gospodnetic
>>>     <otis_gospodnetic@yahoo.com>
>>> Sent: Tuesday, September 27, 2011 4:42 AM
>>> Subject: Re: SOLR Index Speed
>>>
>>> Our producer (the Hadoop mapper) prepares the documents and the reducer
>>> submits them directly to Solr via a SolrJ HTTP request. We now run 32
>>> reducers, but indexing speed is still 500-700 docs per minute. The
>>> submissions come from a Hadoop cluster, so submission speed is not the
>>> problem; I just can't get Solr to use the full resources of the index
>>> machine.
>>>
>>> I gave Solr a 12 GB heap and the machine is not swapping.
>>>
>>> I can't figure out where the problem is, if there is one.
>>>
>>> PS: We are committing once, at the end of the submission.
>>>
>>> On Tue, Sep 27, 2011 at 11:37 AM, Lord Khan Han wrote:
>>>
>>>> Sorry :) it is not 500 docs per sec (that is what I wish for, I think).
>>>> It is 500 docs per MINUTE..
>>>>
>>>> On Tue, Sep 27, 2011 at 7:14 AM, Otis Gospodnetic
>>>> <otis_gospodnetic@yahoo.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> > PS: solr streamindex is not an option because we need to submit
>>>>> > javabin...
>>>>>
>>>>> If you are referring to StreamingUpdateSolrServer, then the above
>>>>> statement makes no sense and you should give SUSS a try.
>>>>>
>>>>> Are you sure your 16 reducers produce more than 500 docs/second?
>>>>> I think somebody already suggested increasing the number of reducers
>>>>> to ~32. What happens to your CPU load and indexing speed then?
>>>>>
>>>>> Otis
>>>>> ----
>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>>>> Lucene ecosystem search :: http://search-lucene.com/
>>>>>
>>>>>> ________________________________
>>>>>> From: Lord Khan Han
>>>>>> To: solr-user@lucene.apache.org
>>>>>> Sent: Monday, September 26, 2011 7:09 AM
>>>>>> Subject: SOLR Index Speed
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We have 500K web documents and use Solr (trunk) to index them. We
>>>>>> have a custom analyzer that is a little heavy on CPU.
>>>>>>
>>>>>> Our machine config:
>>>>>>
>>>>>> 32 x CPU
>>>>>> 32 GB RAM
>>>>>> SAS HD
>>>>>>
>>>>>> We are sending documents from 16 reducer clients (in Hadoop) to the
>>>>>> standalone Solr server; the rough shape of the reducer is sketched
>>>>>> below. The problem is that we can't get faster than 500 docs per sec.
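>>>>>>
>>>>>> For reference, a reducer of the kind described above might look
>>>>>> roughly like this sketch. The key/value types, field names and Solr
>>>>>> URL are placeholders, and the real job adds solrj-annotated POJOs via
>>>>>> addBean() rather than building a SolrInputDocument by hand:
>>>>>>
>>>>>>     import java.io.IOException;
>>>>>>
>>>>>>     import org.apache.hadoop.io.NullWritable;
>>>>>>     import org.apache.hadoop.io.Text;
>>>>>>     import org.apache.hadoop.mapreduce.Reducer;
>>>>>>     import org.apache.solr.client.solrj.SolrServerException;
>>>>>>     import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
>>>>>>     import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
>>>>>>     import org.apache.solr.common.SolrInputDocument;
>>>>>>
>>>>>>     public class SolrIndexReducer
>>>>>>         extends Reducer<Text, Text, NullWritable, NullWritable> {
>>>>>>
>>>>>>       private CommonsHttpSolrServer solr;
>>>>>>
>>>>>>       @Override
>>>>>>       protected void setup(Context context) throws IOException {
>>>>>>         // Each reducer opens its own HTTP client (URL is a placeholder).
>>>>>>         solr = new CommonsHttpSolrServer("http://solrhost:8983/solr");
>>>>>>         solr.setRequestWriter(new BinaryRequestWriter()); // javabin updates
>>>>>>       }
>>>>>>
>>>>>>       @Override
>>>>>>       protected void reduce(Text key, Iterable<Text> values, Context context)
>>>>>>           throws IOException, InterruptedException {
>>>>>>         for (Text value : values) {
>>>>>>           SolrInputDocument doc = new SolrInputDocument();
>>>>>>           doc.addField("id", key.toString());
>>>>>>           doc.addField("content", value.toString());
>>>>>>           try {
>>>>>>             solr.add(doc);   // the real job calls addBean(pojo) here
>>>>>>           } catch (SolrServerException e) {
>>>>>>             throw new IOException(e);
>>>>>>           }
>>>>>>         }
>>>>>>       }
>>>>>>
>>>>>>       // A single commit is issued once the whole job has finished,
>>>>>>       // not from the individual reducers.
>>>>>>     }
>>>>>>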
>>>>>> 500K documents took 7-8 hours to index :(
>>>>>>
>>>>>> While indexing, the Solr server's CPU load is around 5-6 (out of a
>>>>>> maximum of 32), which means about 20% of the total CPU power. We have
>>>>>> plenty of RAM...
>>>>>>
>>>>>> I turned off autocommit and set the RAM buffer to 8198 (the relevant
>>>>>> solrconfig.xml settings are sketched at the end of this mail); there
>>>>>> is no I/O wait.
>>>>>>
>>>>>> How can I make it faster?
>>>>>>
>>>>>> PS: solr streamindex is not an option because we need to submit
>>>>>> javabin...
>>>>>>
>>>>>> thanks..
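>>>>>>
>>>>>> For reference, the settings referred to above live in solrconfig.xml
>>>>>> and look roughly like the snippet below (exact placement varies by
>>>>>> Solr version, and the values shown are only illustrative):
>>>>>>
>>>>>>     <!-- in the <indexDefaults>/<mainIndex> (or <indexConfig>) section -->
>>>>>>     <ramBufferSizeMB>8192</ramBufferSizeMB>
>>>>>>
>>>>>>     <updateHandler class="solr.DirectUpdateHandler2">
>>>>>>       <!-- no <autoCommit> block here, so nothing is committed until
>>>>>>            the client issues an explicit commit at the end of the run -->
>>>>>>     </updateHandler>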