From: "Devaraj Das"
To: "'Venkat Seeth'"
Subject: RE: Strange behavior - One reduce out of N reduces always fail.
Date: Tue, 20 Feb 2007 22:20:01 +0530

Hi Venkat,

You forgot to paste the log output in your reply. The patch that I sent
will log the key/value sizes in the Reducers as well. See if you get
helpful hints with that.

Thanks,
Devaraj.
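(The patch itself is not preserved in this archive. Below is a minimal
sketch of the same idea -- logging key and value sizes from inside a
reducer -- written against the old org.apache.hadoop.mapred API of that
era. The class name, the assumption that keys and values are Text, and
the status message are illustrative, not taken from the patch:)

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    // Hypothetical reducer that logs how large each key and its group
    // of values are, so an oversized record stands out in the userlogs.
    public class SizeLoggingReducer extends MapReduceBase
        implements Reducer {

      private static final Log LOG =
          LogFactory.getLog(SizeLoggingReducer.class);

      public void reduce(WritableComparable key, Iterator values,
                         OutputCollector output, Reporter reporter)
          throws IOException {
        Text keyText = (Text) key;            // assumes Text keys
        int valueCount = 0;
        long valueBytes = 0;
        while (values.hasNext()) {
          Text value = (Text) values.next();  // assumes Text values
          valueCount++;
          valueBytes += value.getLength();
          // ... hand the field/value pair to the indexer here ...

          // Report status so a long-running reduce is not killed for
          // failing to report for 600+ seconds.
          reporter.setStatus("reducing " + keyText);
        }
        LOG.info("key=" + keyText
            + " keyBytes=" + keyText.getLength()
            + " values=" + valueCount
            + " valueBytes=" + valueBytes);
      }
    }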
> -----Original Message-----
> From: Venkat Seeth [mailto:svejb@yahoo.com]
> Sent: Tuesday, February 20, 2007 9:55 PM
> To: hadoop-user@lucene.apache.org; Devaraj Das
> Subject: RE: Strange behavior - One reduce out of N reduces always fail.
>
> Hi Devaraj,
>
> Thanks for your response.
>
> > Do you have an estimate of the sizes?
> # of entries: 1080746
> [# of field-value pairs]
> min count: 20
> max count: 3116
> avg count: 66
>
> These are small documents, and yes, the full-text content for each
> document can be big. I've also set MaxFieldLength to 10000 so that I
> don't index very large values, as suggested for Lucene.
>
> The reduce always fails while merging segments. I do see a large line
> in the Log4J output which consists of
>
> Typically, the job that fails is ALWAYS VERY SLOW compared to the
> other N - 1 jobs.
>
> Can I log the key/value pair sizes in the reduce part of the indexer?
>
> Again,
>
> Thanks,
> Venkat
>
> --- Devaraj Das wrote:
>
> > While this could be a JVM/GC issue as Andrez pointed out, it could
> > also be due to a very large key/value being read from the map
> > output. Do you have an estimate of the sizes? Attached is a
> > quick-hack patch to log the sizes of the key/values read from the
> > sequence files. Please apply this patch on hadoop-0.11.2 and check
> > in the userlogs what key/value it is failing for (if at all).
> > Thanks,
> > Devaraj.
> >
> > > -----Original Message-----
> > > From: Venkat Seeth [mailto:svejb@yahoo.com]
> > > Sent: Tuesday, February 20, 2007 11:32 AM
> > > To: hadoop-user@lucene.apache.org
> > > Subject: Strange behavior - One reduce out of N reduces always fail.
> > >
> > > Hi there,
> > >
> > > Howdy. I've been using Hadoop to parse and index XML documents.
> > > It's a two-step process, similar to Nutch: I parse the XML and
> > > create field-value tuples written to a file.
> > >
> > > I read this file and index the field-value pairs in the next step.
> > >
> > > Everything works fine, but one reduce out of N always fails in the
> > > last step when merging segments. It fails with one or more of the
> > > following:
> > > - Task failed to report status for 608 seconds. Killing.
> > > - java.lang.OutOfMemoryError: GC overhead limit exceeded
> > >
> > > I've tried various configuration combinations, and it always fails
> > > at the 4th reduce in an 8-reduce configuration and at the first in
> > > a 4-reduce config.
> > >
> > > Environment:
> > > SUSE Linux, 64-bit
> > > Java 6 (Java 5 also fails)
> > > Hadoop-0.11-2
> > > Lucene-2.1 (Lucene 2.0 also fails)
> > >
> > > Configuration:
> > > I have about 128 maps and 8 reduces, so I get to create 8
> > > partitions of my index. It runs on a 4-node cluster of dual-proc,
> > > 64GB machines.
> > >
> > > Number of documents: 1.65 million, each about 10K in size.
> > >
> > > I ran with 4 or 8 task trackers per node, with 4 GB heap for the
> > > job tracker, task trackers, and the child JVMs.
> > >
> > > mergeFactor is set to 50 and maxBufferedDocs to 1000.
> > >
> > > I fail to understand what's going on. When I run the job
> > > individually, it works with the same settings.
> > >
> > > Why would all the jobs work while only this one fails?
> > >
> > > I'd appreciate it if anyone could share their experience.
> > >
> > > Thanks,
> > > Ven
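(For context, the knobs discussed in this thread live in the Hadoop job
configuration and on Lucene's IndexWriter. A rough sketch of where each
one is set, with the driver class name and index path invented for
illustration; the property names are as of the Hadoop 0.11 / Lucene 2.1
era and worth double-checking against hadoop-default.xml:)

    import java.io.IOException;

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class IndexJobSettings {  // hypothetical driver class
      public static void main(String[] args) throws IOException {
        // Hadoop side: raise the child JVM heap (the OutOfMemoryError)
        // and the status-report timeout behind "failed to report
        // status for 608 seconds. Killing."
        JobConf conf = new JobConf();
        conf.set("mapred.child.java.opts", "-Xmx4096m");
        conf.set("mapred.task.timeout", "1800000");  // milliseconds

        // Lucene 2.x side: the settings mentioned in the thread. A
        // higher mergeFactor means fewer but larger segment merges in
        // the reduce; maxFieldLength truncates very large field values.
        IndexWriter writer = new IndexWriter("/tmp/index-part",
            new StandardAnalyzer(), true);
        writer.setMergeFactor(50);
        writer.setMaxBufferedDocs(1000);
        writer.setMaxFieldLength(10000);
        writer.close();
      }
    }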