From: Doug Cutting
Date: Wed, 12 Dec 2007 13:57:29 -0800
To: hadoop-user@lucene.apache.org
Subject: Re: Question on Critical Region size for SequenceFile next/write - 0.15.1
Message-ID: <47605949.8040402@apache.org>
References: <47605060.2070709@attributor.com>
In-Reply-To: <47605060.2070709@attributor.com>

Jason Venner wrote:
> On investigating, we discovered that the entirety of the next(key,value)
> and the entirety of the write( key, value) are synchronized on the file
> object.
>
> This causes all threads to back up on the serialization/deserialization.

I'm not sure what you want to happen here. If you've got a bunch of
threads writing to a single file, and that's your performance
bottleneck, I don't see how to improve the situation except to write to
multiple files on different drives, or to spread your load across a
larger cluster (another way to get more drives).

Doug
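
For illustration, the "write to multiple files" suggestion might look roughly like the sketch below. It does not use the Hadoop API: a plain DataOutputStream stands in for a SequenceFile writer, and the class name PerThreadWriters, the method writeInParallel, and the part-NNNNN file names are all hypothetical. The point is only that each thread owns its own output file, so no thread waits on another thread's lock while serializing.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class PerThreadWriters {

    /**
     * Starts one worker per output file. Each worker serializes its own
     * records into its own stream, so the writers share no lock.
     * (Hypothetical helper; DataOutputStream stands in for a real writer.)
     */
    static List<Path> writeInParallel(Path dir, int threads, int recordsPerThread)
            throws InterruptedException {
        List<Path> parts = new ArrayList<>();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            final Path part = dir.resolve(String.format("part-%05d", t));
            parts.add(part);
            Thread worker = new Thread(() -> {
                try (DataOutputStream out = new DataOutputStream(
                        new BufferedOutputStream(Files.newOutputStream(part)))) {
                    for (int i = 0; i < recordsPerThread; i++) {
                        out.writeInt(id); // stand-in "key": 4 bytes
                        out.writeInt(i);  // stand-in "value": 4 bytes
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            workers.add(worker);
            worker.start();
        }
        // Wait for every writer to finish and close its file.
        for (Thread worker : workers) {
            worker.join();
        }
        return parts;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("parts");
        for (Path p : writeInParallel(dir, 4, 1000)) {
            System.out.println(p + ": " + Files.size(p) + " bytes");
        }
    }
}
```

In this sketch all part files land in one temporary directory. To actually gain I/O bandwidth, as the reply suggests, the per-thread paths would need to sit on different drives, or the work would need to be spread across more nodes.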