From: Alan Malloy
Date: Fri, 21 Jan 2011 15:43:09 -0800
To: common-user@hadoop.apache.org
Subject: Re: Losing Records with Block Compressed Sequence File

Are you making sure to close the output writer? I had similar problems in a different scenario, and it turned out I was neglecting to close/flush my output.

On 01/21/2011 01:04 PM, David Sinclair wrote:
> Hi, I am seeing an odd problem when writing block-compressed sequence files.
> If I write 400,000 records into a sequence file without compression, all 400K
> end up in the file. If I write with block compression, regardless of whether
> the codec is bz2 or deflate, I start losing records. Not a ton, but a couple
> hundred.
>
> Here are the exact numbers (records that ended up in the file):
>
> bz2       399,734
> deflate   399,770
> none      400,000
>
> Conf settings:
> io.file.buffer.size = 4K
> io.seqfile.compress.blocksize = 1MB
>
> Has anyone ever seen this behavior?
>
> Thanks,
>
> Dave
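Why closing the writer matters more with block compression: as I understand it, a block-compressed sequence file writer collects records in an in-memory buffer and only compresses and writes them out once that buffer reaches roughly io.seqfile.compress.blocksize; the final, partially filled block is written when the writer is closed. If close() never runs, that tail is silently dropped, which would plausibly account for losing a few hundred records out of 400K with bz2 and deflate but none without compression. The sketch below is a toy model of that buffering, not Hadoop's API: ToyBlockWriter, countWithoutClose, countWithClose, and the 1024-character block size (scaled down from 1MB) are all hypothetical, and string length stands in for bytes.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, NOT Hadoop's SequenceFile API: like a block-compressed
// writer, it buffers records in memory and only writes them out as a whole
// "block" once the buffer reaches blockSize, or when close() is called.
class ToyBlockWriter implements Closeable {
    private final List<String> sink;              // stands in for the file on disk
    private final int blockSize;                  // loosely analogous to io.seqfile.compress.blocksize
    private final List<String> pending = new ArrayList<>();
    private int pendingSize = 0;                  // buffered characters (stand-in for bytes)

    ToyBlockWriter(List<String> sink, int blockSize) {
        this.sink = sink;
        this.blockSize = blockSize;
    }

    void append(String record) {
        pending.add(record);
        pendingSize += record.length();
        if (pendingSize >= blockSize) {
            writeBlock();                          // buffer is full: emit one block
        }
    }

    private void writeBlock() {
        sink.addAll(pending);
        pending.clear();
        pendingSize = 0;
    }

    @Override
    public void close() {
        writeBlock();                              // the last, partial block is only written here
    }
}

class Main {
    // Writes n fixed-width records but "forgets" to close the writer.
    static int countWithoutClose(int n) {
        List<String> file = new ArrayList<>();
        ToyBlockWriter writer = new ToyBlockWriter(file, 1024);
        for (int i = 0; i < n; i++) {
            writer.append(String.format("record-%06d", i)); // 13 characters each
        }
        return file.size();                        // the buffered tail never reaches the sink
    }

    // Same writes, but try-with-resources guarantees close() runs.
    static int countWithClose(int n) {
        List<String> file = new ArrayList<>();
        try (ToyBlockWriter writer = new ToyBlockWriter(file, 1024)) {
            for (int i = 0; i < n; i++) {
                writer.append(String.format("record-%06d", i));
            }
        }
        return file.size();
    }

    public static void main(String[] args) {
        System.out.println("without close: " + countWithoutClose(400_000)); // 399977
        System.out.println("with close:    " + countWithClose(400_000));    // 400000
    }
}
```

Here each block holds 79 records (79 x 13 >= 1024), so without close() the last 23 of 400,000 records stay in the buffer. In real Hadoop code the corresponding fix is simply to call close() on the SequenceFile.Writer, for example in a finally block, so the final block gets compressed and written.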