From: Steve Loughran
Date: Wed, 27 Apr 2011 12:09:35 +0100
To: common-user@hadoop.apache.org
Subject: Re: Unsplittable files on HDFS

On 27/04/11 10:48, Niels Basjes wrote:
> Hi,
>
> I did the following with a 1.6GB file
> hadoop fs -Ddfs.block.size=2147483648 -put /home/nbasjes/access-2010-11-29.log.gz /user/nbasjes
> and I got
>
> Total number of blocks: 1
> 4189183682512190568: 10.10.138.61:50010 10.10.138.62:50010
>
> Yes, that does the trick. Thank you.
>
> Niels
>
> 2011/4/27 Harsh J:
>> Hey Niels,
>>
>> The block size is a per-file property. Would putting/creating these
>> gzip files on the DFS with a very high block size (such that it
>> doesn't split across for such files) be a valid solution to your
>> problem here?
>>

Don't set a block size >2GB; not all the bits of the code that use signed 32-bit integers have been eliminated yet.
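To see why 32-bit integers matter here: the value passed above, 2147483648 bytes, is exactly 2^31, one more than the largest signed 32-bit integer (Integer.MAX_VALUE = 2147483647). The sketch below assumes some code path narrows the long block size to an int; it is not Hadoop code, and the class and helper names (BlockSizeCheck, fitsInSignedInt, singleBlock) are hypothetical. It shows the value wrapping negative and checks whether a block size both holds the whole file and stays within the signed int range.

```java
public class BlockSizeCheck {
    // Hypothetical helper: true if the block size can be narrowed to a
    // signed 32-bit int without wrapping around.
    static boolean fitsInSignedInt(long blockSize) {
        return blockSize > 0 && blockSize <= Integer.MAX_VALUE;
    }

    // Hypothetical helper: true if one block of this size can hold the
    // whole file, so the file is stored as a single block.
    static boolean singleBlock(long fileSize, long blockSize) {
        return fileSize <= blockSize;
    }

    public static void main(String[] args) {
        long requested = 2147483648L;        // the value passed as -Ddfs.block.size
        long fileSize = 1600L * 1000 * 1000; // roughly 1.6GB, illustrative only

        // A narrowing cast keeps only the low 32 bits, so 2^31 becomes negative.
        int narrowed = (int) requested;

        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
        System.out.println("(int) " + requested + " = " + narrowed);
        System.out.println("fits in signed int: " + fitsInSignedInt(requested));
        System.out.println("single block: " + singleBlock(fileSize, requested));
    }
}
```

Running it prints `(int) 2147483648 = -2147483648` and `fits in signed int: false`, while `single block: true` confirms a smaller block size would still hold a 1.6GB file in one block.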