Date: Sun, 15 Dec 2013 18:28:28 -0600
Subject: Re: How does mapreduce job determine the compress codec
From: Jiayu Ji
To: user@hadoop.apache.org

Thanks Tao. I know I can tell that it is an LZO file from its magic number. What I am curious about is which class in Hadoop the MapReduce job uses to determine the file's compression algorithm. Ultimately, I am trying to figure out whether all the inputs of a MapReduce job have to be compressed with the same algorithm.

On Fri, Dec 13, 2013 at 11:16 PM, Tao Xiao wrote:

> I suggest you download the LZO-compressed file, whether or not its file
> name has an .lzo extension, open it as hex bytes with a tool like
> UltraEdit, and look at its header contents.
>
> 2013/12/14 Jiayu Ji
>
>> Hi,
>>
>> I have a question about how a MapReduce job determines the compression
>> codec for a file on HDFS. According to the Definitive Guide (page 86),
>> "the CompressionCodecFactory provides a way of mapping a filename
>> extension to a CompressionCodec using its getCodec() method". I ran a
>> test with an LZO-compressed file that did not have an .lzo extension.
>> However, the MapReduce job was still able to get the right codec. Does
>> anyone know why? Thanks in advance.
>>
>> Jiayu

--
Jiayu (James) Ji
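The manual check Tao describes (looking at the leading bytes of the file) can be sketched in a few lines of Python. This only illustrates recognizing a format from its magic number; it is not a description of how Hadoop or the MapReduce job selects a codec, which is the open question in this thread. The names `MAGIC_NUMBERS`, `sniff_compression`, and `sniff_file` are hypothetical, chosen for this sketch; the byte signatures are the usual leading bytes of lzop, gzip, and bzip2 files.

```python
from typing import Optional

# Leading bytes ("magic numbers") of a few common compressed formats.
# Hypothetical lookup table for this sketch; not part of Hadoop.
MAGIC_NUMBERS = {
    "lzop": b"\x89LZO\x00\r\n\x1a\n",
    "gzip": b"\x1f\x8b",
    "bzip2": b"BZh",
}


def sniff_compression(header: bytes) -> Optional[str]:
    """Return the format whose magic number prefixes `header`, else None."""
    for name, magic in MAGIC_NUMBERS.items():
        if header.startswith(magic):
            return name
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read the first bytes of a local file and guess its compression format."""
    with open(path, "rb") as f:
        # 16 bytes is more than enough to cover the longest signature above.
        return sniff_compression(f.read(16))
```

Calling `sniff_file` on a local copy of the input (fetched from HDFS first) ignores the file name entirely, so a missing .lzo extension does not affect the result.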