From: Alex Loddengaard
To: mapreduce-user@hadoop.apache.org
Date: Thu, 8 Jul 2010 11:42:09 -0700
Subject: Re: SequenceFile as map input

Hi Alan,

SequenceFiles keep track of the key and value type, so you should be able to use the Writables in the signature. Though it looks like you're using the new API, and I admit that I'm not an expert with the new API. Have you tried using the Writables in the signature?
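
The key and value types are recorded in the SequenceFile header, which is why the Writable types can go straight into the mapper signature. The plain-Java sketch below reads them without any Hadoop dependency. It assumes a header of version 4 or later, laid out as the magic bytes `SEQ`, one version byte, then the key and value class names, each stored as a variable-length integer length followed by UTF-8 bytes. For brevity it only accepts lengths that fit in a one-byte vint (0 to 127). `SeqHeader` and `readClassNames` are hypothetical names; against a real file, `SequenceFile.Reader#getKeyClassName()` and `getValueClassName()` report the same information.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: reads the key/value class names from a SequenceFile header. */
final class SeqHeader {
    private SeqHeader() {}

    /** Returns {keyClassName, valueClassName}; assumes header version 4 or later. */
    static String[] readClassNames(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[4];
        in.readFully(magic);
        // The first three bytes are the ASCII magic "SEQ"; the fourth is the format version.
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            throw new IOException("not a SequenceFile");
        }
        if (magic[3] < 4) {
            // Older versions store the class names in a different encoding; not handled here.
            throw new IOException("unsupported header version " + magic[3]);
        }
        String keyClass = readShortString(in);
        String valueClass = readShortString(in);
        return new String[] { keyClass, valueClass };
    }

    /** Reads a one-byte vint length followed by that many UTF-8 bytes. */
    private static String readShortString(DataInputStream in) throws IOException {
        int len = in.readByte();
        // Simplification: lengths 0..127 fit in a single vint byte; anything else is rejected.
        if (len < 0) {
            throw new IOException("multi-byte vint lengths are not handled in this sketch");
        }
        byte[] utf8 = new byte[len];
        in.readFully(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }
}
```

For a file written with `Text` keys and `BytesWritable` values, this would return `org.apache.hadoop.io.Text` and `org.apache.hadoop.io.BytesWritable`.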

Alex

On Thu, Jul 8, 2010 at 6:44 AM, Some Body <somebody@squareplanet.de> wrote:
To get around the small-files problem (I have thousands of 2 MB log files), I wrote
a class to convert all my log files into a single SequenceFile in
(Text key, BytesWritable value) format. That works fine. I can run this:

    hadoop fs -text /my.seq |grep peemt114.log | head -1
    10/07/08 15:02:10 INFO util.NativeCodeLoader: Loaded the native-hadoop library
    10/07/08 15:02:10 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
    10/07/08 15:02:10 INFO compress.CodecPool: Got brand-new decompressor
    peemt114.log    70 65 65 6d 74 31 31 34 09 .........[snip].......

which shows my file-name key (peemt114.log)
and the file-contents value, which appears to have been converted to hex.
The hex values up to the first tab (09) translate to my hostname.
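
The hex can be checked by hand: 0x70 0x65 0x65 0x6d 0x74 0x31 0x31 0x34 is `peemt114` in ASCII and 0x09 is a tab, so the value holds the raw file bytes rather than a different encoding. A minimal plain-Java sketch of that check follows. `HexDump` is a hypothetical name; `format` mimics the space-separated, two-digit lowercase hex visible in the output above, and `decode` reverses it, assuming the bytes are UTF-8 (ASCII logs are a subset).

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for the space-separated hex format seen in the output above. */
final class HexDump {
    private HexDump() {}

    /** Formats the first {@code length} bytes as two-digit lowercase hex, separated by spaces. */
    static String format(byte[] bytes, int length) {
        StringBuilder sb = new StringBuilder(3 * length);
        for (int i = 0; i < length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes print as two hex digits, not sign-extended.
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.toString();
    }

    /** Parses space-separated hex pairs back into bytes and decodes them as UTF-8. */
    static String decode(String hex) {
        String[] parts = hex.trim().split("\\s+");
        byte[] bytes = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            bytes[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

`HexDump.decode("70 65 65 6d 74 31 31 34 09")` gives `"peemt114\t"`, consistent with the `peemt114.log` key.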

I'm trying to adapt my mapper to use the SequenceFile as input.

I changed the job's inputFormatClass to:
    MyJob.setInputFormatClass(SequenceFileInputFormat.class);
and modified my mapper signature to:
    public class MyMapper extends Mapper<Object, BytesWritable, Text, Text> {

But how do I convert the value back to Text? When I print out the keys and values using:
        System.out.printf("MAPPER INKEY: [%s]\n", key);
        System.out.printf("MAPPER INVAL: [%s]\n", value.toString());
I get:
    MAPPER INKEY: [peemt114.log]
    MAPPER INVAL: [70 65 65 6d 74 31 31 34 09 .....[snip]......]

Alan
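
The `MAPPER INVAL` line is what `BytesWritable.toString()` produces: it formats the bytes as hex, the same dump `hadoop fs -text` prints. One common way to get the text back (a suggestion, not something confirmed in this thread) is to decode only the valid bytes. `BytesWritable.getBytes()` returns the backing array, which can be longer than the data, so it should be paired with `getLength()`, for example `new String(value.getBytes(), 0, value.getLength(), "UTF-8")`, or `text.set(value.getBytes(), 0, value.getLength())` on a reusable `Text`. The plain-Java sketch below shows why the length matters, using a zero-padded array as a stand-in for a `BytesWritable` backing buffer. `ValueDecode` is a hypothetical name, and UTF-8 is an assumption about the log files' encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper showing how to turn a padded byte buffer back into text. */
final class ValueDecode {
    private ValueDecode() {}

    /**
     * Decodes only the first {@code length} bytes of {@code backing}, the way a
     * BytesWritable's getBytes()/getLength() pair should be used.
     */
    static String decode(byte[] backing, int length) {
        return new String(backing, 0, length, StandardCharsets.UTF_8);
    }

    /** Simulates a backing buffer with spare capacity beyond the valid data. */
    static byte[] padded(String text, int extraCapacity) {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        // Arrays.copyOf fills the extra slots with zero bytes.
        return Arrays.copyOf(data, data.length + extraCapacity);
    }
}
```

Decoding the whole padded array would append stray `\u0000` characters, while `decode(buf, length)` returns only the original text. In the mapper, the same idea applies to `value.getBytes()` and `value.getLength()`; declaring the input key as `Text` instead of `Object` would also match the key type stored in the file, in line with Alex's suggestion.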
