Subject: Re: streaming problem
From: Andreas Kostyrka
To: core-user@hadoop.apache.org
Date: Wed, 19 Mar 2008 11:03:48 +0100

Ok, tracked it down. Seems like Hadoop Streaming "corrupts" the input files.
Any way to force it to pass whole files to a one-to-one mapper?

TIA,

Andreas

On Wednesday, 19.03.2008, at 09:18 +0100, Andreas Kostyrka wrote:
> The /home/hadoop/dist/workloadmf script is available on all nodes.
>
> But it was missing one package that it needs to run correctly ;(
>
> Anyway, I still have the problem that when running with
> -reducer NONE, my output seems to get lost. Well, some of the
> output files contain a small number of output lines, but not many :(
> (And the expected size of each output file was around 25MB or so :( )
>
> Ah the joys,
>
> Andreas
>
> On Wednesday, 19.03.2008, at 10:13 +0530, Amareshwari Sriramadasu wrote:
> > Hi Andreas,
> > Looks like your mapper is not available to the streaming jar. Where is
> > your mapper script? Did you use the distributed cache to distribute the mapper?
> > You can use -file to make it part of the
> > jar, or use -cacheFile /dist/wordloadmf#workloadmf to distribute the
> > script. Distributing this way will add your script to the PATH.
> >
> > So, now your command will be:
> >
> > time bin/hadoop jar contrib/streaming/hadoop-0.16.0-streaming.jar -mapper workloadmf -reducer NONE -input testlogs/* -output testlogs-output -cacheFile /dist/wordloadmf#workloadmf
> >
> > or
> >
> > time bin/hadoop jar contrib/streaming/hadoop-0.16.0-streaming.jar -mapper workloadmf -reducer NONE -input testlogs/* -output testlogs-output -file
> >
> > Thanks,
> > Amareshwari
> >
> > Andreas Kostyrka wrote:
> > > Some additional details, if it helps: the HDFS is hosted on AWS S3,
> > > and the input file set consists of 152 gzipped Apache log files.
> > >
> > > Thanks,
> > >
> > > Andreas
> > >
> > > On Tuesday, 18.03.2008, at 22:17 +0100, Andreas Kostyrka wrote:
> > >
> > >> Hi!
> > >>
> > >> I'm trying to run a streaming job on Hadoop 0.16.0. I've distributed the
> > >> scripts to be used to all nodes:
> > >>
> > >> time bin/hadoop jar contrib/streaming/hadoop-0.16.0-streaming.jar -mapper ~/dist/workloadmf -reducer NONE -input testlogs/* -output testlogs-output
> > >>
> > >> Now, this gives me:
> > >>
> > >> java.io.IOException: log:null
> > >> R/W/S=1/0/0 in:0=1/2 [rec/s] out:0=0/2 [rec/s]
> > >> minRecWrittenToEnableSkip_=9223372036854775807 LOGNAME=null
> > >> HOST=null
> > >> USER=hadoop
> > >> HADOOP_USER=null
> > >> last Hadoop input: |null|
> > >> last tool output: |null|
> > >> Date: Tue Mar 18 21:06:13 GMT 2008
> > >> java.io.IOException: Broken pipe
> > >>     at java.io.FileOutputStream.writeBytes(Native Method)
> > >>     at java.io.FileOutputStream.write(FileOutputStream.java:260)
> > >>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> > >>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> > >>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:124)
> > >>     at java.io.DataOutputStream.flush(DataOutputStream.java:106)
> > >>     at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:96)
> > >>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> > >>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
> > >>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2071)
> > >>
> > >>
> > >>     at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:107)
> > >>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> > >>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
> > >>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2071)
> > >>
> > >> Any ideas what my problems could be?
> > >>
> > >> TIA,
> > >>
> > >> Andreas
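
For reference, a "Broken pipe" raised from PipeMapper's write generally means the mapper process went away before Hadoop had finished feeding it input, which would fit a script that dies on a missing package. Below is a minimal sketch of a streaming-style mapper that keeps reading until EOF and skips bad records instead of exiting. It assumes Python, which the thread never names as the language of workloadmf, and the names map_line and run_mapper, as well as the "extract the request path" transform, are hypothetical placeholders rather than anything from the original script.

```python
import io
import sys


def map_line(line):
    """Hypothetical per-record transform: return the request path from an
    Apache access-log line, or None if the line does not parse."""
    parts = line.split('"')
    if len(parts) < 2:
        return None
    request = parts[1].split()  # e.g. ['GET', '/index.html', 'HTTP/1.1']
    if len(request) < 2:
        return None
    return request[1]


def run_mapper(stdin, stdout):
    """Consume every input line until EOF and write one output line per
    record that parses. Returns the number of records written."""
    emitted = 0
    for line in stdin:
        result = map_line(line.rstrip("\n"))
        if result is None:
            continue  # skip malformed input rather than exiting early
        stdout.write(result + "\n")
        emitted += 1
    stdout.flush()
    return emitted


# Small in-memory demonstration. A real streaming script would instead call
# run_mapper(sys.stdin, sys.stdout) as its entry point.
demo_in = io.StringIO(
    '1.2.3.4 - - [18/Mar/2008:21:06:13 +0000] "GET /index.html HTTP/1.1" 200 123\n'
    "garbage\n"
)
demo_out = io.StringIO()
count = run_mapper(demo_in, demo_out)
```

Keeping the loop free of early exits is the design point here: the Java side keeps writing records to the script's stdin, so the script should stay alive and keep reading until it sees EOF.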