From: xeon Mailinglist
Date: Wed, 6 Jan 2016 15:38:22 +0000
Subject: How can I create SequenceFiles with `org.apache.hadoop.io.Text`?
To: "user@hadoop.apache.org"

Hi,

This is a snippet of the Gridmix code shipped with Hadoop MapReduce V1, and I have a question about it. It sets `org.apache.hadoop.mapred.SequenceFileInputFormat` and `org.apache.hadoop.mapred.SequenceFileOutputFormat` as the `inFormat` and `outFormat` respectively, and it uses `org.apache.hadoop.io.Text` as both `outKey` and `outValue`. To me, this suggests the example treats Text data as sequence files. How can I create SequenceFiles with `org.apache.hadoop.io.Text`?
```
WEBDATASCAN("webdataScan") {
  public void addJob(int numReducers, boolean mapoutputCompressed,
      boolean outputCompressed, Size size, JobControl gridmix) {
    final String prop = String.format("webdataScan.%sJobs.inputFiles", size);
    final String indir = getInputDirsFor(prop, size.defaultPath(VARCOMPSEQ));
    final String outdir = addTSSuffix("perf-out/webdata-scan-out-dir-" + size);

    StringBuffer sb = new StringBuffer();
    sb.append("-keepmap 0.2 ");
    sb.append("-keepred 5 ");
    sb.append("-inFormat org.apache.hadoop.mapred.SequenceFileInputFormat ");
    sb.append("-outFormat org.apache.hadoop.mapred.SequenceFileOutputFormat ");
    sb.append("-outKey org.apache.hadoop.io.Text ");
    sb.append("-outValue org.apache.hadoop.io.Text ");
    sb.append("-indir ").append(indir).append(" ");
    sb.append("-outdir ").append(outdir).append(" ");
    sb.append("-r ").append(numReducers);

    String[] args = sb.toString().split(" ");
    clearDir(outdir);
    try {
      JobConf jobconf = GenericMRLoadJobCreator.createJob(
          args, mapoutputCompressed, outputCompressed);
      jobconf.setJobName("GridmixWebdatascan." + size);
      Job job = new Job(jobconf);
      gridmix.addJob(job);
    } catch (Exception ex) {
      // getStackTrace() returns an array (printing it shows only the array
      // reference); printStackTrace() actually prints the trace
      ex.printStackTrace();
    }
  }
}
```
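For reference, a file like the ones this Gridmix job consumes can be written directly with `SequenceFile.createWriter`, using `Text` for both key and value. This is a minimal sketch, assuming the Hadoop client libraries are on the classpath; the class name `TextSeqFileWriter` and the output path `text-demo.seq` are made up for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class TextSeqFileWriter {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Resolves to the local filesystem unless fs.default.name points at HDFS.
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path("text-demo.seq"); // hypothetical output path

        // The MapReduce-V1-era overload: key and value classes are Text.
        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, out, Text.class, Text.class);
        try {
            writer.append(new Text("key-1"), new Text("value-1"));
            writer.append(new Text("key-2"), new Text("value-2"));
        } finally {
            writer.close();
        }
    }
}
```

A file produced this way should be readable by any job configured with `SequenceFileInputFormat` and `Text` key/value classes, which is what the Gridmix snippet above declares.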