From: Adam Shook <ashook@clearedgeit.com>
To: mapreduce-user@hadoop.apache.org
Date: Mon, 1 Aug 2011 17:19:07 -0400
Subject: Unusually large number of map tasks for a SequenceFile

Hi All,

I am writing a sequence file to HDFS from an application as a pre-process to a MapReduce job. (It isn't being written from an MR job; just open, write, close.)

The file is around 32 MB in size, yet when the MapReduce job starts up, it starts with 256 map tasks. That first job writes SequenceFiles, and I fire up a second job with its output. The second job has around 32 KB of input spread across 128 part files, and it starts 138 map tasks; since there are 128 part files, I would expect at most 128 map tasks for this second job. Both seem like unusually large numbers of map tasks, given that the cluster is configured with the default block size of 64 MB. I am using Hadoop v0.20.1.

Is there something special about how the SequenceFiles are being written? Below is a code sample showing how I write the first file.
Thanks,
Adam

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Writer;
import org.apache.hadoop.io.Text;

// Open the file on HDFS, append every (s1, s2) pair as Text key/value, close.
FileSystem fs = FileSystem.get(new Configuration());

// <path_to_file> is the output Path (placeholder kept from the original post)
Writer wrtr = SequenceFile.createWriter(fs, fs.getConf(), <path_to_file>, Text.class, Text.class);

for (String s1 : strings1) {
    for (String s2 : strings2) {
        wrtr.append(new Text(s1), new Text(s2));
    }
}

wrtr.close();
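For reference, here is my reading of how the split count gets computed. This is only a sketch of the 0.20 old-API FileInputFormat logic as I understand it, and the mapred.map.tasks hint of 256 is my assumption, not something I have confirmed in our config:

// Sketch of FileInputFormat's split sizing (my reading of the 0.20 sources):
//   splitSize = max(minSize, min(goalSize, blockSize)),
// where goalSize = totalSize / numSplits and numSplits is the mapred.map.tasks hint.
long totalSize = 32L * 1024 * 1024;       // the 32 MB input file
long blockSize = 64L * 1024 * 1024;       // cluster default block size
long minSize   = 2000L;                   // SequenceFileInputFormat's sync-interval minimum
int  numSplits = 256;                     // hypothetical mapred.map.tasks hint

long goalSize  = totalSize / numSplits;                            // 128 KB
long splitSize = Math.max(minSize, Math.min(goalSize, blockSize)); // 128 KB
long numMaps   = totalSize / splitSize;                            // 256 map tasks

If something in the cluster or job config sets mapred.map.tasks that high, that alone would account for 256 maps on a 32 MB file. For the second job, each of the 128 part files gets at least one split regardless of its size, and any file larger than the ~2000-byte minimum could be cut into more than one split, which might explain the extra 10 tasks.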
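In case it helps, this is what I plan to experiment with to bring the counts down; a minimal sketch assuming the old JobConf API, where MyJob is a stand-in for my actual driver class:

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;

JobConf conf = new JobConf(MyJob.class);             // MyJob is a placeholder driver class
conf.setInputFormat(SequenceFileInputFormat.class);
conf.setNumMapTasks(1);                              // shrink the per-job split-count hint
conf.setLong("mapred.min.split.size", 64L * 1024 * 1024); // don't split below one block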