hadoop-common-user mailing list archives

From jiang licht <licht_ji...@yahoo.com>
Subject Unexpected empty result problem (zero-sized part-### files)?
Date Sun, 21 Feb 2010 00:41:32 GMT
I have a Pig script as follows (see below). It loads two data sets, performs some filtering,
and then joins the two sets. Finally, it counts occurrences of a combination of fields and
writes the results to HDFS.

--load raw data
a = LOAD 'foldera/*';
b = LOAD 'somefile';

--choose rows and columns
a_filtered = FILTER a BY somecondition;
a_filtered_shortened = FOREACH a_filtered GENERATE somefields;
a_filtered_shortened_unique = DISTINCT a_filtered_shortened PARALLEL #;

--join a & b and count occurrences of a combination of fields
ab = JOIN a_filtered_shortened_unique BY somefield, b BY somefield PARALLEL #;
ab_shortened = FOREACH ab GENERATE somefields;
ab_shortened_grouped = GROUP ab_shortened BY ($0, $1) PARALLEL #;

--c will contain: fields, counts
c = FOREACH ab_shortened_grouped GENERATE FLATTEN($0), COUNT(ab_shortened);

--save results
STORE c INTO 'MYRESULTS' USING PigStorage();
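
To make the intended result concrete, here is a rough plain-Python mirror of the script's logic on in-memory tuples. It is only a sketch under assumptions: the function name `count_joined_pairs`, the `keep` predicate, and the join-field positions are hypothetical stand-ins for the placeholders in the script, and the FOREACH projections are omitted, so the first two joined fields are taken as the grouping key.

```python
from collections import Counter

def count_joined_pairs(a_rows, b_rows, keep, key_a=0, key_b=0):
    """Mirror the script's stages: FILTER, DISTINCT, JOIN, GROUP BY ($0, $1), COUNT.

    keep, key_a and key_b are hypothetical stand-ins for 'somecondition'
    and 'somefield' in the script.
    """
    # FILTER + DISTINCT on a: a set drops duplicate tuples.
    a_unique = {row for row in a_rows if keep(row)}

    # Index b by its join field so the JOIN becomes a dictionary lookup.
    b_by_key = {}
    for row in b_rows:
        b_by_key.setdefault(row[key_b], []).append(row)

    # JOIN, then count occurrences of the first two joined fields.
    counts = Counter()
    for a_row in a_unique:
        for b_row in b_by_key.get(a_row[key_a], []):
            joined = a_row + b_row
            counts[(joined[0], joined[1])] += 1
    return dict(counts)
```

With non-empty inputs that share join keys, this returns a non-empty dictionary of counts, which is the kind of result I expect the script to store.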

The problem is that empty results (zero-sized part-### files) are generated, although a
non-empty result is expected. For example, if I load a single file into 'a' (instead of all
files in the folder), quite a number of tuples are produced (non-empty part-### files).

The logic in the script seems sound to me, and it generates correct results for a randomly
selected single file. So I am wondering what could cause this empty-result problem.

FYI, I ran the same script multiple times and every run gave me empty part-### files. In the
output, however, I repeatedly saw error messages similar to the following, which show that one
result failed to be produced (these are the last lines of the job output). Could this be the
problem? How can I locate the cause? Thanks!

...
2010-02-20 16:21:37,737 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 86% complete
2010-02-20 16:21:38,239 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 87% complete
2010-02-20 16:21:39,265 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 88% complete
2010-02-20 16:21:44,286 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 93% complete
2010-02-20 16:21:46,931 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 95% complete
2010-02-20 16:21:47,432 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 99% complete
2010-02-20 16:21:54,005 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2010-02-20 16:21:54,005 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
2010-02-20 16:21:54,008 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "hdfs://hostA:50001/tmp/temp829697187/tmp-531977953"
2010-02-20 16:21:54,008 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Successfully stored result in: "hdfs://hostA:50001/tmp/temp829697187/tmp504533728"
2010-02-20 16:21:54,023 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Successfully stored result in: "hdfs://hostA:50001/user/root/MYRESULTS"
2010-02-20 16:21:54,056 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Records written : 0
2010-02-20 16:21:54,056 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Bytes written : 0
2010-02-20 16:21:54,056 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed!
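
For picking out which result paths failed and which were stored in output like the above, a small Python sketch may help. It is not part of Pig; the function name `summarize_results` is hypothetical, and the pattern assumes only the two message formats that appear in this output.

```python
import re

# Matches the two result-status messages seen in the job output above.
RESULT_LINE = re.compile(
    r'(Failed to produce|Successfully stored) result in: "([^"]+)"'
)

def summarize_results(log_text):
    """Return (failed_paths, stored_paths) found in the launcher output."""
    failed, stored = [], []
    for status, path in RESULT_LINE.findall(log_text):
        # Route each path to the list that matches its status message.
        if status.startswith("Failed"):
            failed.append(path)
        else:
            stored.append(path)
    return failed, stored
```

Run on the lines above, this would report one failed temporary path and two stored paths, the second of which is the final MYRESULTS directory that ended up with 0 records.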



      