pig-dev mailing list archives

From "Christian Birch (JIRA)" <j...@apache.org>
Subject [jira] [Created] (PIG-2707) Range globs do not work
Date Thu, 17 May 2012 19:52:07 GMT
Christian Birch created PIG-2707:
------------------------------------

             Summary: Range globs do not work
                 Key: PIG-2707
                 URL: https://issues.apache.org/jira/browse/PIG-2707
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.9.1
         Environment: Amazon Elastic MapReduce. Hadoop 0.20.205
            Reporter: Christian Birch
            Priority: Minor


Loading files with a comma-separated glob such as 's3://foo/{14,15,16}' works, but neither
's3://foo/{14..16}' nor 's3://foo/{14...16}' does (I am not sure whether the range syntax takes
two or three dots; both fail). With either range form I get errors like this:

Failed Jobs:
JobId	Alias	Feature	Message	Outputs
N/A	A	MAP_ONLY	Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input Pattern s3://foo/{14...16} matches 0 files
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:282)
	at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:999)
	at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1016)
	at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:172)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:934)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:887)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:887)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:861)
	at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
	at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
	at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
	at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input Pattern s3://foo/{14...16} matches 0 files
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:270)
	... 14 more
	hdfs://10.53.9.207:9000/tmp/temp-783548169/tmp1508748976,

Input(s):
Failed to read data from "s3://foo/{14...16}"

Output(s):
Failed to produce result in "hdfs://10.53.9.207:9000/tmp/temp-783548169/tmp1508748976"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
null

---

I would expect {14...16} to work just like {14,15,16}, which runs successfully:

2012-05-17 18:29:59,098 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2012-05-17 18:29:59,164 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-05-17 18:29:59,165 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-05-17 18:29:59,165 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-05-17 18:29:59,182 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-05-17 18:29:59,182 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-05-17 18:31:14,493 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2012-05-17 18:31:14,567 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-05-17 18:31:14,582 [Thread-30] INFO  org.apache.hadoop.mapred.JobClient - Default number of map tasks: null
2012-05-17 18:31:14,583 [Thread-30] INFO  org.apache.hadoop.mapred.JobClient - Setting default number of map tasks based on cluster size to : 8
2012-05-17 18:31:14,583 [Thread-30] INFO  org.apache.hadoop.mapred.JobClient - Default number of reduce tasks: 0
2012-05-17 18:31:15,072 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2012-05-17 18:31:16,870 [Thread-30] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-05-17 18:31:16,870 [Thread-30] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2012-05-17 18:31:18,099 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201205171523_0033
2012-05-17 18:31:18,099 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://10.53.9.207:9100/jobdetails.jsp?jobid=job_201205171523_0033
2012-05-17 18:31:58,609 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2012-05-17 18:32:08,186 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2012-05-17 18:32:08,187 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion	PigVersion	UserId	StartedAt	FinishedAt	Features
0.20.205	0.9.1-amzn	hadoop	2012-05-17 18:29:59	2012-05-17 18:32:08	UNKNOWN

Success!

Job Stats (time in seconds):
JobId	Maps	Reduces	MaxMapTime	MinMapTIme	AvgMapTime	MaxReduceTime	MinReduceTime	AvgReduceTime	Alias	Feature	Outputs
job_201205171523_0033	1	0	12	12	12	0	0	0	A	MAP_ONLY	hdfs://10.53.9.207:9000/tmp/temp-783548169/tmp1447928118,

Input(s):
Successfully read 3 records (410 bytes) from: "s3://foo/{14,15,16}"

Output(s):
Successfully stored 3 records (1405 bytes) in: "hdfs://10.53.9.207:9000/tmp/temp-783548169/tmp1447928118"

Counters:
Total records written : 3
Total bytes written : 1405
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201205171523_0033

---

I am not sure whether this is a Pig/Hadoop issue or an Amazon EMR/S3 issue.
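
A possible workaround, sketched under the assumption that the glob syntax in use here accepts comma alternation ({a,b,c}) but has no bash-style numeric range: expand the range into the comma form before handing the path to Pig. The helper name expand_numeric_ranges is hypothetical and not part of Pig or Hadoop, and it only handles the two-dot form.

```python
import re

# Matches a bash-style numeric range such as {14..16} (two dots only).
_RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_numeric_ranges(pattern: str) -> str:
    """Rewrite each {a..b} in `pattern` as the comma list {a,a+1,...,b}.

    Hypothetical helper for illustration; anything that is not a
    two-dot numeric range is left untouched.
    """
    def _expand(match):
        start_text, end_text = match.group(1), match.group(2)
        start, end = int(start_text), int(end_text)
        if start > end:
            raise ValueError(f"descending range not supported: {match.group(0)}")
        # Keep zero padding when the lower bound is written with a leading zero.
        width = len(start_text) if start_text.startswith("0") else 0
        items = (str(n).zfill(width) for n in range(start, end + 1))
        return "{" + ",".join(items) + "}"

    return _RANGE.sub(_expand, pattern)


if __name__ == "__main__":
    print(expand_numeric_ranges("s3://foo/{14..16}"))
```

The expanded string, here s3://foo/{14,15,16}, is the form that loads successfully in the run shown above. For this particular case a character class such as s3://foo/1[4-6] might also work, but I have not tested that.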

