hadoop-pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-1576) Difference in Semantics between Load statement in Pig and HDFS client on Command line
Date Tue, 21 Sep 2010 18:00:49 GMT

     [ https://issues.apache.org/jira/browse/PIG-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1576:
----------------------------

    Fix Version/s: 0.9.0

> Difference in Semantics between Load statement in Pig and HDFS client on Command line
> -------------------------------------------------------------------------------------
>
>                 Key: PIG-1576
>                 URL: https://issues.apache.org/jira/browse/PIG-1576
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.6.0, 0.7.0
>            Reporter: Viraj Bhat
>             Fix For: 0.9.0
>
>
> Here is my directory structure on HDFS which I want to access using Pig. 
> This is a sample; in the real use case I have more than 100 of these directories.
> {code}
> $ hadoop fs -ls /user/viraj/recursive/
> Found 3 items
> drwxr-xr-x   - viraj supergroup          0 2010-08-26 11:25 /user/viraj/recursive/20080615
> drwxr-xr-x   - viraj supergroup          0 2010-08-26 11:25 /user/viraj/recursive/20080616
> drwxr-xr-x   - viraj supergroup          0 2010-08-26 11:25 /user/viraj/recursive/20080617
> {code}
> From the command line, I can access them using a variety of patterns:
> {code}
> $ hadoop fs -ls /user/viraj/recursive/{200806}{15..17}/
> -rw-r--r--   1 viraj supergroup       5791 2010-08-26 11:25 /user/viraj/recursive/20080615/kv2.txt
> -rw-r--r--   1 viraj supergroup       5791 2010-08-26 11:25 /user/viraj/recursive/20080616/kv2.txt
> -rw-r--r--   1 viraj supergroup       5791 2010-08-26 11:25 /user/viraj/recursive/20080617/kv2.txt
> $ hadoop fs -ls /user/viraj/recursive/{20080615..20080617}/
> -rw-r--r--   1 viraj supergroup       5791 2010-08-26 11:25 /user/viraj/recursive/20080615/kv2.txt
> -rw-r--r--   1 viraj supergroup       5791 2010-08-26 11:25 /user/viraj/recursive/20080616/kv2.txt
> -rw-r--r--   1 viraj supergroup       5791 2010-08-26 11:25 /user/viraj/recursive/20080617/kv2.txt
> {code}
> I have written a Pig script, but neither of the following load statements works:
> {code}
> --A = load '/user/viraj/recursive/{200806}{15..17}/' using PigStorage('\u0001') as (k:int, v:chararray);
> A = load '/user/viraj/recursive/{20080615..20080617}/' using PigStorage('\u0001') as (k:int, v:chararray);
> AL = limit A 10;
> dump AL;
> {code}
> I get the following error in Pig 0.8:
> {noformat}
> 2010-08-27 16:34:27,704 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
> 2010-08-27 16:34:27,711 [main] INFO  org.apache.pig.tools.pigstats.PigStats - Script Statistics: 
> HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
> 0.20.2  0.8.0-SNAPSHOT  viraj   2010-08-27 16:34:24     2010-08-27 16:34:27     LIMIT
> Failed!
> Failed Jobs:
> JobId   Alias   Feature Message Outputs
> N/A     A,AL            Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: /user/viraj/recursive/{20080615..20080617}/
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:279)
>         at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
>         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>         at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
>         at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
>         at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input Pattern hdfs://localhost:9000/user/viraj/recursive/{20080615..20080617} matches 0 files
>         at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
>         at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:268)
>         ... 7 more
>         hdfs://localhost:9000/tmp/temp241388470/tmp987803889,
> {noformat}
> The following works:
> {code}
> A = load '/user/viraj/recursive/{200806}{15,16,17}/' using PigStorage('\u0001') as (k:int, v:chararray);
> AL = limit A 10;
> dump AL;
> {code}
> Why is there an inconsistency between the HDFS client and Pig?
> Viraj
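
A likely source of the difference, assuming the commands above were typed into bash: brace expansion is performed by the shell before `hadoop` starts, so `hadoop fs -ls` receives three ordinary paths, whereas the path string inside a Pig `load` statement reaches Hadoop's glob matcher verbatim. Hadoop's glob accepts comma alternation such as `{15,16,17}` (which is why the working statement succeeds) but, judging by the "matches 0 files" error, not the `..` range form. The sketch below uses a hypothetical helper, `show_args`, to print the arguments bash would hand to a command, one per line.

```shell
#!/usr/bin/env bash
# show_args is a hypothetical helper (not part of Hadoop or Pig): it prints
# each argument it receives on its own line, showing what bash would pass
# to a command such as `hadoop fs -ls`.
show_args() { printf '%s\n' "$@"; }

# Unquoted: bash expands the numeric range into three separate paths.
show_args /user/viraj/recursive/{20080615..20080617}/

# {200806} contains no comma or "..", so bash leaves it literal and expands
# only {15..17}; Hadoop's glob then sees {200806} as a one-item group.
show_args /user/viraj/recursive/{200806}{15..17}/

# Quoted: the pattern is passed through untouched, which is effectively
# what happens to the path string inside a Pig load statement.
show_args '/user/viraj/recursive/{20080615..20080617}/'
```

In the second case the command receives `/user/viraj/recursive/{200806}15/` and so on, so Hadoop only ever has to match `{200806}`, never a `..` range.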

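Until range syntax is accepted in load paths, one possible workaround, in line with the statement that does work, is to generate the comma form before invoking Pig. `range_glob` below is a hypothetical bash helper, not part of Pig or Hadoop. It assumes plain decimal integers without leading zeros (bash arithmetic reads a leading `0` as octal) and, like the shell's own `{a..b}`, it knows nothing about calendar dates, so a range that crosses a month boundary would produce nonexistent directory names.

```shell
#!/usr/bin/env bash
# range_glob START END -> prints "{START,START+1,...,END}"
# Hypothetical helper: builds the comma-alternation form that Hadoop's glob
# accepts, e.g. `range_glob 15 17` prints {15,16,17}.
# Assumes START <= END and both are decimal integers without leading zeros.
range_glob() {
  local start=$1 end=$2 i
  local -a items=()
  for ((i = start; i <= end; i++)); do
    items+=("$i")
  done
  # "${items[*]}" joins the array using the first character of IFS.
  local IFS=,
  printf '{%s}\n' "${items[*]}"
}

range_glob 20080615 20080617   # prints {20080615,20080616,20080617}
```

The result could then be passed to the script through Pig's parameter substitution, for example `pig -param DAYS="$(range_glob 20080615 20080617)" script.pig` with `load '/user/viraj/recursive/$DAYS/'` in the script; that invocation is an untested assumption.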
-- 
This message is automatically generated by JIRA.

