pig-user mailing list archives

From Thejas Nair <te...@yahoo-inc.com>
Subject Re: Specifying multiple input paths in LOAD command
Date Thu, 09 Jul 2009 12:28:21 GMT
From my experience, each entry inside {} has to be a single directory name; it
cannot be a path spanning several directories.
This does not work - LOAD '{/d1/abc/def/f1,/d1/abc/xyz/f1}'
This works - LOAD '/d1/abc/{def,xyz}/f1'
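To make the difference concrete, here is a small Python model of the behavior described above. It assumes, as a simplification rather than a copy of Hadoop's actual glob code, that the path is split on '/' before {a,b,...} groups are expanded. Under that assumption a brace group containing '/' ends up unbalanced in every component. The function names are hypothetical. Wildcards such as '*' are passed through unexpanded, and nested braces are rejected.

```python
from itertools import product


def expand_component(component: str) -> list[str]:
    """Expand {a,b,...} groups inside ONE path component (no nesting)."""
    if "{" not in component and "}" not in component:
        return [component]
    start = component.find("{")
    end = component.find("}", start + 1)
    # A lone '{' or '}' means the brace group spanned a '/' separator.
    if start == -1 or end == -1 or component.find("}") < start:
        raise ValueError(f"unbalanced braces in path component: {component!r}")
    prefix, body, suffix = component[:start], component[start + 1:end], component[end + 1:]
    if "{" in body:
        raise ValueError(f"nested braces are not supported: {component!r}")
    results = []
    for alt in body.split(","):
        for rest in expand_component(suffix):  # handle further groups to the right
            results.append(prefix + alt + rest)
    return results


def expand_path_glob(pattern: str) -> list[str]:
    """Split on '/' first, then expand braces per component (simplified model)."""
    expanded = [expand_component(c) for c in pattern.split("/")]
    return ["/".join(parts) for parts in product(*expanded)]


if __name__ == "__main__":
    print(expand_path_glob("/d1/abc/{def,xyz}/f1"))
    try:
        expand_path_glob("{/d1/abc/def/f1,/d1/abc/xyz/f1}")
    except ValueError as err:
        print("rejected:", err)
```

In this model the working pattern expands to '/d1/abc/def/f1' and '/d1/abc/xyz/f1'. The failing pattern is rejected because its first component is a bare '{'.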

-Thejas


On 7/9/09 8:07 PM, "zjffdu" <zjffdu@gmail.com> wrote:

> You can use a glob pattern to match the paths:
> 
> For example: 
> 
> Raw1 = LOAD '{inputPath1,inputPath2,...}/*' using PigStorage('\t');
> 
> This will load all the data under inputPath1,inputPath2,...
> 
> This glob matching is a mechanism supported by Hadoop internally.
> 
> -----Original Message-----
> From: Palleti, Pallavi [mailto:pallavi.palleti@corp.aol.com]
> Sent: 8 July 2009 20:34
> To: pig-user@hadoop.apache.org
> Subject: Specifying multiple input paths in LOAD command
> 
> Hi all,
> 
> Hadoop has a facility that lets us specify multiple input paths.
> Does this exist in Pig? Essentially, is it possible to specify multiple
> paths in a LOAD command? For example, I have n input paths that
> I need to load for processing. The only option I can see right
> now is to use n LOAD commands into n variables and do a UNION at the
> end.
> 
> For example:
> 
> Raw1 = LOAD '$inputPath1/*' using PigStorage('\t');
> Raw2 = LOAD '$inputPath2/*' using PigStorage('\t');
> ...
> Rawn = LOAD '$inputPathn/*' using PigStorage('\t');
> Raw = UNION Raw1, Raw2, ..., Rawn;
> 
> Could anyone let me know if there is a simpler way of doing this in a
> single LOAD line or something like that?
> 
> Thanks
> 
> Pallavi


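When the n input directories share a common parent, the n LOAD statements plus UNION in the original question can be collapsed into one LOAD with a brace glob, as suggested in the thread. The Python helper below is a hypothetical sketch that generates such a statement. It assumes a shared parent directory. Following the restriction reported at the top of this thread, it rejects any entry that contains '/' or brace or comma characters, so each alternative stays a single directory name.

```python
def build_load_statement(alias: str, parent: str, dirs: list[str]) -> str:
    """Hypothetical helper: emit one Pig LOAD covering parent/{d1,d2,...}/*."""
    if not dirs:
        raise ValueError("need at least one directory")
    for d in dirs:
        # Each brace entry must be a single path component (no '/', '{', '}', ',').
        if any(ch in d for ch in "/{},"):
            raise ValueError(f"brace entries must be single path components: {d!r}")
    parent = parent.rstrip("/")
    alternatives = ",".join(dirs)
    # '{{' and '}}' are literal braces in an f-string; '\\t' emits the two chars \t.
    return f"{alias} = LOAD '{parent}/{{{alternatives}}}/*' using PigStorage('\\t');"


if __name__ == "__main__":
    print(build_load_statement("Raw", "/data/logs", ["d1", "d2", "d3"]))
```

The directory names above are placeholders. Because the statement is plain text, it could equally be written by hand or supplied through Pig parameter substitution.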