hadoop-hdfs-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Reading multiple files of a directory using a Single LOAD Command in PIG
Date Wed, 12 Jun 2013 03:15:09 GMT
Yes, you can do that - it will still apply the filter to the globbed results.
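
For illustration, here is a minimal sketch of the two forms discussed in this thread. It assumes the files sit under /Output2 as in the original question and that the default PigStorage loader (tab-delimited) is appropriate. The alias names are hypothetical, and the AS schema is left out because the original post does not give it.

```pig
-- Glob restricted to the part files (forward slashes, not backslashes).
parts = LOAD '/Output2/part-m-*' USING PigStorage();

-- Glob over everything in the directory. Names that begin with "_" or "."
-- (such as _SUCCESS) should still be filtered out of the globbed results.
everything = LOAD '/Output2/*' USING PigStorage();

-- Print a few records to check what was actually loaded.
first_rows = LIMIT everything 10;
DUMP first_rows;
```

Both LOAD statements should read the same set of part files here, since the directory holds nothing else apart from _SUCCESS.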

On Wed, Jun 12, 2013 at 3:45 AM, Mix Nin <pig.mixed@gmail.com> wrote:
> Hi,
>
> My mistake: I used backslashes, which caused the error. With forward
> slashes it works fine.
>
> Good to know that LOAD ignores filenames that begin with "_" or a period
> ".". So, in that case, can I use LOAD '/Output/*' directly instead of
> LOAD '/Output/part-m*'?
>
> Thanks
>
>
>
>
> On Tue, Jun 11, 2013 at 2:32 PM, Prashant Kommireddi <prash1784@gmail.com> wrote:
>
>> What is the error?
>>
>> The LoadFunc should be ignoring any filenames that begin with "_" or a
>> period "."
>> If you are trying to skip the _SUCCESS file, the loader you are using
>> (PigStorage) already handles that.
>>
>> Also, can you double check that your path uses forward slashes
>> ("/Output/part-m*") as opposed to backward slashes?
>>
>>
>> On Tue, Jun 11, 2013 at 2:26 PM, Mix Nin <pig.mixed@gmail.com> wrote:
>>
>> > I have a directory "Output2". It has the file names below:
>> >
>> > -----------------
>> > _SUCCESS
>> > part-m-00000
>> > part-m-00001
>> > part-m-00002
>> > part-m-00003
>> > .
>> > .
>> > .
>> > .
>> > part-m-00100
>> > -----------------
>> >
>> > The above files are produced by a Pig STORE command.
>> >
>> > I want to read the files starting with "part-m-" using a Pig command.
>> >
>> > When I tried Data = LOAD '\Output2\part-m-*' AS ( );
>> > it did not work and threw an error.
>> >
>> > How do I read these files in a single LOAD statement?
>> >
>> > Thanks
>> >
>> >
>>



-- 
Harsh J
