crunch-user mailing list archives

From Josh Wills <josh.wi...@gmail.com>
Subject Re: Reading multiple avro files in a single statement
Date Thu, 10 Jul 2014 15:31:54 GMT
Hey Samik,

Glob syntax should work in Crunch as well:

Pipeline p = …;
PCollection<MyAvroRecords> records = p.read(From.avroFile(
    "/raw/idm/events/year=2014/month=04/day=07/*/*/*.avro",
    Avros.specifics(MyAvroRecords.class)));
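
To see which files a pattern like the one above would select, here is a minimal sketch that checks candidate paths against the same glob. It uses the JDK's java.nio PathMatcher, not Hadoop's own glob code, on the assumption that both treat `*` as "any characters within a single path component" for this pattern. The class name GlobDemo, the helper filter, and the hour=00 / part=0 directory names are hypothetical and exist only for illustration.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class GlobDemo {
    // Returns the candidate paths accepted by the glob pattern.
    // Note: this is the JDK matcher, used here as a stand-in for Hadoop's globbing.
    public static List<String> filter(String glob, List<String> candidates) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return candidates.stream()
                .filter(c -> matcher.matches(Paths.get(c)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String glob = "/raw/idm/events/year=2014/month=04/day=07/*/*/*.avro";
        // Hypothetical layout: two directory levels below day=07, then the files.
        List<String> candidates = Arrays.asList(
                "/raw/idm/events/year=2014/month=04/day=07/hour=00/part=0/a.avro",
                "/raw/idm/events/year=2014/month=04/day=07/hour=00/a.avro",        // one level too shallow
                "/raw/idm/events/year=2014/month=04/day=08/hour=00/part=0/a.avro", // wrong day
                "/raw/idm/events/year=2014/month=04/day=07/hour=00/part=0/a.json"  // wrong extension
        );
        // Only the first candidate is expected to pass the filter.
        filter(glob, candidates).forEach(System.out::println);
    }
}
```

Each `*` stands for exactly one path component here, so the pattern only picks up .avro files sitting two directories below day=07.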

J


On Thu, Jul 10, 2014 at 8:18 AM, Som Satpathy <somsatpathy@gmail.com> wrote:

> Hi Samik,
>
> You can create an AvroFileSource using the
> AvroFileSource(List<Path> paths, AvroType<T> ptype) constructor in
> org.apache.crunch.io.avro, then read that source in the pipeline.
>
> Hope this helps.
>
> Thanks,
> Som
>
>
> On Thu, Jul 10, 2014 at 2:12 AM, Samik Raychaudhuri <samikr@gmail.com>
> wrote:
>
>>  Hi,
>>
>> I am a Crunch newbie trying out a few things. I have a quick question
>> inspired by a Pig syntax. The following glob-like syntax works in Pig for
>> loading multiple Avro files:
>>
>> A = LOAD '/raw/idm/events/year=2014/month=04/day=07/*/*/*.avro' using
>> LOAD_IDM;
>>
>> I am wondering if there is something similar in the Crunch API that would
>> do this.
>>
>> Regards.
>>
>
>
