crunch-user mailing list archives

From Samik Raychaudhuri <sam...@gmail.com>
Subject Re: Reading multiple avro files in a single statement
Date Tue, 05 Aug 2014 11:34:57 GMT
Hi Josh,
Thanks - that worked. Did not try Som's method, but that would probably 
have worked as well.
Best.

On 10/07/2014 9:01 PM, Josh Wills wrote:
> Hey Samik,
>
> Glob syntax should work in Crunch as well:
>
> Pipeline p = …;
> PCollection<MyAvroRecords> records = p.read(
>     From.avroFile("/raw/idm/events/year=2014/month=04/day=07/*/*/*.avro",
>         Avros.specifics(MyAvroRecords.class)));
>
> J
>
>
> On Thu, Jul 10, 2014 at 8:18 AM, Som Satpathy <somsatpathy@gmail.com> wrote:
>
>     Hi Samik,
>
>     You can create an AvroFileSource using the
>     AvroFileSource(List<Path> paths, AvroType<T> ptype) constructor in
>     org.apache.crunch.io.avro, then read that source in the pipeline.
>
>     Hope this helps.
>
>     Thanks,
>     Som
>
>
>     On Thu, Jul 10, 2014 at 2:12 AM, Samik Raychaudhuri
>     <samikr@gmail.com> wrote:
>
>         Hi,
>
>         I am a Crunch newbie trying out a few things. I have a quick
>         question inspired by Pig syntax. The following glob-like
>         syntax works in Pig for loading multiple Avro files:
>
>         A = LOAD
>         '/raw/idm/events/year=2014/month=04/day=07/*/*/*.avro' using
>         LOAD_IDM;
>
>         I am wondering if there is something similar in the Crunch API
>         that would do this.
>
>         Regards.
>
>
>
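For anyone unsure what the pattern in this thread selects: each `*` stands for exactly one path component, so only .avro files sitting three directories below day=07 qualify. The sketch below checks that with the JDK's own java.nio glob matcher. That matcher is an assumption made purely for illustration; Crunch resolves globs through Hadoop's FileSystem, whose handling of a plain `*` is similar but whose rules may differ in other respects. The class name GlobSketch and the sample file names are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

final class GlobSketch {
    // Pattern from the thread. The "glob:" prefix is java.nio syntax, not Hadoop's.
    static final String PATTERN =
        "glob:/raw/idm/events/year=2014/month=04/day=07/*/*/*.avro";

    // Returns true when the given path matches the pattern under java.nio glob rules.
    static boolean matches(String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher(PATTERN);
        return matcher.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        String base = "/raw/idm/events/year=2014/month=04/day=07";
        // Hypothetical layout: two directory levels, then the file.
        System.out.println(matches(base + "/host1/part0/events.avro"));
        // Only one directory level below day=07, so a "*" segment is missing.
        System.out.println(matches(base + "/host1/events.avro"));
    }
}
```

If the directory depth varies from day to day, a single pattern of this shape will miss files, and passing an explicit list of paths (Som's suggestion) may be the safer route.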

