crunch-user mailing list archives

From Josh Wills <josh.wi...@gmail.com>
Subject Re: Crunch and HCatalog
Date Thu, 23 Jan 2014 17:00:44 GMT
Hey Chao,

I wrote one for Cloudera ML. Here's the Source:

https://github.com/cloudera/ml/blob/master/hcatalog/src/main/java/com/cloudera/science/ml/hcatalog/HCatalogSource.java

Couple of caveats:

1) I developed it against HCat 0.4; I think there are more modern versions
now,
2) I wrote the Source by hand b/c we didn't have support for providing
extra conf info on FileSourceImpl at the time,
3) I was using my own custom Record interface that wrapped HCatalog
records, Avro records, and CSV files, so that's the type of data provided
by the Source.

There are also a bunch of Hive utilities I wrote that I found useful for
working with Hive tables:

https://github.com/cloudera/ml/blob/master/hcatalog/src/main/java/com/cloudera/science/ml/hcatalog/HCatalog.java

My opinion at the time was that writing to an HCat output was sort of a pain
unless the table was already defined and you were just creating a new
partition of it, which didn't really apply to my use case, so I would just
write my regular outputs and then call the Hive APIs to create a table
around it.
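
A minimal sketch of that last step, assuming the pipeline wrote tab-delimited text files to a known directory. It builds a HiveQL CREATE EXTERNAL TABLE statement instead of calling the Hive metastore API directly, which is a substitution for illustration and not necessarily what the Cloudera ML utilities do. The class name, table name, columns, and path are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: not part of Crunch, Hive, or Cloudera ML.
class ExternalTableDdl {

    /**
     * Builds a HiveQL statement that defines an external table over files
     * already written to `location`. Assumes tab-delimited text output.
     */
    static String build(String table, Map<String, String> columns, String location) {
        // Render the column list as "(name TYPE, name TYPE, ...)".
        StringJoiner cols = new StringJoiner(", ", "(", ")");
        for (Map.Entry<String, String> e : columns.entrySet()) {
            cols.add(e.getKey() + " " + e.getValue());
        }
        // The doubled backslash leaves a literal \t in the statement,
        // which Hive reads as the tab delimiter.
        return "CREATE EXTERNAL TABLE IF NOT EXISTS " + table + " " + cols
            + " ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'"
            + " STORED AS TEXTFILE"
            + " LOCATION '" + location + "'";
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the columns in insertion order.
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("user_id", "BIGINT");
        columns.put("score", "DOUBLE");
        System.out.println(build("scores", columns, "/data/crunch/scores"));
    }
}
```

The resulting statement would then be submitted through whatever Hive client is already in use, after the pipeline has finished writing. Because the table is external, dropping it later leaves the output files in place.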

Hope that helps-- good luck!

J


On Thu, Jan 23, 2014 at 5:35 AM, Chao Shi <stepinto@live.com> wrote:

> Hi all,
>
> One of our recent projects needs to read from and write to HCatalog. We
> currently use raw MR with HCatInputFormat/HCatOutputFormat shipped with
> HCatalog. Does anyone know if there is already a Crunch wrapper for it?
>
> Thanks,
> Chao
>
