incubator-hcatalog-user mailing list archives

From Ashutosh Chauhan
Subject Re: HCatOutputFormat schema issues
Date Tue, 01 Nov 2011 00:37:42 GMT
Hey Charles,

Yeah, you need to call setSchema() on HCatOutputFormat explicitly. We
could assume defaults, but we don't, for the following reason. The rows
you write may or may not contain partition columns, and HCatOutputFormat
transparently weeds out partition columns when they are present in the
row. If we assumed defaults, we would have to assume that the data does
not contain partition columns (we don't store partition columns in the
data). That is a dangerous assumption, and it would corrupt what we read
back whenever it turned out to be wrong. So, instead, we ask the user to
set the schema. You are also correct that the order of the columns should
be the same as the one you declared when creating the table.
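
To make the ambiguity concrete, here is a minimal sketch in plain Java. It
is not HCatalog code: the class name, the method, and the partition column
"ds" are all hypothetical and exist only for illustration. It assumes the
writer is told which columns each row carries, and shows that with this
information a row with partition columns and a row without them reduce to
the same stored data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Illustration only: plain Java, not the HCatalog API.
final class PartitionColumnSketch {

    // Returns the values whose column is not a partition column.
    // declaredColumns must describe the row exactly, in order; this plays
    // the role of the schema the user sets explicitly.
    static List<Object> stripPartitionColumns(List<String> declaredColumns,
                                              List<Object> row,
                                              Set<String> partitionColumns) {
        if (declaredColumns.size() != row.size()) {
            throw new IllegalArgumentException(
                "declared schema does not match row width");
        }
        List<Object> stored = new ArrayList<>();
        for (int i = 0; i < row.size(); i++) {
            if (!partitionColumns.contains(declaredColumns.get(i))) {
                stored.add(row.get(i));
            }
        }
        return stored;
    }

    public static void main(String[] args) {
        // Hypothetical partition column; the table in this thread has none.
        Set<String> partitionColumns = Collections.singleton("ds");

        // Row that still carries the partition value.
        List<Object> withPartition = stripPartitionColumns(
            Arrays.asList("word", "author", "frequency", "ds"),
            Arrays.<Object>asList("hamlet", "shakespeare", 42, "2011-10-31"),
            partitionColumns);

        // Row that never carried it.
        List<Object> withoutPartition = stripPartitionColumns(
            Arrays.asList("word", "author", "frequency"),
            Arrays.<Object>asList("hamlet", "shakespeare", 42),
            partitionColumns);

        System.out.println(withPartition);
        System.out.println(withoutPartition);
    }
}
```

Without the declared column list, the writer could not tell these two row
shapes apart, which is why a silent default is risky here.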

Hope it helps,

On Mon, Oct 31, 2011 at 14:54, Charles Menguy wrote:

> Hi,
> I've been playing with HCatalog for the past couple weeks now, and I have
> a few questions regarding schemas in MR jobs.
> From what I read in the documentation, schemas are optional, and if not
> specified it defaults to the table level schemas. Here are some extracts
> from the documentation:
> You can use the setOutputSchema method to include a projection schema, to
> specify specific output fields. If a schema is not specified, this defaults
> to the table level schema.
> The schema for the data being written out is specified by the setSchema method.
> If this is not called on the HCatOutputFormat, then by default it is
> assumed that the partition has the same schema as the current table
> level schema
> Now when I try to omit the schema for HCatInputFormat, it works fine and
> assumes the default.
> But when I try to omit the schema for HCatOutputFormat, I get the
> following error: org.apache.hcatalog.common.HCatException : 9001 :
> Exception occurred while processing HCat request : It seems that
> setSchema() is not called on HCatOutputFormat. Please make sure that method
> is called.
> From what I read, it expects that I explicitly define the schema with
> HCatOutputFormat.setSchema(...), but this is exactly what I would like to
> omit to assume defaults.
> This is actually important because it seems that to define the schema, you
> have to know the order of your table columns when you build your
> List<HCatFieldSchema>, which may not always be obvious.
> Here is how I create my output table in Hive, which works fine when I'm
> manipulating it while specifying the schema:
> hive> create table inventory(word STRING, author STRING, frequency INT)
> stored as RCFILE;
> I would like to know if I'm doing something wrong, or if this is simply
> something not yet implemented in 0.2? Any thoughts would be useful.
> Thanks,
> Charles
