hive-dev mailing list archives

From John Sichi
Subject Re: column names from Object Inspector in serialize() method of custom serde
Date Thu, 27 May 2010 01:29:23 GMT
Hey Ashutosh,

You're right: currently the target table column names come in through the Properties
parameter of initialize, e.g. props.getProperty(Constants.LIST_COLUMNS), whereas the object
inspector gets _col1, _col2, _col3.  (And of course, if you have a custom mapping string like
HBase, then that also comes in through the initialize Properties parameter, under your own
private property name.)

I haven't looked into the details of why this is, but probably the object inspector references
an internally produced row from whatever was upstream (rather than being derived from the
target table itself, although the number of columns has to match).  I'm not sure this is a
bug per se, just something to be aware of.  In general, you should try to precompute any data
structures needed during initialize so that serialize can be as lean as possible, meaning
you probably don't want to be looking at the field names in there anyway.
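
As a rough sketch of that pattern (this is not the real Hive SerDe interface; the class
and method shapes below are made up for illustration, and I'm assuming the key behind
Constants.LIST_COLUMNS is the string "columns" holding a comma-separated list), you would
parse the names once in initialize and only do positional lookups in serialize:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Hypothetical stand-in for a custom serde; it does NOT implement Hive's SerDe interface.
class ColumnNameCachingSketch {
    // Assumption: the property key that Constants.LIST_COLUMNS resolves to.
    static final String LIST_COLUMNS = "columns";

    // Filled once in initialize(), read-only afterwards.
    private List<String> columnNames = Collections.emptyList();

    // Called once per table: parse the real column names and cache them.
    void initialize(Properties props) {
        String raw = props.getProperty(LIST_COLUMNS, "");
        List<String> names = new ArrayList<>();
        for (String name : raw.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        columnNames = Collections.unmodifiableList(names);
    }

    // Called once per row: pair each value with the cached name by position,
    // ignoring whatever field names (_col1, _col2, ...) the row's inspector reports.
    String serialize(List<?> rowValues) {
        if (rowValues.size() != columnNames.size()) {
            throw new IllegalArgumentException(
                "expected " + columnNames.size() + " columns, got " + rowValues.size());
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rowValues.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(columnNames.get(i)).append('=').append(rowValues.get(i));
        }
        return sb.toString();
    }
}
```

In a real serde the row values would come out of the StructObjectInspector rather than a
plain List, but the idea is the same: the names are matched by position, which is why the
number of columns has to line up.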

Opinions from other hive devs?


On May 21, 2010, at 12:22 PM, Ashutosh Chauhan wrote:

> Hi,
> I am writing my own custom serde to write data to an external table.
> In serialize() method of my serde I am handed over an object and an
> object Inspector. Since this object represents a row, I make an
> assumption that object Inspector is of type StructObjectInspector and
> then I get fields out of this struct using struct Object inspector.
> When I do field.getFieldName() on it I expect it will give me the real
> column name as contained in my table schema in metastore. But, instead
> I get names like _col1, _col2, _col3 ..
> Now the workaround for it is to store the column names in a list in
> initialize() method and then use that list to get names in
> serialize(). This is what I am doing now and it works. It seems the HBase
> serde is also doing a similar thing. But it was counterintuitive to me
> not to expect to get the real column names in getFieldName() but
> rather some random made up names. If this is not the expected behavior
> then potentially I am doing something wrong in my serde. If so, I would
> appreciate it if someone confirms that. But if this is how things are
> implemented currently, then I think it's a bug and I will open a jira
> for it.
> Thanks,
> Ashutosh
> PS: I am posting it on dev-list But if folks think its more
> appropriate for user-list, feel free to move it there, while replying
> to it.
