incubator-hcatalog-dev mailing list archives

From Francis Christopher Liu <fc...@yahoo-inc.com>
Subject JobInfo, OutputJobInfo, HCatTableInfo,
Date Fri, 15 Jul 2011 22:09:49 GMT
Hi,

I’m part of a team that’s working on adding HBase support to HCatalog. We’re just getting
our feet wet with the source code and have some questions. Any help to get things going
would be appreciated.

As part of writing the storage drivers for HBase we need to add a few more configuration parameters
(e.g. the range of versions to read, the version number to use when writing, etc.). Since setInput/setOutput
take HCatTableInfo as a parameter, it would seem this is the right place to put them. However,
it wouldn’t be good design to put implementation-specific parameters directly
into HCatTableInfo. So would it be better to subclass this class or to add a Properties field
to store such information?
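
To make the Properties option concrete, here is a minimal sketch. It assumes a simplified stand-in class (TableInfoSketch) rather than the real HCatTableInfo, and the property keys are made up for illustration; none of these names exist in HCatalog.

```java
import java.util.Properties;

// Hypothetical stand-in for HCatTableInfo; NOT the real HCatalog class.
class TableInfoSketch {
    private final String databaseName;
    private final String tableName;
    // Driver-specific settings live here, so the class itself stays storage-agnostic.
    private final Properties driverProperties = new Properties();

    TableInfoSketch(String databaseName, String tableName) {
        this.databaseName = databaseName;
        this.tableName = tableName;
    }

    String getDatabaseName() { return databaseName; }
    String getTableName() { return tableName; }
    Properties getDriverProperties() { return driverProperties; }
}

class TableInfoPropertiesDemo {
    public static void main(String[] args) {
        TableInfoSketch info = new TableInfoSketch("default", "web_logs");

        // Hypothetical keys an HBase storage driver might define for itself.
        info.getDriverProperties().setProperty("hbase.sketch.read.maxVersions", "3");
        info.getDriverProperties().setProperty("hbase.sketch.write.version", "42");

        // The driver reads its own keys back, with a default for unset values.
        int maxVersions = Integer.parseInt(
                info.getDriverProperties().getProperty("hbase.sketch.read.maxVersions", "1"));
        System.out.println(info.getTableName() + " maxVersions=" + maxVersions);
    }
}
```

With this shape only the HBase driver needs to know its keys, whereas a subclass would expose typed getters at the cost of tying callers to a driver-specific type.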

JobInfo seems to be used only for input, while OutputJobInfo is for output. Shouldn’t we rename
the class to InputJobInfo? Also, JobInfo doesn’t have a reference to HCatTableInfo while
OutputJobInfo does. Given that this is the HCat context used by the storage drivers, shouldn’t
the reference be there?

As for the role of the classes, it seems to me that it would make much more sense to have *JobInfo
passed as the parameter for setInput/setOutput. It looks to me like HCatTableInfo should contain
the state of things as persisted in the metastore, while the *JobInfo classes should contain the
job-specific information. We could have a factory method which creates the *JobInfo object as
well as its referenced HCatTableInfo object.
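
The factory-method idea could look roughly like the sketch below. PersistedTableInfo and InputJobInfoSketch are hypothetical names standing in for HCatTableInfo and a renamed JobInfo; the fields are assumptions chosen only to show the split between metastore state and job-specific state.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical: table state as it would be persisted in the metastore.
final class PersistedTableInfo {
    private final String databaseName;
    private final String tableName;

    PersistedTableInfo(String databaseName, String tableName) {
        this.databaseName = databaseName;
        this.tableName = tableName;
    }

    String getDatabaseName() { return databaseName; }
    String getTableName() { return tableName; }
}

// Hypothetical: job-specific information plus a reference to the table info.
final class InputJobInfoSketch {
    private final PersistedTableInfo tableInfo;
    private final Map<String, String> jobProperties;

    private InputJobInfoSketch(PersistedTableInfo tableInfo, Map<String, String> jobProperties) {
        this.tableInfo = tableInfo;
        // Defensive copy so later changes by the caller do not leak into the job info.
        this.jobProperties = Collections.unmodifiableMap(new HashMap<>(jobProperties));
    }

    // Factory method: builds the job info and the table info it references in one call.
    static InputJobInfoSketch create(String databaseName, String tableName,
                                     Map<String, String> jobProperties) {
        return new InputJobInfoSketch(new PersistedTableInfo(databaseName, tableName), jobProperties);
    }

    PersistedTableInfo getTableInfo() { return tableInfo; }
    Map<String, String> getJobProperties() { return jobProperties; }
}

class JobInfoFactoryDemo {
    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("sketch.read.maxVersions", "3"); // hypothetical job-level setting
        InputJobInfoSketch job = InputJobInfoSketch.create("default", "web_logs", props);
        System.out.println(job.getTableInfo().getTableName() + " " + job.getJobProperties());
    }
}
```

A setInput-style method would then accept the job info object alone and reach the table info through it.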

Also, *StorageDriver.initialize() is not passed the *JobInfo. I know it’s possible to deserialize
the object from the Context object, but wouldn’t it be cleaner to just pass it?
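
To compare the two routes, here is a rough sketch. It assumes a plain Map<String, String> as a stand-in for the job configuration carried by the Context, Java serialization plus Base64 as the encoding, and hypothetical names throughout (DriverJobInfo, JobInfoCodec, StorageDriverSketch, the "sketch.job.info" key). It does not reproduce how HCatalog actually stores the object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical job info; the real *JobInfo classes carry more state.
class DriverJobInfo implements Serializable {
    private static final long serialVersionUID = 1L;
    final String tableName;

    DriverJobInfo(String tableName) { this.tableName = tableName; }
}

// Route 1: the job info travels as an encoded string inside the configuration.
class JobInfoCodec {
    static final String KEY = "sketch.job.info"; // hypothetical configuration key

    static void store(Map<String, String> conf, DriverJobInfo info) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(info);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        conf.put(KEY, Base64.getEncoder().encodeToString(bytes.toByteArray()));
    }

    static DriverJobInfo load(Map<String, String> conf) {
        byte[] raw = Base64.getDecoder().decode(conf.get(KEY));
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (DriverJobInfo) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Route 2: initialize() simply receives the job info as an argument.
class StorageDriverSketch {
    private DriverJobInfo jobInfo;

    void initialize(Map<String, String> conf, DriverJobInfo jobInfo) {
        this.jobInfo = jobInfo;
    }

    String tableName() { return jobInfo.tableName; }
}

class InitializeDemo {
    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        JobInfoCodec.store(conf, new DriverJobInfo("web_logs"));

        // The framework can decode once and hand the object to every driver.
        StorageDriverSketch driver = new StorageDriverSketch();
        driver.initialize(conf, JobInfoCodec.load(conf));
        System.out.println(driver.tableName());
    }
}
```

In this shape the decoding happens in one place, and individual drivers never need to know the configuration key or the encoding.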

Let me know what you guys think. Feel free to point out any misinterpretations I have made; this’ll
help us understand better how things work together.

-Francis


