incubator-hcatalog-user mailing list archives

From Koert Kuipers
Subject general questions about hcatalog usage
Date Sat, 12 Jan 2013 17:26:10 GMT
i would like to use hcatalog to allow programs (which are not written in
hive or pig but in map-reduce and cascading) to have access to hive tables
for reading and writing while being as isolated from hive as possible. i
generally have little interest in hive and its metastore, but this feature
is attractive to me: not everyone agrees with me, so i often run into
requests to read from tables in hive and to publish results to tables in
hive.

all the programs are written using the old/stable org.apache.hadoop.mapred
api. cascading also uses the org.apache.hadoop.mapred api, so migrating to
the new mapreduce api is not an option. there are also some dependencies on
libraries which might conflict with what hive depends on. for example i remember
trying to integrate with hive directly myself but giving up after realizing
that my avro dependency was a different version from the avro classes that
were inside the hive-exec jar (still a mystery to me why one would package
classes like that inside hive-exec as opposed to a normal dependency where
dependency management can resolve version conflicts).

so my questions are:
1) is usage of hcatalog possible with org.apache.hadoop.mapred api?
2) is there a way to avoid the classpath issues with hive's dependencies?
does hcatalog bring them all in? including my old enemy hive-exec?

i read some messages on the boards indicating that both 1 & 2 could be a
problem, but i got the impression people managed to work around them,
though i am not sure how. any advice would be appreciated. a single point
of integration with hive, such as hcatalog provides, would be really nice
to have.

