hive-user mailing list archives

From Nathan Bamford <>
Subject Reading and Writing with Hive 0.13 from a Yarn application
Date Wed, 03 Sep 2014 00:26:12 GMT

  My company has been working on a YARN application for a couple of years. We essentially
take the place of MapReduce, splitting our data and handling the processing ourselves.

  One of the things we've been working to support is Hive access, and the HCatalog interfaces
and API seemed perfect. Working from this information: <> and from
the source code, I was able to create and use HCatSplits for balanced, data-local parallel
reading (using the size and locations methods available on each HCatSplit).
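
  To illustrate the kind of assignment meant here, the following is a minimal sketch, not
our actual code. SplitInfo and SplitAssigner are hypothetical names rather than HCatalog
classes; the sketch assumes only that each split reports a length and a list of hosts. It
places the largest splits first, prefers the least-loaded host that holds the data, and
falls back to the least-loaded node overall when no such host is in the cluster.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SplitAssigner {

    // Hypothetical stand-in for a split's size and host list (not an HCatalog class).
    public static final class SplitInfo {
        final String id;
        final long length;
        final List<String> locations;

        public SplitInfo(String id, long length, List<String> locations) {
            this.id = id;
            this.length = length;
            this.locations = locations;
        }
    }

    // Greedy assignment: largest splits first, data-local node preferred.
    public static Map<String, List<SplitInfo>> assign(List<SplitInfo> splits, List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        Map<String, List<SplitInfo>> plan = new LinkedHashMap<>();
        Map<String, Long> load = new LinkedHashMap<>();
        for (String node : nodes) {
            plan.put(node, new ArrayList<>());
            load.put(node, 0L);
        }

        List<SplitInfo> sorted = new ArrayList<>(splits);
        sorted.sort((a, b) -> Long.compare(b.length, a.length));

        for (SplitInfo split : sorted) {
            String target = leastLoaded(split.locations, load);
            if (target == null) {
                // None of the split's hosts run our application: give up locality.
                target = leastLoaded(nodes, load);
            }
            plan.get(target).add(split);
            load.merge(target, split.length, Long::sum);
        }
        return plan;
    }

    // Returns the candidate with the smallest assigned byte count, or null if
    // no candidate is a known node. Ties go to the earliest candidate.
    private static String leastLoaded(List<String> candidates, Map<String, Long> load) {
        String best = null;
        for (String candidate : candidates) {
            Long bytes = load.get(candidate);
            if (bytes == null) {
                continue;
            }
            if (best == null || bytes < load.get(best)) {
                best = candidate;
            }
        }
        return best;
    }
}
```

  This is only possible when the split list, with sizes and hosts, is visible to the code
doing the scheduling.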

  Much to my dismay, 0.13 removes a lot of that functionality. The ReaderContext class is
now an interface that only exposes numSplits, whereas all of the other methods are in the
inaccessible (package-private) ReaderContextImpl class.

  Since I no longer have access to the actual HCatSplits from the ReaderContext, I am unable
to process them and send them to our YARN app on the data-local nodes. My only choice seems
to be to partition out the splits to slave nodes more or less at random.

  Does anyone know if, as of 0.13, this is the intended way to interface with Hive via non-Hadoop
YARN applications? Is the underlying HCatSplit now intended only for internal use?


Nathan Bamford
