hive-user mailing list archives

From Ashutosh Chauhan <hashut...@apache.org>
Subject Re: Reading and Writing with Hive 0.13 from a Yarn application
Date Wed, 03 Sep 2014 16:16:47 GMT
Hi Nathan,

This was done in https://issues.apache.org/jira/browse/HIVE-6248. The reasoning
was to minimize the API surface area exposed to users so that they are immune to
incompatible changes in internal classes, making it easier for them
to consume this API without worrying about version upgrades. It seems that in
the process some of the functionality went away.
What information exactly are you looking for? Is it a String[] getBlockLocations()
equivalent of InputSplit? If so, we can consider adding that to
ReaderContext, since that method would not need to expose any Hadoop or Hive
classes.
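To make this proposal concrete, here is a minimal Java sketch of what such an addition could look like. Only `numSplits()` corresponds to what the thread says the 0.13 `ReaderContext` interface exposes; the `LocatedReaderContext` and `SimpleLocatedReaderContext` names and the `getBlockLocations(int splitIndex)` signature are hypothetical illustrations, not part of any released HCatalog API. The point it shows is that split locality can be surfaced as plain host-name strings without leaking Hadoop or Hive types into the public interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LocatedReaderContextSketch {

    // Shape of the 0.13 ReaderContext described in the thread (numSplits only),
    // plus a HYPOTHETICAL getBlockLocations(int) returning plain host names,
    // so no Hadoop or Hive types appear in the public signature.
    interface LocatedReaderContext {
        int numSplits();

        String[] getBlockLocations(int splitIndex);
    }

    // HYPOTHETICAL implementation: in a real library this would stay
    // package-private and be built from the internal split objects.
    static final class SimpleLocatedReaderContext implements LocatedReaderContext {
        private final List<String[]> hostsPerSplit = new ArrayList<>();

        SimpleLocatedReaderContext(List<String[]> locations) {
            for (String[] hosts : locations) {
                hostsPerSplit.add(hosts.clone()); // copy in
            }
        }

        @Override
        public int numSplits() {
            return hostsPerSplit.size();
        }

        @Override
        public String[] getBlockLocations(int splitIndex) {
            // Copy out so callers cannot mutate internal state.
            return hostsPerSplit.get(splitIndex).clone();
        }
    }

    public static void main(String[] args) {
        LocatedReaderContext ctx = new SimpleLocatedReaderContext(Arrays.asList(
                new String[] {"node1", "node2"},
                new String[] {"node3"}));
        for (int i = 0; i < ctx.numSplits(); i++) {
            System.out.println(i + " -> " + Arrays.toString(ctx.getBlockLocations(i)));
        }
        // prints:
        // 0 -> [node1, node2]
        // 1 -> [node3]
    }
}
```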

Thanks,
Ashutosh


On Tue, Sep 2, 2014 at 5:26 PM, Nathan Bamford <nathan.bamford@redpoint.net>
wrote:

>  Hi,
>
>   My company has been working on a YARN application for a couple of
> years; we essentially take the place of MapReduce, splitting and processing
> our data ourselves.
>
>   One of the things we've been working to support is Hive access, and the
> HCatalog interfaces and API seemed perfect. Using the documentation at
> https://hive.apache.org/javadocs/hcat-r0.5.0/readerwriter.html and
> TestReaderWriter.java from the source code, I was able to create and use
> HCatSplits to allow balanced, data-local parallel reading (using the size
> and location methods available on each HCatSplit).
>
>   Much to my dismay, 0.13 removes a lot of that functionality. The
> ReaderContext class is now an interface that exposes only numSplits,
> while all of the other methods live in the package-private (and thus
> inaccessible) ReaderContextImpl class.
>
>   Since I no longer have access to the actual HCatSplits from the
> ReaderContext, I am unable to process them and send them to our YARN app on
> the data-local nodes. My only choice seems to be to distribute the
> splits to slave nodes more or less at random.
>
>   Does anyone know whether, as of 0.13, this is the intended way to interface
> with Hive from non-Hadoop YARN applications? Is the underlying HCatSplit
> now intended only for internal use?
>
>
>  Thanks,
>
>
>  Nathan Bamford
>
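The following is a minimal Java sketch of the kind of balanced, data-local split assignment described in the quoted message, assuming each split's length and host locations are available. `SplitInfo`, `assign`, `leastLoaded`, and the largest-first greedy heuristic are hypothetical illustrations chosen for this example, not HCatalog classes or the method the original poster actually used. Splits whose hosts run no worker (or which report no locations at all) fall back to the least-loaded worker overall, which is roughly the location-blind placement the thread describes when only `numSplits()` is available.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LocalityAwareAssigner {

    // HYPOTHETICAL, Hive-free stand-in for the per-split facts read from each
    // HCatSplit in the thread: an index, a length in bytes, and block hosts.
    static final class SplitInfo {
        final int id;
        final long length;
        final String[] hosts;

        SplitInfo(int id, long length, String... hosts) {
            this.id = id;
            this.length = length;
            this.hosts = hosts.clone();
        }
    }

    // Greedy placement: largest splits first, each on the least-loaded worker
    // among the hosts that store it; if none of those hosts runs a worker (or
    // no hosts are known), fall back to the least-loaded worker overall.
    static Map<String, List<Integer>> assign(List<SplitInfo> splits, List<String> workers) {
        if (workers.isEmpty()) {
            throw new IllegalArgumentException("need at least one worker");
        }
        Map<String, Long> load = new HashMap<>();
        Map<String, List<Integer>> plan = new HashMap<>();
        for (String w : workers) {
            load.put(w, 0L);
            plan.put(w, new ArrayList<>());
        }

        List<SplitInfo> ordered = new ArrayList<>(splits);
        ordered.sort((a, b) -> Long.compare(b.length, a.length)); // stable, largest first

        for (SplitInfo s : ordered) {
            String target = leastLoaded(s.hosts, load); // data-local choice
            if (target == null) {
                target = leastLoaded(workers.toArray(new String[0]), load); // fallback
            }
            load.merge(target, s.length, Long::sum);
            plan.get(target).add(s.id);
        }
        return plan;
    }

    // Returns the least-loaded candidate that is a known worker, or null.
    private static String leastLoaded(String[] candidates, Map<String, Long> load) {
        String best = null;
        for (String c : candidates) {
            if (load.containsKey(c) && (best == null || load.get(c) < load.get(best))) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> workers = List.of("node1", "node2");
        List<SplitInfo> splits = List.of(
                new SplitInfo(0, 128, "node1", "node2"),
                new SplitInfo(1, 64, "node2"),
                new SplitInfo(2, 64, "node3"), // host runs no worker
                new SplitInfo(3, 32));         // no locations known
        Map<String, List<Integer>> plan = assign(splits, workers);
        for (String w : workers) {
            System.out.println(w + " -> " + plan.get(w));
        }
        // prints:
        // node1 -> [0, 3]
        // node2 -> [1, 2]
    }
}
```

Always preferring a local host can overload a node that stores many blocks; a real scheduler would likely cap per-node load before falling back to a remote worker. That trade-off is left out here to keep the sketch short.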
