hive-user mailing list archives

From "Lavelle, Shawn" <Shawn.Lave...@osii.com>
Subject RE: Hive External Storage Handlers
Date Tue, 19 Jul 2016 19:18:45 GMT
I am using the compiled version of spark-sql, but the API seems to have changed: the storage handler is not receiving the pushdown predicate as it did with Hive 0.11 on Shark 0.9.2.  We have written our own storage handler.

Specifically, the FILTER_EXPR_CONF-like parameters are not being set in the jobconf.  Either that changed, or there’s a bug that nobody noticed because we were the only ones using it ;)
As mentioned, there may well be an API change that I missed, or spark-sql doesn’t populate that parameter when running under HiveContext.  I don’t know which; hopefully someone here does!
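For reference, in Hive 1.x the serialized pushdown predicate reaches the storage handler’s input format through the job configuration, under the key defined by TableScanDesc.FILTER_EXPR_CONF_STR ("hive.io.filter.expr.serialized"). One quick diagnostic is to check whether that key is populated at read time; in the sketch below a plain java.util.Properties stands in for the real JobConf, so the key name is the only Hive-specific assumption:

```java
import java.util.Properties;

// Hypothetical diagnostic: check whether the engine populated the keys Hive
// uses to hand the pushed-down predicate to a storage handler's input format.
// In Hive 1.x these are defined in org.apache.hadoop.hive.ql.plan.TableScanDesc
// (FILTER_EXPR_CONF_STR = "hive.io.filter.expr.serialized").
public class PushdownCheck {

    static final String FILTER_EXPR_KEY = "hive.io.filter.expr.serialized";

    // Returns true if the serialized filter expression was set by the engine.
    static boolean hasPushedPredicate(Properties conf) {
        return conf.getProperty(FILTER_EXPR_KEY) != null;
    }

    public static void main(String[] args) {
        // Stand-in for the JobConf; in a real handler you would inspect the
        // JobConf passed to getRecordReader() / getSplits().
        Properties conf = new Properties();
        System.out.println("predicate pushed? " + hasPushedPredicate(conf)); // false: the symptom described above

        conf.setProperty(FILTER_EXPR_KEY, "<serialized ExprNodeGenericFuncDesc>");
        System.out.println("predicate pushed? " + hasPushedPredicate(conf)); // true
    }
}
```

If the key is absent under the new engine but present under plain Hive on MapReduce, that would point at the engine (or an API change) rather than the handler.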

   Thanks,

~ Shawn M Lavelle

PS I had asked this question months ago, but I think the spam filter prevented me from seeing any replies. I got our IT department to whitelist the alias.  Thank you for your patience if my issue was already discussed.

From: Jörn Franke <jornfranke@gmail.com>
Sent: Tuesday, July 19, 2016 10:58 AM
To: user@hive.apache.org
Subject: Re: Hive External Storage Handlers

The main reason is that if you compile it yourself, nobody else can reproduce what you did. With a distribution, anyone can download the same build and follow along. As far as I recall, you had described several problems that the distributions do not have (e.g. you could not compile Tez, Spark only in an outdated version, etc.). Furthermore, the distributions provide a clear configuration baseline for several complex pieces of software.

Hence, even for production use, a self-compiled version of something as complex as the Hadoop/Hive/Spark toolkit is clearly a no-go.

On 19 Jul 2016, at 08:25, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:


"Do not use a self-compiled Hive or Spark version, but only the ones supplied by distributions (Cloudera, Hortonworks, Bigtop...). You will face performance problems, strange errors, etc. when building and testing your code with self-compiled versions."

This comment does not make sense and is meaningless without evidence. Either provide evidence that you have done this work and encountered these errors, or better not mention it. It sounds like scaremongering.


Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com



Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction
of data or any other property which may arise from relying on this email's technical content
is explicitly disclaimed. The author will in no case be liable for any monetary damages arising
from such loss, damage or destruction.



On 19 July 2016 at 06:51, Jörn Franke <jornfranke@gmail.com> wrote:
Do not use a self-compiled Hive or Spark version, but only the ones supplied by distributions (Cloudera, Hortonworks, Bigtop...). You will face performance problems, strange errors, etc. when building and testing your code with self-compiled versions.

If you use the Hive APIs then the engine should not be relevant for your storage handler.
Nevertheless, the APIs of the storage handler might have changed.

However, I wonder why a 1-1 mapping does not work for you.

On 18 Jul 2016, at 22:46, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
Hi,

You can move up to Hive 2, which works fine and is pretty stable. You can opt for Hive 1.2.1 if you wish.

If you want to use Spark (the replacement for Shark) as the execution engine for Hive, the version that works (the one I have managed to make work with Hive) is Spark 1.3.1, which you will need to build from source.

It works and it is stable.

Otherwise you may decide to use the Spark Thrift Server (STS), which allows JDBC access to Spark SQL (through beeline, SQuirreL, Zeppelin) and has the Hive SQL context built into it, as if you were using the Hive Thrift Server (HiveServer2).
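For what it’s worth, STS speaks the same HiveServer2 Thrift/JDBC protocol, so any hive2 JDBC URL works against it. A minimal sketch (host and port are assumptions; 10000 is the usual default):

```java
// Minimal sketch (hypothetical host/port): the Spark Thrift Server speaks the
// HiveServer2 Thrift/JDBC protocol, so the standard Hive JDBC URL form applies.
public class StsUrl {

    // Build a HiveServer2-style JDBC URL; STS listens on port 10000 by default.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        String url = jdbcUrl("localhost", 10000, "default");
        System.out.println(url); // jdbc:hive2://localhost:10000/default

        // With org.apache.hive:hive-jdbc on the classpath you would connect with:
        //   java.sql.Connection conn =
        //       java.sql.DriverManager.getConnection(url, "user", "");
        // which is essentially what beeline does under the hood.
    }
}
```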

HTH




On 18 July 2016 at 21:38, Lavelle, Shawn <Shawn.Lavelle@osii.com> wrote:
Hello,

    I am working with an external storage handler written for Hive 0.11 and run on the Shark execution engine.  I’d like to move forward and upgrade to Hive 1.2.1 on Spark 1.6 or even 2.0.

   This storage handler needs to run queries across tables that live in different databases in the external data store, so existing drivers that map Hive to external storage in 1-to-1 mappings are insufficient. I have already attempted this upgrade, but found that predicate pushdown was not occurring.  Was this changed in 1.2?
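For context, in Hive 1.2 a storage handler opts into predicate pushdown by also implementing HiveStoragePredicateHandler, whose decomposePredicate() splits the scan filter into a part the external store evaluates and a residual part Hive re-applies. The real method operates on ExprNodeDesc trees; the sketch below illustrates only the split, with plain strings standing in for predicates and a made-up rule for what the store can handle:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: models the contract of
// org.apache.hadoop.hive.ql.metadata.HiveStoragePredicateHandler.decomposePredicate(),
// which returns the pushed-down part of a filter plus the residual Hive must
// still evaluate. Strings stand in for the real ExprNodeDesc predicate trees.
public class DecomposeSketch {

    // Result of decomposition: what we push down vs. what Hive re-checks.
    static final class Decomposed {
        final List<String> pushed = new ArrayList<>();
        final List<String> residual = new ArrayList<>();
    }

    // Hypothetical rule: the external store only understands simple
    // equality comparisons, so everything else stays residual.
    static Decomposed decompose(List<String> conjuncts) {
        Decomposed d = new Decomposed();
        for (String c : conjuncts) {
            if (c.contains("=") && !c.contains("LIKE")) {
                d.pushed.add(c);
            } else {
                d.residual.add(c);
            }
        }
        return d;
    }

    public static void main(String[] args) {
        Decomposed d = decompose(List.of("id = 42", "name LIKE 'a%'"));
        System.out.println("pushed:   " + d.pushed);   // [id = 42]
        System.out.println("residual: " + d.residual); // [name LIKE 'a%']
    }
}
```

Returning the filter untouched as residual is always safe; pushing it and returning null residual is what requires the engine to actually populate the jobconf at read time.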

   Can I update and use the same storage handler in Hive, or has this concept been replaced by the RDD and DataFrame APIs?

   Are these questions better for the Spark list?

   Thank you,

~ Shawn M Lavelle



Shawn Lavelle
Software Development

4101 Arrowhead Drive
Medina, Minnesota 55340-9457
Phone: 763 551 0559
Fax: 763 551 0750
Email: Shawn.Lavelle@osii.com
Website: www.osii.com



