Mailing-List: contact user-help@hive.apache.org; run by ezmlm
Reply-To: user@hive.apache.org
MIME-Version: 1.0
References: <1946462485.2370862.1489776985777@mail.yahoo.com>
From: Stephen Sprague
Date: Fri, 17 Mar 2017 13:58:57 -0700
Subject: Re: hive on spark -
 version question
To: "user@hive.apache.org"
Cc: hernan saab
Content-Type: multipart/alternative; boundary=001a1142e6ea71b74c054af37328
archived-at: Fri, 17 Mar 2017 20:59:31 -0000

--001a1142e6ea71b74c054af37328
Content-Type: text/plain; charset=UTF-8

Thanks for the comments, and for sure all relevant. And yeah, I feel the pain
just like the next guy, but that's part of the open source "life style" you
subscribe to when using it.

The upside payoff has gotta be worth the downside risk - or else forget about
it, right? Here in the Hive world, in my experience anyway, it's been great.
Gotta roll with it, be courteous, be persistent, and sometimes things just
work out.

Getting back to Spark and Tez: yes, by all means, I'm a big Tez user already,
so I was hoping to see what Spark brought to the table, and I didn't want to
diddle around with Spark < 2.0. That's cool. I can live with that not being
nailed down yet. I'll just wait for Hive 2.2 and rattle the cage again! ha!

All good!

Cheers,
Stephen.

On Fri, Mar 17, 2017 at 1:14 PM, Edward Capriolo wrote:

> On Fri, Mar 17, 2017 at 2:56 PM, hernan saab wrote:
>
>> I have been in a similar world of pain. Basically, I tried to use an
>> external Hive to have user access controls with a Spark engine.
>> In the end, I realized that it was a better idea to use Apache Tez
>> instead of a Spark engine for my particular case.
>>
>> But the journey is what I want to share with you.
>> The big data Apache tools and libraries such as Hive, Tez, Spark, Hadoop,
>> Parquet, etc. are not as interchangeable as we would like to think. There
>> are very limited combinations for very specific versions. This is why tools
>> like Ambari can be useful. Ambari sets a path of combos of versions known
>> to work, and the dirty work is done under the UI.
>>
>> More often than not, when you try a version that few people have tried, you
>> will get error messages that will derail you and cause you to waste a lot
>> of time.
>>
>> In addition, this group, as well as many other Apache big data user
>> groups, provides extremely poor support for users. The answers you usually
>> get are not even hints to a solution. Their answers usually translate to
>> "there is nothing I am willing to do about your problem. If I did, I should
>> get paid" in many cryptic ways.
>>
>> If you ask your question to the Spark group they will take you to the
>> Hive group and vice versa (I can almost guarantee it based on previous
>> experiences).
>>
>> But in hindsight, people who work on these kinds of things typically make
>> more money than the average developer. If you make more $$s, it makes sense
>> that learning this stuff is supposed to be harder.
>>
>> Conclusion: don't try it. Or try using Tez/Hive instead of Spark/Hive if
>> you are querying large files.
>>
>>
>> On Friday, March 17, 2017 11:33 AM, Stephen Sprague wrote:
>>
>> :( gettin' no love on this one. Any SMEs know if Spark 2.1.0 will
>> work with Hive 2.1.0? That JavaSparkListener class looks like a deal
>> breaker to me, alas.
>>
>> thanks in advance.
>>
>> Cheers,
>> Stephen.
>>
>> On Mon, Mar 13, 2017 at 10:32 PM, Stephen Sprague wrote:
>>
>> hi guys,
>> wondering where we stand with Hive On Spark these days?
>>
>> I'm trying to run Spark 2.1.0 with Hive 2.1.0 (purely coincidental
>> versions) and running up against this class not found:
>>
>> java.lang.NoClassDefFoundError: org/apache/spark/JavaSparkListener
>>
>> Searching the Cyber I find this:
>> 1. http://stackoverflow.com/questions/41953688/setting-spark-as-default-execution-engine-for-hive
>>
>> which pretty much describes my situation too and it references this:
>>
>> 2. https://issues.apache.org/jira/browse/SPARK-17563
>>
>> which indicates a "won't fix" - but does reference this:
>>
>> 3. https://issues.apache.org/jira/browse/HIVE-14029
>>
>> which looks to be fixed in hive 2.2 - which is not released yet.
>>
>> So if I want to use Spark 2.1.0 with Hive, am I out of luck - until Hive
>> 2.2?
>>
>> thanks,
>> Stephen.
>>
>
> Stephen,
>
> I understand some of your frustration. Remember that many in open source
> are volunteering their time. This is why if you pay a vendor for support of
> some software you might pay 50K a year or $200.00 an hour. If I were your
> vendor/consultant I would have started the clock 10 minutes ago just to
> answer this email :). The only "pay" I ever got from Hive is that I can use
> it as a resume bullet point, and I wrote a book which pays me royalties.
>
> As it relates specifically to your problem, when you see the trends you
> are seeing it probably means you are in a minority of the user base. Either
> you're doing something no one else is doing, you are too cutting edge, or no
> one has an easy solution. Hive is making the move from classic MapReduce;
> two other execution engines have been made, Tez and HiveOnSpark.
> Because we are open source we allow people to "scratch an itch"; that is the
> Apache way. From time to time it means something that was added stops being
> viable because of lack of support.
>
> I agree with your final assessment, which is that Tez is the most viable
> engine for Hive. This is by no means a put-down of the HiveOnSpark work, and
> it does not mean it will never be the most viable. By the same token, if the
> versions fall out of sync and all that exists is complaints, the viability
> speaks for itself.
>
> Remember that keeping two fast moving things together is no easy chore. I
> used to run Hive + Cassandra. Seems easy; crap, two versions of common CLI,
> shade one version, everything works; crap, new Hive release has different
> versions of thrift, shade + patch; crap, now one of the other dependencies
> is incompatible, fork + shade + patch.
> At some point you have to say to
> yourself if I can not make critical mass of this solution such that I am
> the only one doing/patching it then I give up and find some other way to do
> it.
>

--001a1142e6ea71b74c054af37328
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
--001a1142e6ea71b74c054af37328--