Mailing-List: contact dev-help@hbase.apache.org; run by ezmlm
Reply-To: dev@hbase.apache.org
MIME-Version: 1.0
From: Jerry He
Date: Sun, 25 Jun 2017 14:50:15 -0700
Subject: Re: [DISCUSS] status of and plans for our hbase-spark integration
To: dev
Content-Type: text/plain;
 charset="UTF-8"
archived-at: Sun, 25 Jun 2017 21:50:45 -0000

>> We currently have code in the o.a.spark namespace. I don't think there is a
>> JIRA for it yet, but this seems like cross-project trouble waiting to
>> happen.
>> https://github.com/apache/hbase/tree/master/hbase-spark/src/main/scala/org/apache/spark
>>
>
> IIRC, this was something we had to do because of how Spark architected
> their stuff. So long as we're marking all of that stuff IA.Private I
> think we're good, since we can fix it later if/when Spark changes.
>

Yes. IIRC, the trick is needed because we use a Spark SQL construct that
is package-private in Spark 1.6.
The trick is no longer needed if we only support Spark 2.x.

> >> The way I see it, the options are a) ship both 1.6 and 2.y support, b)
> >> ship just 2.y support, c) ship 1.6 in branch-1 and ship 2.y in
> >> branch-2. Does anyone have preferences here?
> >
> > I think I prefer option B here as well. It sounds like Spark 2.2 will be
> > out Very Soon, so we should almost certainly have a story for that. If
> > there are no compatibility issues, then we can support >= 2.0 or 2.1,
> > otherwise there's no reason to try and hit the moving target and we can
> > focus on supporting the newest release. Like you said earlier, there's been
> > no official release of this module yet, so I have to imagine that the
> > current consumers are knowingly bleeding edge and can handle an upgrade or
> > recompile on their own.
> >
>
> Yeah, the bleeding-edge bit sounds fair. (someone please shout if it ain't)

I am for option b) as well!
Even better, I would like us to ship support only for Scala 2.11. Start clean?

>>> 4) Packaging all this probably will be a pain no matter what we do
>>
>> Do we have to package this in our assembly at all? Currently, we include
>> the hbase-spark module in the branch-2 and master assembly, but I'm not
>> convinced this needs to be the case. Is it too much to ask users to build a
>> jar with dependencies (which I think we already do) and include the
>> appropriate spark/scala/hbase jars in it (pulled from maven)? I think this
>> problem can be better solved through docs and client tooling rather than
>> going through awkward gymnastics to package m*n versions in our tarball
>> _and_ making sure that we get all the classpaths right.
>>
>
> Even if we don't put it in the assembly, we still have to package m*n
> versions to put up in Maven, right?
>
> I'm not sure on the jar-with-deps bit. It's super nice to just include
> one known-deployed jar in your spark classpath instead of putting that
> size into each application jar you run. Of course, to your point on
> classpaths, right now they'd need to grab things besides that jar.
> Maybe these should be shaded jars sooner rather than later?

There is a Filter class from the hbase-spark module that needs to be on
the server classpath.
If we don't have the whole jar there, we have to do some trick to
separate it out.

Great write-up from Sean.

Thanks,

Jerry