hbase-dev mailing list archives

From Sean Busbey <bus...@apache.org>
Subject Re: [DISCUSS] status of and plans for our hbase-spark integration
Date Thu, 22 Jun 2017 15:50:06 GMT
On Thu, Jun 22, 2017 at 10:00 AM, Mike Drob <mdrob@apache.org> wrote:
> That's a lot of ground you're trying to cover, Sean, thanks for putting
> this together.
>> 1) Branch-1 releases
>> Is there anything else we ought to be tracking here?
> We currently have code in the o.a.spark namespace. I don't think there is a
> JIRA for it yet, but this seems like cross-project trouble waiting to
> happen. https://github.com/apache/hbase/tree/master/hbase-spark/src/main/scala/org/apache/spark

IIRC, this was something we had to do because some of the Spark
internals we build against are only visible from inside the
org.apache.spark namespace. So long as we're marking all of that stuff
IA.Private I think we're good, since we can fix it later if/when Spark
changes.
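
For anyone who hasn't looked at the module: the pattern is just
HBase's InterfaceAudience marker on anything that has to sit in
Spark's package space. Rough sketch (the object name is made up; the
package is one the module actually uses):

    // Hypothetical example: code that lives in org.apache.spark's
    // namespace so it can reach package-private Spark internals,
    // marked IA.Private so we promise nothing about its stability.
    package org.apache.spark.sql.datasources.hbase

    import org.apache.hadoop.hbase.classification.InterfaceAudience

    @InterfaceAudience.Private
    object HBaseSparkInternal {
      // implementation detail only; free to move or break if/when
      // Spark changes what's visible here
    }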

>> The way I see it, the options are a) ship both 1.6 and 2.y support, b)
>> ship just 2.y support, c) ship 1.6 in branch-1 and ship 2.y in
>> branch-2. Does anyone have preferences here?
> I think I prefer option B here as well. It sounds like Spark 2.2 will be
> out Very Soon, so we should almost certainly have a story for that. If
> there are no compatibility issues, then we can support >= 2.0 or 2.1,
> otherwise there's no reason to try to hit a moving target, and we can
> focus on supporting the newest release. Like you said earlier, there's been
> no official release of this module yet, so I have to imagine that the
> current consumers are knowingly bleeding edge and can handle an upgrade or
> recompile on their own.

Yeah, the bleeding-edge bit sounds fair. (someone please shout if it ain't)
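
For what it's worth, if we did carry both lines (options a or c), the
usual shape is a small compat seam that shared code calls through,
with one implementation compiled per Spark profile. Totally
hypothetical names, just to show the cost we'd be signing up for:

    // Shared code never calls version-specific Spark APIs directly;
    // it always goes through this trait.
    package org.apache.hadoop.hbase.spark.compat

    import org.apache.spark.sql.DataFrame

    trait SparkSqlCompat {
      // e.g. Spark 2.0 renamed registerTempTable to
      // createOrReplaceTempView
      def registerTempView(df: DataFrame, name: String): Unit
    }

    // Compiled only under a Spark 2.y build profile; a sibling impl
    // calling registerTempTable would be compiled under 1.6.
    class Spark2Compat extends SparkSqlCompat {
      override def registerTempView(df: DataFrame, name: String): Unit =
        df.createOrReplaceTempView(name)
    }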

>> 4) Packaging all this probably will be a pain no matter what we do
> Do we have to package this in our assembly at all? Currently, we include
> the hbase-spark module in the branch-2 and master assembly, but I'm not
> convinced this needs to be the case. Is it too much to ask users to build a
> jar with dependencies (which I think we already do) and include the
> appropriate spark/scala/hbase jars in it (pulled from maven)? I think this
> problem can be better solved through docs and client tooling rather than
> going through awkward gymnastics to package m*n versions (every Spark
> version times every Scala version) in our tarball
> _and_ making sure that we get all the classpaths right.

Even if we don't put it in the assembly, we still have to package m*n
versions to put up in Maven, right?

I'm not sure about the jar-with-deps bit. It's super nice to just
include one known-deployed jar in your spark classpath instead of
putting that size into each application jar you run. Of course, to
your point on
classpaths, right now they'd need to grab things besides that jar.
Maybe these should be shaded jars sooner rather than later?
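
For context, the experience I'm describing is the app pointing at the
one deployed connector jar and writing code like the below (API per
the current hbase-spark module; the table and column names are made
up):

    // The connector jar is supplied once on the Spark classpath
    // (e.g. via spark-submit --jars) rather than bundled per-app.
    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.Put
    import org.apache.hadoop.hbase.spark.HBaseContext
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.spark.{SparkConf, SparkContext}

    object BulkPutExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("bulk-put-example"))
        val hbaseContext = new HBaseContext(sc, HBaseConfiguration.create())

        val rdd = sc.parallelize(Seq(("row1", "v1"), ("row2", "v2")))
        // bulkPut turns each RDD element into a Put against the table
        hbaseContext.bulkPut[(String, String)](rdd, TableName.valueOf("t1"),
          { case (rowKey, value) =>
            val put = new Put(Bytes.toBytes(rowKey))
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(value))
            put
          })
        sc.stop()
      }
    }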
