hbase-dev mailing list archives

From Jerry He <jerry...@gmail.com>
Subject Re: [DISCUSS] status of and plans for our hbase-spark integration
Date Sun, 25 Jun 2017 21:50:15 GMT
>> We currently have code in the o.a.spark namespace. I don't think there is a
>> JIRA for it yet, but this seems like cross-project trouble waiting to
>> happen. https://github.com/apache/hbase/tree/master/
>> hbase-spark/src/main/scala/org/apache/spark
> IIRC, this was something we had to do because of how Spark architected
> their stuff. So long as we're marking all of that stuff IA.Private I
> think we're good, since we can fix it later if/when Spark changes.

Yes.  IIRC the trick is needed because we use a package-private
construct from Spark SQL to support Spark 1.6.
The trick is no longer needed if we only support Spark 2.x.
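For anyone unfamiliar with why the code has to live under o.a.spark: a member marked package-private in Scala is only reachable from code declared inside that same package, so the module declares its bridge classes in Spark's package. A minimal self-contained sketch of the technique (the package and member names here are made up, not the actual Spark internals):

```scala
package upstream {
  // Stands in for a Spark-internal API restricted with private[upstream].
  object Internals {
    private[upstream] def secret(): String = "reachable only from package upstream"
  }
}

package upstream {
  // By declaring our own object inside the same package, we can call the
  // package-private member -- the same trick hbase-spark plays by putting
  // code under org.apache.spark for Spark 1.6.
  object OurBridge {
    def expose(): String = Internals.secret()
  }
}

object Main extends App {
  // Works only because OurBridge sits inside package upstream;
  // calling Internals.secret() directly from here would not compile.
  println(upstream.OurBridge.expose())
}
```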

> >> The way I see it, the options are a) ship both 1.6 and 2.y support, b)
> >> ship just 2.y support, c) ship 1.6 in branch-1 and ship 2.y in
> >> branch-2. Does anyone have preferences here?
> >
> > I think I prefer option B here as well. It sounds like Spark 2.2 will be
> > out Very Soon, so we should almost certainly have a story for that. If
> > there are no compatibility issues, then we can support >= 2.0 or 2.1,
> > otherwise there's no reason to try and hit the moving target and we can
> > focus on supporting the newest release. Like you said earlier, there's been
> > no official release of this module yet, so I have to imagine that the
> > current consumers are knowingly bleeding edge and can handle an upgrade or
> > recompile on their own.
> >
> Yeah, the bleeding-edge bit sounds fair. (someone please shout if it ain't)

I am for option b) as well!
Even better, I am for shipping support for Scala 2.11 only.  Start clean?

>>> 4) Packaging all this probably will be a pain no matter what we do
>> Do we have to package this in our assembly at all? Currently, we include
>> the hbase-spark module in the branch-2 and master assembly, but I'm not
>> convinced this needs to be the case. Is it too much to ask users to build a
>> jar with dependencies (which I think we already do) and include the
>> appropriate spark/scala/hbase jars in it (pulled from maven)? I think this
>> problem can be better solved through docs and client tooling rather than
>> going through awkward gymnastics to package m*n versions in our tarball
>> _and_ making sure that we get all the classpaths right.
> Even if we don't put it in the assembly, we still have to package m*n
> versions to put up in Maven, right?
> I'm not sure on the jar-with-deps bit. It's super nice to just include
> one known-deployed jar in your spark classpath instead of putting that
> size into each application jar your run. Of course, to your point on
> classpaths, right now they'd need to grab things besides that jar.
> Maybe these should be shaded jars sooner rather than later?

There is a Filter class in the hbase-spark module that needs to be on
the server classpath.
If we don't put the whole jar there, we will have to do some trick to
separate it out.
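If we go the route of not shipping the module in the assembly, one way to get that Filter onto the servers is via HBASE_CLASSPATH in hbase-env.sh rather than the tarball. A sketch (the jar path here is hypothetical):

```shell
# In conf/hbase-env.sh on each region server: make the hbase-spark
# classes (including the server-side Filter) visible to HBase without
# packaging them in the assembly. The jar location is an assumption.
export HBASE_CLASSPATH="$HBASE_CLASSPATH:/opt/hbase-spark/hbase-spark.jar"
```

This still leaves the separation question open, though: pulling just the Filter into its own small jar would avoid dragging the Spark-facing classes onto the servers.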

Great write-up from Sean.


