spark-dev mailing list archives

From Paul Brown <...@mult.ifario.us>
Subject Re: proposal: replace lift-json with spray-json
Date Wed, 12 Feb 2014 19:47:09 GMT
Yup; you're right:

https://github.com/json4s/json4s/blob/3.2.6_2.10/project/Dependencies.scala

The older deps are only in use in examples/benchmarking.  All good.


--
prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/


On Wed, Feb 12, 2014 at 11:35 AM, Aaron Davidson <ilikerps@gmail.com> wrote:

> The version of json4s we're using (3.2.6 in the 2.10 branch) does seem to
> depend on Jackson 2.3.0 and Scala 2.10.0:
> http://mvnrepository.com/artifact/org.json4s/json4s-jackson_2.10/3.2.6
>
>
> On Wed, Feb 12, 2014 at 11:29 AM, Paul Brown <prb@mult.ifario.us> wrote:
>
> > Hi, Aaron --
> >
> > I can't speak to issues relevant to Spark, but it looks like json4s is
> > currently using the Jackson Scala module 2.1.3 and Scala 2.9.2.  There
> > have been quite a few significant changes to the Scala module and
> > underpinnings between the 2.1.x and 2.3.x series, but I can't speak to
> > how that interacts with json4s.  Many of those changes are convenience
> > for direct usage of the Jackson Scala module in binding case classes
> > transparently, but you wouldn't need or benefit from those through the
> > json4s API.  (FWIW, we use Jackson Scala 2.3.2 in our Spark jobs to bind
> > lines of JSON from text files to case classes.)
> >
> > I'll reach out to json4s and see if I can get them to update to the 2.3.x
> > Jackson series and Scala 2.10, but I think it makes sense for Spark to
> > just use the released version and then update when a json4s release is
> > available.
> >
> > Best.
> > -- Paul
> >
> > --
> > prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
> >
> >
> > On Wed, Feb 12, 2014 at 10:38 AM, Aaron Davidson <ilikerps@gmail.com>
> > wrote:
> >
> > > Will, thanks for the clarifications. I think Spark's main use-case is
> > > "warm, small inputs" right now, but the change seems reasonable to me
> > > nevertheless.
> > >
> > > Paul, do you know if there are any issues relevant to Spark that we
> > > need from 2.3.2? We would also have to wait for json4s to release a new
> > > version that depends on 2.3.2, or else pull it in ourselves.
> > >
> > >
> > > On Wed, Feb 12, 2014 at 9:47 AM, Paul Brown <prb@mult.ifario.us> wrote:
> > >
> > > > And, with my FasterXML hat on, if you ask, you'll find the Jackson
> > > > folks will turn around issues quickly.  FWIW, there is a full-suite
> > > > Jackson 2.3.2 release rolling right up if you wait a couple of days
> > > > to pull that in.
> > > >
> > > > -- Paul
> > > >
> > > > --
> > > > prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
> > > >
> > > >
> > > > On Wed, Feb 12, 2014 at 8:12 AM, Will Benton <willb@redhat.com> wrote:
> > > >
> > > > > ----- Original Message -----
> > > > >
> > > > > > > I am not sure I fully understand this reasoning. I imagine that
> > > > > > > lift-json is only one of hundreds of packages that would have to
> > > > > > > be built if you wanted to build all of Spark's transitive
> > > > > > > dependencies from source.
> > > > >
> > > > > > This is absolutely true.  However, many of Spark's dependencies are
> > > > > > already available in operating system distributions.  In fact, in
> > > > > > the case I am most familiar with (packaging Spark for Fedora), Lift
> > > > > > is the biggest one left that isn't already available or under
> > > > > > review.
> > > > >
> > > > > > > Additionally, to make sure I understand the impact -- this is
> > > > > > > only intended to simplify the process of packaging Spark on a new
> > > > > > > OS distribution that disallows pulling in binaries?
> > > > >
> > > > > > Yes, this was my main motivation.  Since the process of building
> > > > > > Lift and its transitive dependencies is disproportionately complex
> > > > > > compared to how much Spark uses lift-json, I thought it would be
> > > > > > nice to replace it with something that could be built as just a
> > > > > > JSON library.  I would argue that -- all else being equal -- it
> > > > > > generally makes sense to make software development choices that
> > > > > > facilitate packaging for distributions like Fedora and Debian.
> > > > >
> > > > > > There are other actual and potential advantages, though; here are a
> > > > > > few:
> > > > >
> > > > > > 1.  Based on some simple timing runs I did, json4s-jackson is
> > > > > > faster all around when running warm (i.e. on subsequent timing runs
> > > > > > in the same VM or timing runs with enough iterations to last for
> > > > > > more than a few seconds), slightly slower when running cold on very
> > > > > > small parsing tasks, and significantly (~10x) faster on large
> > > > > > parsing tasks whether cold or warm.  The knee in the cold lift-json
> > > > > > performance curve is somewhere between 2kb and 50kb of JSON source
> > > > > > text.  json4s-jackson is nominally faster cold with a 12kb file,
> > > > > > 40% faster with a 50kb file, 2.6x faster with a 500kb file and 10x
> > > > > > faster with files ranging from 4-20mb.  Given how Spark uses JSON
> > > > > > at the moment, the improved large-file parsing performance seems
> > > > > > unlikely to be a huge practical advantage for json4s-jackson, but
> > > > > > it's worth noting.
> > > > > > 2.  The release schedule of json4s isn't coupled to the release
> > > > > > schedule of a larger project.
> > > > > > 3.  json4s is intended to provide a uniform interface to Scala JSON
> > > > > > libraries, and it provides multiple backends, which offers
> > > > > > potential flexibility in the future.  (To be fair, this interface
> > > > > > is heavily based on the one provided by Lift, so it would be only
> > > > > > slightly more work to go from lift-json to json4s, as my patch
> > > > > > does, than it would be to switch between json4s backends.)
> > > > >
> > > > > > Again, this change is primarily motivated by a desire to make life
> > > > > > easier for downstream packagers, but there is no obvious downside
> > > > > > (beyond the downsides inherent in changing library dependencies)
> > > > > > and several minor advantages.
> > > > >
> > > > >
> > > > > best,
> > > > > wb
> > > > >
> > > >
> > >
> >
>
