incubator-opennlp-dev mailing list archives

From Jason Baldridge <jasonbaldri...@gmail.com>
Subject Re: using Scala for opennlp.ml
Date Sat, 14 Jan 2012 05:11:00 GMT
It's a perfectly fair question to ask. My first response is that I've been
programming primarily in Scala for more than a year and I not only enjoy it
but find myself far more productive with it. I am actually highly reluctant
to write Java code given that I have Scala as an alternative. I used to use
Python for quick scripting and Java for larger applications, but now I can
happily use Scala for both. But it goes beyond that -- it provides plenty
of opportunities to program in a very different way than either Python or
Java supports. Mainly, that means functional programming -- once you are
used to it, not being able to program functionally becomes painful, and, in
my experience, far less productive. It isn't a natural thing for many
people initially, but a very nice thing about Scala is that it actually
allows you to code in the imperative style you might be accustomed to while
gradually bringing functional aspects into your programs. And it goes well
beyond the nice little examples of how a given bit of Scala code is
shorter than a given bit of Java that does the same thing -- it leads to
different, and I would say, generally better design. (Though for what it's
worth, the significant reduction in boilerplate code over Java is truly
liberating.)

Other things to like about Scala are type inference, immutable data
structures, an amazing collections library, much better object orientation
than Java, and pattern matching (not regex, but switch statements on
steroids).  The fact that it compiles to Java byte code makes integration
with Java and use of Java APIs quite straightforward, which was a reason
for me to prefer it over other alternatives to Java. There's much more,
including many intangibles that come with experience. Here's an article
that conveys some of that, from the perspective of coming from Python:

http://www.artima.com/weblogs/viewpost.jsp?thread=328540

As for programmers, there is actually a very strong Scala contingent in the
NLP and machine learning world, including groups at UMass Amherst,
Stanford, and UT Austin, and probably elsewhere. Scala is also seeing
corporate adoption, though of course it has nothing like the numbers of
Java programmers. Most of my students are now using Scala, so having
opennlp.ml be in Scala will be convenient for work they could contribute to
the package.

There have been a lot of reasons in the past not to use Scala, especially
poor IDE support and problems with backward compatibility that made it
problematic for enterprise projects. That has changed a great deal in the
past year, especially with the efforts being made by Typesafe.

Happy to discuss more!

Jason

On Tue, Jan 10, 2012 at 8:53 PM, James Kosin <james.kosin@gmail.com> wrote:

> Everyone,
>
> +1
> I'm okay with going forward with this; but, I must ask Why?  I know
> Scala may be a good thing; but, if it generates Java byte code then
> isn't there an equivalent way to write the same things in Java?
>
> What sort of benefit will we get with the code migrated and written in
> Scala?  Even the author of that article said not many know the inner
> workings of the language....  He was one of the few.
>
> Maybe we could ask or have a poll taken to see how many know Scala in
> the community?
>
> Sorry for my concerns, or if they seem harsh or over-analytical.
>
> Just concerned,
> James
>
> On 1/10/2012 12:20 AM, Jason Baldridge wrote:
> > +1 to this in general, though I'm not into over-architecting things
> > initially. Would be great to get things humming and then start supporting
> > more pluggability.
> >
> > On Sat, Jan 7, 2012 at 7:33 AM, Jörn Kottmann <kottmann@gmail.com> wrote:
> >
> >> On 1/7/12 2:22 PM, Grant Ingersoll wrote:
> >>
> >>> Being able to take advantage of other classifiers seems like it would be
> >>> a really nice thing to be able to do.  I'd love to put OpenNLP over Mahout
> >>> or others.
> >>>
> >>> Besides, for testing purposes, if you could plug in the existing
> >>> capability versus your new rewrite (in Scala) then you could easily compare
> >>> the two.  I can't imagine the abstraction layer is more than a few
> >>> interfaces or abstract classes plus a bit of configuration/injection/fill
> >>> in the blank that allows one to specify the implementation.
> >>>
> >> Yes, we need pluggable classifiers and support for extensive
> >> modification/extension of our existing components. You are welcome to
> >> help us with that.
> >>
> >> One way of implementing this is to specify an (optional) factory class
> >> during training which is used to create a model (classifier). A second
> >> type of factory class could be specified to modify a component.
> >>
> >> These factory class names will be stored in our zip model package, and can
> >> then be used to instantiate the extensions which are necessary to run the
> >> component.
> >>
> >> The disadvantage of this approach is that it might not work well with OSGi.
> >> A big advantage is that OpenNLP itself will take care of configuring
> >> everything and the code needed to run an OpenNLP component is identical,
> >> even if the model uses "custom" extensions. These must only be on the
> >> class path.
> >>
> >> Jörn
> >>
> >
> >
>
>
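
For readers following the factory idea quoted above, here is a rough Java
sketch of how a factory class name stored alongside a model might be used
to instantiate a custom classifier at load time. Everything in it is
hypothetical and is not OpenNLP API: the Classifier and ClassifierFactory
interfaces, the "classifier.factory" key, and the use of a Properties
object as a stand-in for a manifest inside the zip model package are
assumptions made for illustration only.

```java
import java.util.Properties;

public class FactoryDemo {

    /** Hypothetical abstraction: anything that can label a feature vector. */
    interface Classifier {
        String classify(String[] features);
    }

    /** Hypothetical factory type whose class name is stored with the model. */
    interface ClassifierFactory {
        Classifier create();
    }

    /** A "custom" extension; it only needs to be on the class path. */
    public static class MajorityClassFactory implements ClassifierFactory {
        public MajorityClassFactory() {}

        @Override
        public Classifier create() {
            // Toy classifier that ignores its input and returns one label.
            return features -> "default";
        }
    }

    /** Assumed manifest key; the real model package format may differ. */
    static final String FACTORY_KEY = "classifier.factory";

    /** Reads the factory class name and instantiates it by reflection. */
    static Classifier load(Properties manifest) throws ReflectiveOperationException {
        String className = manifest.getProperty(FACTORY_KEY);
        if (className == null) {
            throw new IllegalArgumentException("no factory class recorded in manifest");
        }
        // Class.forName only succeeds if the extension is on the class path.
        ClassifierFactory factory = Class.forName(className)
                .asSubclass(ClassifierFactory.class)
                .getDeclaredConstructor()
                .newInstance();
        return factory.create();
    }

    public static void main(String[] args) throws Exception {
        // Training side: record which factory built the model.
        Properties manifest = new Properties();
        manifest.setProperty(FACTORY_KEY, MajorityClassFactory.class.getName());

        // Runtime side: the calling code is the same for any factory.
        Classifier classifier = load(manifest);
        System.out.println(classifier.classify(new String[] {"w=hello"}));
    }
}
```

The calling code in main never names the concrete factory, which is the
advantage described in the quoted message. The reliance on Class.forName is
also where the OSGi concern would likely arise, since class visibility there
depends on the bundle class loader.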


-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
