opennlp-dev mailing list archives

From Jason Baldridge <jasonbaldri...@gmail.com>
Subject Re: question for y'all regarding Scala in OpenNLP and transition to opennlp.ml
Date Thu, 14 Apr 2011 03:06:07 GMT
I also wrote a lot of that code back in 1999/2000, and have, ahem, learned a
lot since then. :)

Jason

On Tue, Apr 12, 2011 at 11:41 PM, James Kosin <james.kosin@gmail.com> wrote:

> Jason,
>
> Don't worry, hindsight is always 20/20.  But, it takes very good
> planning and a lot of time to do it right the first time.
>
> It always gets better the more you work it.
>
> James
>
> On 4/11/2011 11:13 PM, Jason Baldridge wrote:
> > Thanks everyone for your thoughts. I think the first step is to refactor the
> > package sticking with Java and then we'll see about moving to a Scala/Java
> > mix after that (but only for the opennlp machine learning package, currently
> > opennlp-maxent).
> >
> > I was actually sort of appalled looking through the code yesterday and
> > seeing so many global variables used all over the place, making it hard to
> > know exactly what every method had access to. I think this was sort of an
> > artifact of how I used Trove functions a loooong time ago to enable quick
> > iteration over the data structures (which required some objects to be
> > global). That is obviously gone now, but the global variables didn't go
> > away... hope I'll find time to improve things over the next 5-6 months.
> >
> > Jason
> >
> > On Mon, Apr 11, 2011 at 7:27 AM, Tommaso Teofili
> > <tommaso.teofili@gmail.com> wrote:
> >
> >> Hi Jason,
> >> I personally have some Scala experience while working with Clerezza [1]
> >> which uses both Java and Scala but what I think is that, while Scala is
> >> perfectly ok with existing Java standards and allowing
> functional/dynamic
> >> programming, it raises the barrier for new users/devs a little bit.
> >> So I am not so sure that a Scala implementation should totally replace
> an
> >> existing one, maybe a graceful introduction would be more welcome.
> >> My 2 cents,
> >> Tommaso
> >>
> >>
> >> [1] : http://incubator.apache.org/clerezza
> >>
> >> 2011/4/10 Jason Baldridge <jasonbaldridge@gmail.com>
> >>
> >>> It's been a while since I posted this request for input... Does anyone
> >>> have
> >>> any thoughts on it? Is anyone else interested in Scala being part of
> >>> OpenNLP?
> >>>
> >>> Jason
> >>>
> >>> On Tue, Mar 22, 2011 at 10:16 AM, Jason Baldridge
> >>> <jbaldrid@mail.utexas.edu> wrote:
> >>>
> >>>> Hi everyone,
> >>>>
> >>>> Jorn and I have had a little discussion about a topic I brought up
> with
> >>> him
> >>>> that I'd like to get everyone's thoughts on. I'm including our
> >>> conversation
> >>>> below, but the gist of it is this:
> >>>>
> >>>>  - I've been switching to development in Scala. At this point, I
> >>> personally
> >>>> see little point in coding in Java given that Scala is available (and
> >>> very
> >>>> very nice) and it plays very well with existing Java -- I'm very happy
> >>> with
> >>>> this for several projects I'm working on, including TextGrounder
> >>>> <http://code.google.com/p/textgrounder/> and
> >>>> Junto <http://code.google.com/p/junto/>. So, I'd like to see Scala making
> >>>> its way into OpenNLP.
> >>>>  - We need to reorganize the maxent code into the new package
> >>> opennlp.ml
> >>>>  - I'd like to create the new package, retaining the Java code as is,
> >>> make
> >>>> a first release, and then allow Scala code to mix in with the Java
> from
> >>> that
> >>>> point on
> >>>>  - A number of issues come up with this, including using another build
> >>> tool
> >>>> like SBT instead of Maven and ensuring we are Apache compliant and so
> >>> on.
> >>>> So, this is really just a feeler to see what you all think and see if
> >>> you
> >>>> have any enthusiasm, reservations or suggestions. Thanks!
> >>>>
> >>>> Jason
> >>>>
> >>>>
> >>>> Forwarded conversation
> >>>> Subject: opennlp.ml + Scala?
> >>>> ------------------------
> >>>>
> >>>> From: *Jason Baldridge* <jbaldrid@mail.utexas.edu>
> >>>> Date: Mon, Mar 21, 2011 at 1:28 PM
> >>>> To: Jörn Kottmann <kottmann@gmail.com>
> >>>>
> >>>>
> >>>> Hi Jorn,
> >>>>
> >>>> I've changed over to doing nearly all my coding in Scala, generally
> >>>> transitioning Java codebases to Scala by writing everything new in
> Scala
> >>> and
> >>>> using the existing Java classes as they are. I would like to do this
> as
> >>> part
> >>>> of the new opennlp.ml, as I'm not inclined to write any new Java code
> >>>> unless absolutely necessary, and I would very much like to create that
> >>> new
> >>>> and improved package. What do you think of this?
> >>>>
> >>>> Jason
> >>>>
> >>>> --
> >>>> Jason Baldridge
> >>>> Assistant Professor, Department of Linguistics
> >>>> The University of Texas at Austin
> >>>> http://www.jasonbaldridge.com
> >>>>
> >>>> ----------
> >>>> From: *Jörn Kottmann* <kottmann@gmail.com>
> >>>> Date: Mon, Mar 21, 2011 at 2:24 PM
> >>>> To: Jason Baldridge <jbaldrid@mail.utexas.edu>
> >>>>
> >>>>
> >>>>  Hmm, yeah, if we would rewrite it I think it is something we could
> >>>> consider, but in our case we just need
> >>>> to do some reshaping of the existing code and a little refactoring
> here
> >>> and
> >>>> there. That is one reason
> >>>> I believe we should be conservative and not use it in this case.
> >>>>
> >>>> Another issue I see is that it would send a message to the Mahout people
> >>>> that we do not want to collaborate,
> >>>> which in fact I believe is something we should do to get MapReduce
> >>>> training support one day.
> >>>> The people in the team might not be familiar with Scala, which could
> >>>> further limit the manpower
> >>>> which is available for the refactoring. Just my 2 cents.
> >>>>
> >>>> I believe we should also do the maxent refactoring slowly and first do
> >>>> everything inside the current
> >>>> structures, and then when everything is in place do the last changes which
> >>>> break backward compatibility.
> >>>>
> >>>> Anyway, we should start a discussion about the future of OpenNLP: which
> >>>> features do we want
> >>>> to implement for the next few versions? Which new components would be nice
> >>>> to have?
> >>>> I believe there are quite a few people who are willing to pick up tasks but
> >>>> are simply not
> >>>> aware of the possibility.
> >>>>
> >>>> Jörn
> >>>>
> >>>> ----------
> >>>> From: *Jason Baldridge* <jbaldrid@mail.utexas.edu>
> >>>> Date: Mon, Mar 21, 2011 at 3:29 PM
> >>>> To: Jörn Kottmann <kottmann@gmail.com>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> Hmm... what if we did the first refactoring into opennlp.ml with pure
> >>> Java
> >>>> but the new package structure, then make a first release and then
> start
> >>>> bringing in Scala?
> >>>>
> >>>>
> >>>> Good points. However, I'm finding that Scala plays *very* nicely with
> >>> Java
> >>>> (including allowing Java to use Scala classes), so that could be
> mostly
> >>>> transparent to users of the package, maintaining the API pretty much
> as
> >>> it
> >>>> is. So, I *think* we could continue to play nicely with Mahout folks.
> >>>>
> >>>> Also, after coding for a while in Scala, I can't help but feel that
> Java
> >>>> the language is dead, while the JVM lives gloriously on. :) I think
> >>> there is
> >>>> a lot of momentum to Scala in general, and my feeling is that it is
> very
> >>>> friendly for Java programmers. (Though I had experience in functional
> >>>> programming before, so a lot of concepts came easily to me that could
> be
> >>>> more unusual for others.)
> >>>>
> >>>>
> >>>> What do you mean by "current structures"? Do you mean to keep the
> >>> classes
> >>>> as they are now, but just switch the package organization first?
> >>>>
> >>>>
> >>>> Yes, perhaps we should do that once the release is all done? (Thanks
> for
> >>>> all your hard work on that, btw!)
> >>>>
> >>>> Also, perhaps we should bring up the Scala question on the mailing
> list?
> >>> I
> >>>> wanted to ask you first to see if you had strong objections first, but
> >>> since
> >>>> you don't it might be good to sound out the community.
> >>>>
> >>>> Jason
> >>>>
> >>>>
> >>>> ----------
> >>>> From: *Jörn Kottmann* <kottmann@gmail.com>
> >>>> Date: Mon, Mar 21, 2011 at 3:38 PM
> >>>> To: Jason Baldridge <jbaldrid@mail.utexas.edu>
> >>>>
> >>>>
> >>>> I actually think just doing it for maxent/ml doesn't really make sense; if
> >>>> we want to switch the programming
> >>>> language, it's for the entire code base. Then we are talking about the migration of
> >>>> like 400 classes from Java
> >>>> to Scala; does that really make sense? Just doing a little Scala doesn't
> >>>> sound reasonable to me.
> >>>>
> >>>> Sure move it to the mailing list.
> >>>>
> >>>> Jörn
> >>>>
> >>>> ----------
> >>>> From: *Jason Baldridge* <jbaldrid@mail.utexas.edu>
> >>>> Date: Mon, Mar 21, 2011 at 5:44 PM
> >>>> To: Jörn Kottmann <kottmann@gmail.com>
> >>>>
> >>>>
> >>>> But, the great thing about Scala is that you can mix Scala and Java
> and
> >>> not
> >>>> have to do one or the other -- so I don't think we'd need to do a full
> >>>> migration.  Anyway, I'll bring it up on the list!
> >>>>
> >>>> ----------
> >>>> From: *Jörn Kottmann* <kottmann@gmail.com>
> >>>> Date: Mon, Mar 21, 2011 at 5:54 PM
> >>>> To: Jason Baldridge <jbaldrid@mail.utexas.edu>
> >>>>
> >>>>
> >>>> Yeah, but then most of the code will still be pure Java mixed
> >>>> with a little Scala, but you have
> >>>> to deal with the extra complexity of having a little Scala, e.g. more
> >>>> complex build tooling, you need
> >>>> extra IDE support, more complicated compatibility issues, etc.
> >>>>
> >>>> Jörn
> >>>>
> >>>> ----------
> >>>> From: *Jason Baldridge* <jbaldrid@mail.utexas.edu>
> >>>> Date: Mon, Mar 21, 2011 at 7:39 PM
> >>>> To: Jörn Kottmann <kottmann@gmail.com>
> >>>>
> >>>>
> >>>> The build is *really* easy with SBT (which can incorporate maven and
> ivy
> >>>> dependency declarations). The idea would be to transition to Scala so
> >>> that
> >>>> it would eventually be mostly scala, if not all scala. A standard jar
> is
> >>>> still distributed.
> >>>>
> >>>> ----------
> >>>> From: *Jörn Kottmann* <kottmann@gmail.com>
> >>>> Date: Tue, Mar 22, 2011 at 4:33 AM
> >>>> To: Jason Baldridge <jbaldrid@mail.utexas.edu>
> >>>>
> >>>>
> >>>> We are using Maven right now, and it does a lot more than just putting
> >>>> together a jar file,
> >>>> e.g.:
> >>>> - Making a release, with code signing, tagging in our SCM, producing RAT
> >>>> reports, etc.
> >>>> - Deploying artifacts to the Apache repository
> >>>> - Building our documentation
> >>>> - Testing
> >>>> - Optionally it can run code quality tools like FindBugs or a test
> >>>> coverage tool
> >>>>
> >>>> Jörn
> >>>>
> >>>> ----------
> >>>> From: *Jason Baldridge* <jbaldrid@mail.utexas.edu>
> >>>> Date: Tue, Mar 22, 2011 at 9:11 AM
> >>>> To: Jörn Kottmann <kottmann@gmail.com>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> These might need some looking into, but are probably doable.
> >>>>
> >>>>
> >>>> These are builtin targets for SBT.
> >>>>
> >>>> -j
> >>>>
> >>>> ----------
> >>>> From: *Jörn Kottmann* <kottmann@gmail.com>
> >>>> Date: Tue, Mar 22, 2011 at 9:20 AM
> >>>> To: Jason Baldridge <jbaldrid@mail.utexas.edu>
> >>>>
> >>>>
> >>>>  Our entire build system was just rewritten to meet Apache rules and
> >>>> standards, if we
> >>>> do that again now it will set the project back for like a month or so.
> >>>>
> >>>> Jörn
> >>>>
> >>>> ----------
> >>>> From: *Jason Baldridge* <jbaldrid@mail.utexas.edu>
> >>>> Date: Tue, Mar 22, 2011 at 9:33 AM
> >>>> To: Jörn Kottmann <kottmann@gmail.com>
> >>>>
> >>>>
> >>>> Fair enough. I will still bring it up as it now actually pains me to
> >>> code
> >>>> in Java. ;)
> >>>>
> >>>> Oh, here is how to deploy artifacts:
> >>>>
> >>>> http://henkelmann.eu/2010/11/14/sbt_hudson_with_test_integration
> >>>>
> >>>> I think the others would be straightforward. Possibly one of the
> bigger
> >>>> sticking points would be IDE integration -- I use Emacs and it all
> works
> >>>> very well for me, but I don't know how it is for Eclipse and NetBeans
> >>> folks.
> >>>> ----------
> >>>> From: *Jörn Kottmann* <kottmann@gmail.com>
> >>>> Date: Tue, Mar 22, 2011 at 9:40 AM
> >>>> To: Jason Baldridge <jbaldrid@mail.utexas.edu>
> >>>>
> >>>>
> >>>> I didn't say it's not possible to rewrite our build with SBT, but I strongly
> >>>> believe that is an effort which
> >>>> will take quite some time, e.g. a month just to get a build which is as good
> >>>> as our Maven build we just
> >>>> finished.
> >>>> Everyone would have to install the Scala plugins into their IDEs to get
> >>>> proper support, which is
> >>>> of course also possible.
> >>>>
> >>>> Yeah bring it up on the mailing list.
> >>>>
> >>>> Jörn
> >>>>
> >>>> ----------
> >>>> From: *Jason Baldridge* <jbaldrid@mail.utexas.edu>
> >>>> Date: Tue, Mar 22, 2011 at 9:46 AM
> >>>> To: Jörn Kottmann <kottmann@gmail.com>
> >>>>
> >>>>
> >>>> Sounds good. And I find that it is often straightforward to take Maven
> >>>> specifications and either use them directly from SBT or translate them
> >>> into
> >>>> the SBT definitions.  Perhaps we could start this with opennlp.ml and
> >>> then
> >>>> see how it goes before doing it in the main OpenNLP code.
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Jason Baldridge
> >>>> Assistant Professor, Department of Linguistics
> >>>> The University of Texas at Austin
> >>>> http://www.jasonbaldridge.com
> >>>>
> >>>
> >>>
> >>> --
> >>> Jason Baldridge
> >>> Assistant Professor, Department of Linguistics
> >>> The University of Texas at Austin
> >>> http://www.jasonbaldridge.com
> >>>
> >>
> >
>
>


-- 
Jason Baldridge
Assistant Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
