incubator-gora-dev mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]
Date Thu, 15 Sep 2011 18:55:25 GMT

> Hi Guys,
> 
> I thought I'd chime in on this thread. My comments below:
> > I understand and share your frustration, however you need to bear in mind
> > that things are done only if people volunteer and have time - usually
> > taken from their holiday, weekends, evenings. Chris (who is the de facto
> > release master for Nutch and Gora) has not had the time and nobody else
> > has volunteered to do it.
> 
> Yep I haven't had the time to push a Gora 0.1.1-incubating release that
> will address the Maven issues. However it is on my roadmap for open source
> stuff to get done in the next month, so that's a good thing. But yes, that
> portion of my open source work is all volunteer time, so sometimes other
> things take priority.
> 
> >> As it happens, yesterday was the 1 year anniversary of the last
> >> successful Hudson/Jenkins build...  If that actually worked, we could
> >> point people towards it as a useful recipe for how to get a build
> >> working off trunk.  I haven't been following Nutch too closely, but it
> >> always strikes me as really odd, that there's a nightly build and it
> >> doesn't bother anybody that it fails all the time (and that there
> >> isn't a nightly build for the stable branches).
> > 
> > The real issue behind all this is what we should do with Nutch 2.0. What
> > follows is only my opinion and I would love to hear what others have to
> > say on this subject.
> > 
> > Since we (actually mostly Dogacan) wrote 2.0 and delegated the storage to
> > Gora, the latter hasn't really taken off since incubation. There have
> > been some modest contributions to it but it does not seem to be used
> > much and there is virtually nothing happening on it in terms of
> > development. More worryingly, the people who initially contributed to it
> > are not very active on the project (such is life, new jobs, different
> > projects, etc...) anymore. As for Nutch 2.0, it hasn't made any
> > progress in the last 12 months : we still have the same bugs, the tests
> > do not work, the build has to be done manually etc...
> 
> Yep.
> 
> > At the same time, there has been a new lease of life into Nutch as a
> > whole : there is definitely more activity on the mailing lists, new
> > users, new active committers etc... and quite a few bugfixes and
> > improvements - most of them backported from what had been done in the
> > trunk and people seem fairly happy with what we can do with 1.4
> 
> Totally agreed. I'm actually not super surprised -- ever since 1.1, I kind
> of felt that maintaining a stable 1.X branch of Nutch (in parallel to the
> 2.0 efforts) was really going to pay off since there was renewed interest
> from users in leveraging (and furthermore accepting) the nuances of 1.X.
> 
> > So the question is : what shall we do with 2.0? Here are a few
> > possibilities
> > 
> > 
> > a) put some effort into it, fix the bugs and make so that it can be used
> > instead of 1.x
> > b) shelve it and leave it for enthusiasts to play with + make 1.x the
> > trunk again
> > c) do nothing : keep 2.0 and 1.x in parallel (but having to maintain two
> > branches is quite a pain)
> > d) abandon the idea of a neutral storage layer with Gora and hardwire it
> > to e.g. HBase
> > 
> > Option (a) has not happened in the last 12 months and I am not very
> > hopeful about it.
> > 
> > What do you guys think?
> 
> I'd suggest an option e). Evolve and keep releasing 1.X over the next 6
> months, and keep 2.0 in the trunk. After 6 months, see how close 1.X is to
> actually being 2.0 (e.g., did we release a 1.4, a 1.5, a 1.6?) If we get
> to ~1.6 over the next 6 months and there is still no active development on
> 2.0, I'd propose we do this at that point in time:
> 
> 1. branch the current trunk as
>    https://svn.apache.org/repos/asf/nutch/branches/nutchgora
> 2. grab latest stable branch (e.g.,
>    https://svn.apache.org/repos/asf/nutch/branches/branch-1.6) and *replace*
>    the Nutch trunk with it, and bump the version # to 1.7-dev
> 3. active development on stable becomes active development in trunk and
>    nutchgora still exists in case anyone ever resurrects it.
> 
> That way, we give another 6 months to see how it shakes out and potentially
> allow for 1 or 2 or 3 more stable releases before switching those over to
> trunk.
> 
> Thoughts?

Yes. I don't believe we should wait until January before discussing this topic
again. I, for example, cannot spend considerable extra time on the issues I
put in 1.4, also due to the fact that it's not entirely stable.

There are many things I could write about this topic right now, but I don't feel
it's necessary. The choice is difficult and perhaps painful, but when the
voting round is opened by our project lead, I will vote for promoting 1.x back
to trunk.

My apologies for my impatience and pessimism.

> 
> BTW, I have a couple contributions from my CS572: Search Engines class from
> a year ago that I'd love to port into the Nutch stable branch including
> Hubs/Authorities ranking and some other goodies. I'll try and work on
> those over the next few months, I'm just letting everyone know now so I
> don't forget again :-)
> 
> Cheers,
> Chris
> 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
