lucene-dev mailing list archives

From "Jason Rutherglen" <>
Subject Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes
Date Thu, 04 Dec 2008 19:21:21 GMT
To put things in perspective, I believe Microsoft (who could potentially
place a lot of resources towards Lucene) now uses Lucene through Powerset,
and I don't think those folks are contributing back.  I know of several
other companies who do the same, and many potential contributions that are
not submitted because people and their companies do not see the benefit of
going through the hoops required to get patches committed.  A relatively
simple patch such as 1473 Serialization represents this well.

For example, consider a company developing custom search algorithms.  Lucene
supports TF/IDF but not much else, so custom search algorithms require
rewriting lots of Lucene code.  Companies who write new search algorithms do
not necessarily want to rewrite Lucene as well to make it pluggable for new
scoring, as that is out of scope; they will simply branch the code.  It does
not help that the core APIs underneath IndexReader are protected and package
protected, which assumes a user who is not advanced.  It is repeated on the
mailing lists that new features will threaten the existing user base, a claim
that is based on opinion rather than fact.  More advanced users are currently
hindered by the conservatism of the project and so naturally have stopped
trying to submit changes that alter the core non-public code.

The rancor comes from users who would benefit from a faster pace and the
ability to be more creative inside the core Lucene system.  As the internals
change frequently and unannounced, the process of developing core patches is
difficult and frustrating.

Now that Lucene is stable and flexible indexing is being implemented, it
would benefit the community to focus on the future.  Who exactly is
responsible for this?  Which of the committers are building for the future?
Which are doing bug fixes?  What is the process of developing more advanced
features in open source?  Right now it seems to be one person, Michael
McCandless, developing all of the new core code.  This is great forward
progress; however, it's unclear how others can get involved and not get
stampeded by the constant changes that all happen via one brilliant person.

I have asked people such as Michael Busch to collaborate on the
column stride fields and received no response.

To me, a good example of volunteers is people who prepare food and donate
their time at soup kitchens with no pay, and no hope for pay related to
feeding the hungry.


On Wed, Dec 3, 2008 at 2:52 PM, Grant Ingersoll <> wrote:

> On Dec 3, 2008, at 2:27 PM, Jason Rutherglen (JIRA) wrote:
>> Hoss wrote: "sort of mythical "Lucene powerhouse"
>> Lucene seems to run itself quite differently than other open source Java
>> projects.  Perhaps it would be good to spell out the reasons for the
>> reluctance to move ahead with features that developers work on, that work,
>> but do not go in.  The developer contributions seem to be quite low right
>> now, especially compared to neighbor projects such as Hadoop.  Is this
>> because fewer people are using Lucene?  Or is it due to the reluctance to
>> work with the developer community?  Unfortunately, the perception among
>> some people who work on search-related projects is that it is the latter.
> Or, could it be that Hadoop is relatively new and in vogue at the moment,
> very malleable and buggy(?) and has a HUGE corporate sponsor who dedicates
> lots of resources to it on a full time basis, whilst Lucene has been around
> in the ASF for 7+ years (and 12+ years total) and has a really large install
> base and thus must move more deliberately and basically has 1 person who
> gets to work on it full time while the rest of us pretty much volunteer?
> That's not an excuse, it's just the way it is.  I, personally, would love to
> work on Lucene all day every day as I have a lot of things I'd love to
> engage the community on, but the fact is I'm not paid to do that, so I give
> what I can when I can.  I know most of the other committers are that way
> too.
> Thus, I don't think any one of us has a reluctance to move ahead with
> features or bug fixes.   Looking at CHANGES.txt, I see a lot of
> contributors.  Looking at java-dev and JIRA, I see lots of engagement with
> the community.  Is it near the historical high for traffic?  No, it's not, but
> that isn't necessarily a bad thing.  I think it's a sign that Lucene is
> pretty stable.
> What we do have a reluctance for are patches that don't have tests (i.e.
> this one), patches that massively change Lucene APIs in non-trivial ways or
> break back compatibility or are not kept up to date.  Are we perfect?  Of
> course not.  I, personally, would love for there to be a way that helps us
> process a larger volume of patches (note, I didn't say commit a larger
> volume).  Hadoop's automated patch tester would be a huge start in that, but
> at the end of the day, Lucene still works the way all ASF projects do: via
> meritocracy and volunteerism.     You want stuff committed, keep it up to
> date, make it manageable to review, document it, respond to
> questions/concerns with answers as best you can.  To that end, a real simple
> question can go a long way toward getting something committed, and it simply
> is:  "Hey Luceners,  what else can I do to help you review and commit
> LUCENE-XXXX?"  Lather, rinse, repeat.   Next thing you know, you'll be on
> the receiving end as a committer.
> -Grant