lucene-dev mailing list archives

From Grant Ingersoll <>
Subject Re: GSoC
Date Wed, 02 Feb 2011 15:57:16 GMT

On Feb 2, 2011, at 4:10 AM, David Nemeskey wrote:

> Hi guys,
> Mark, Robert, Simon: thanks for the support! I really hope we can work 
> together this summer (and before that, obviously).

Sounds like a great idea.  Looking forward to the proposal.

> According to
> , there's 
> still some time until the application period. So let me use this week to finish 
> my PhD research plan, and get back to you next week.
> I am not really familiar with how the program works, i.e. how detailed the 
> application description should be, when mentorship is decided, etc. so I guess 
> we will have a lot to talk about. :)

It's pretty competitive, especially since you are not only competing against others for Lucene
slots, but you are competing against other ASF projects.  I highly recommend you, as well
as interested mentors, look through Mahout's past GSOC projects:
and and

> (Actually, should we move this discussion private?)

No, you shouldn't, and doing so would be to your detriment come the ranking process, since people
won't have a track record of what you've done as it relates to your proposal.  The goal of
GSOC is to learn how Open Source works.  Even though you have a mentor, that person is there
to help you navigate the community, not to be a private tutor on technical details.  I routinely
tell all my students that I will help them w/ personal issues (vacation, emergencies, etc.)
but that all technical stuff must be done on list (JIRA, IRC, dev@, patches, etc.).

> David
>> Hi David, honestly this sounds fantastic.
>> It would be great to have someone to work with us on this issue!
>> To date, progress is pretty slow-going (minor improvements, cleanups,
>> additional stats here and there)... but we really need all the help we
>> can get, especially from people who have a really good understanding
>> of the various models.
>> In case you are interested, here are some references to discussions
>> about adding more flexibility (with some prototypes etc):
>> _towards_making_lucene_s_scoring_more_flexible
>> On Fri, Jan 28, 2011 at 11:32 AM, David Nemeskey
>> <> wrote:
>>> Hi all,
>>> I have already sent this mail to Simon Willnauer, and he suggested I
>>> post it here for discussion.
>>> I am David Nemeskey, a PhD student at the Eotvos Lorand University,
>>> Budapest, Hungary. I am doing IR-related research, and we have
>>> considered using Lucene as our search engine. We were quite satisfied
>>> with the speed and ease of use. However, we would like to experiment
>>> with different ranking algorithms, and this is where problems arise.
>>> Lucene only supports the VSM, and unfortunately the ranking architecture
>>> seems to be tailored specifically to its needs.
>>> I would be very much interested in revamping the ranking component as a
>>> GSoC project. The following modifications should be doable in the
>>> allocated time frame:
>>> - a new ranking class hierarchy, which is generic enough to allow easy
>>> implementation of new weighting schemes (at least bag-of-words ones),
>>> - addition of state-of-the-art ranking methods, such as Okapi BM25,
>>> proximity and DFR models,
>>> - configuration for ranking selection, with the old method as default.
>>> I believe all users of Lucene would profit from such a project. It would
>>> provide the scientific community with an even more useful research aid,
>>> while regular users could benefit from superior ranking results.
>>> Please let me know your opinion about this proposal.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

Grant Ingersoll

Search the Lucene ecosystem docs using Solr/Lucene:

