lucene-dev mailing list archives

From David Nemeskey <>
Subject Re: GSoC
Date Wed, 02 Feb 2011 09:10:25 GMT
Hi guys,

Mark, Robert, Simon: thanks for the support! I really hope we can work 
together this summer (and before that, obviously).

According to , there is
still some time before the application period opens. So let me use this week
to finish my PhD research plan and get back to you next week.

I am not really familiar with how the program works (i.e. how detailed the
application description should be, when mentorship is decided, etc.), so I
guess we will have a lot to talk about. :)

(Actually, should we move this discussion off-list?)


> Hi David, honestly this sounds fantastic.
> It would be great to have someone to work with us on this issue!
> To date, progress is pretty slow-going (minor improvements, cleanups,
> additional stats here and there)... but we really need all the help we
> can get, especially from people who have a really good understanding
> of the various models.
> In case you are interested, here are some references to discussions
> about adding more flexibility (with some prototypes, etc.):
> _towards_making_lucene_s_scoring_more_flexible

> On Fri, Jan 28, 2011 at 11:32 AM, David Nemeskey
> <> wrote:
> > Hi all,
> > 
> > I have already sent this mail to Simon Willnauer, and he suggested that I
> > post it here for discussion.
> > 
> > I am David Nemeskey, a PhD student at the Eotvos Lorand University,
> > Budapest, Hungary. I am doing IR-related research, and we have
> > considered using Lucene as our search engine. We were quite satisfied
> > with the speed and ease of use. However, we would like to experiment
> > with different ranking algorithms, and this is where problems arise.
> > Lucene only supports the vector space model (VSM), and unfortunately the
> > ranking architecture seems to be tailored specifically to its needs.
> > 
> > I would be very much interested in revamping the ranking component as a
> > GSoC project. The following modifications should be doable in the
> > allocated time frame:
> > - a new ranking class hierarchy, which is generic enough to allow easy
> > implementation of new weighting schemes (at least bag-of-words ones),
> > - addition of state-of-the-art ranking methods, such as Okapi BM25,
> > proximity and DFR models,
> > - configuration for ranking selection, with the old method as default.
> > 
> > I believe all users of Lucene would profit from such a project. It would
> > provide the scientific community with an even more useful research aid,
> > while regular users could benefit from superior ranking results.
> > 
> > Please let me know your opinion about this proposal.
