From: David Nemeskey
To: dev@lucene.apache.org
Cc: Simon Willnauer, Grant Ingersoll
Subject: Re: GSoC
Date: Thu, 10 Mar 2011 11:54:18 +0100

Ok, I have created a new issue, LUCENE-2959 for this project. I have
uploaded the pdfs and added the gsoc2011 and lucene-gsoc-2011 labels as
well.

David

On 2011 March 09, Wednesday 21:58:53 Simon Willnauer wrote:
> On Wed, Mar 9, 2011 at 5:48 PM, Grant Ingersoll wrote:
> > I think we, Lucene committers, need to identify who is willing to mentor.
> > In my experience, it is less than 5 hours a week. Most of the work
> > is done as part of the community. Sometimes you have to be tough and
> > fail someone (I did last year) but most of the time, if you take the
> > time to interview the candidates up front, it is a good experience for
> > everyone.
>
> count me in
>
> > I'd add it would be useful to have everyone put the lucene-gsoc-11 label
> > on their issues too, that way we can quickly find the Lucene ones.
>
> done on at least one ;)
>
> simon
>
> > Also, feel free to label existing bugs.
> >
> > On Mar 9, 2011, at 2:11 AM, Simon Willnauer wrote:
> >> Hey David and all others who want to contribute to GSoC,
> >>
> >> the ASF has applied for GSoC 2011 as a mentoring organization. As an
> >> ASF project we don't need to apply directly, but we do need to
> >> register our ideas now. This works like almost anything in the ASF:
> >> through JIRA. All ideas should be recorded as JIRA tickets labeled
> >> with "gsoc2011". Once this is done it will show up here:
> >> http://s.apache.org/gsoc2011tasks
> >>
> >> Everybody who is interested in GSoC as a mentor or student should now
> >> read this too: http://community.apache.org/gsoc.html
> >>
> >> Thanks,
> >>
> >> Simon
> >>
> >> On Thu, Feb 24, 2011 at 12:14 PM, David Nemeskey wrote:
> >>> Please find the implementation plan attached. The word "soon" gets a
> >>> new meaning when power outages are taken into account. :)
> >>>
> >>> As before, comments are welcome.
> >>>
> >>> David
> >>>
> >>> On Tuesday, February 22, 2011 15:22:57 Simon Willnauer wrote:
> >>>> I think that is good for now. I should get started on codeawards and
> >>>> wrap up our proposals. I hope I can do that this week.
> >>>>
> >>>> simon
> >>>>
> >>>> On Tue, Feb 22, 2011 at 3:16 PM, David Nemeskey wrote:
> >>>>> Hey,
> >>>>>
> >>>>> I have written the proposal. Please let me know if you want more /
> >>>>> less of certain parts. Should I upload it somewhere?
> >>>>>
> >>>>> Implementation plan soon to follow.
> >>>>>
> >>>>> Sorry for the late reply; I have been rather busy these past few
> >>>>> weeks.
> >>>>>
> >>>>> David
> >>>>>
> >>>>> On Wednesday, February 02, 2011 10:35:55 Simon Willnauer wrote:
> >>>>>> Hey David,
> >>>>>>
> >>>>>> I saw that you added a tiny line to the GSoC Lucene wiki - thanks
> >>>>>> for that.
> >>>>>>
> >>>>>> On Wed, Feb 2, 2011 at 10:10 AM, David Nemeskey wrote:
> >>>>>>> Hi guys,
> >>>>>>>
> >>>>>>> Mark, Robert, Simon: thanks for the support! I really hope we can
> >>>>>>> work together this summer (and before that, obviously).
> >>>>>>
> >>>>>> Same here!
> >>>>>>
> >>>>>>> According to
> >>>>>>> http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/timeline ,
> >>>>>>> there's still some time until the application period. So let me use
> >>>>>>> this week to finish my PhD research plan, and get back to you next
> >>>>>>> week.
> >>>>>>>
> >>>>>>> I am not really familiar with how the program works, i.e. how
> >>>>>>> detailed the application description should be, when mentorship is
> >>>>>>> decided, etc. so I guess we will have a lot to talk about. :)
> >>>>>>
> >>>>>> so from a 10000ft view it works like this:
> >>>>>>
> >>>>>> 1. Write up a short proposal describing what your idea is about
> >>>>>> 2. make it public! and publish an implementation plan - how you would
> >>>>>>    want to realize your proposal. If you don't follow that 100% in the
> >>>>>>    actual impl. don't worry. It's just meant to give us an idea that
> >>>>>>    you know what you are doing and where you want to go. Something
> >>>>>>    like a one-page (A4) rough design doc.
> >>>>>> 3. give other people the chance to apply for the same suggestion
> >>>>>>    (this is how it works though)
> >>>>>> 4. Let the ASF / us assign one or more possible mentors to it
> >>>>>> 5. let us apply for a slot in GSoC (those are limited for
> >>>>>>    organizations)
> >>>>>> 6. get accepted
> >>>>>> 7. rock it!
> >>>>>>
> >>>>>>> (Actually, should we move this discussion private?)
> >>>>>>
> >>>>>> no - we usually do everything in public except for discussions within
> >>>>>> the PMC that are meant to be private for legal reasons or similar
> >>>>>> things. Let's stick to the mailing list for all communication unless
> >>>>>> you have something that should clearly not be public.
> >>>>>> This also gives other contributors a chance to help and get
> >>>>>> interested in your work!!
> >>>>>>
> >>>>>> simon
> >>>>>>
> >>>>>>> David
> >>>>>>>
> >>>>>>>> Hi David, honestly this sounds fantastic.
> >>>>>>>>
> >>>>>>>> It would be great to have someone to work with us on this issue!
> >>>>>>>>
> >>>>>>>> To date, progress is pretty slow-going (minor improvements,
> >>>>>>>> cleanups, additional stats here and there)... but we really need
> >>>>>>>> all the help we can get, especially from people who have a really
> >>>>>>>> good understanding of the various models.
> >>>>>>>>
> >>>>>>>> In case you are interested, here are some references to
> >>>>>>>> discussions about adding more flexibility (with some prototypes
> >>>>>>>> etc):
> >>>>>>>> http://www.lucidimagination.com/search/document/72787e0e54f798e4/baby_steps_towards_making_lucene_s_scoring_more_flexible
> >>>>>>>> https://issues.apache.org/jira/browse/LUCENE-2392
> >>>>>>>>
> >>>>>>>> On Fri, Jan 28, 2011 at 11:32 AM, David Nemeskey wrote:
> >>>>>>>>> Hi all,
> >>>>>>>>>
> >>>>>>>>> I have already sent this mail to Simon Willnauer, and he
> >>>>>>>>> suggested that I post it here for discussion.
> >>>>>>>>>
> >>>>>>>>> I am David Nemeskey, a PhD student at the Eotvos Lorand
> >>>>>>>>> University, Budapest, Hungary. I am doing IR-related
> >>>>>>>>> research, and we have considered using Lucene as our search
> >>>>>>>>> engine. We were quite satisfied with the speed and ease of use.
> >>>>>>>>> However, we would like to experiment with different ranking
> >>>>>>>>> algorithms, and this is where problems arise. Lucene only
> >>>>>>>>> supports the vector space model (VSM), and unfortunately the
> >>>>>>>>> ranking architecture seems to be tailored specifically to its
> >>>>>>>>> needs.
> >>>>>>>>>
> >>>>>>>>> I would be very much interested in revamping the ranking
> >>>>>>>>> component as a GSoC project.
> >>>>>>>>> The following modifications should be doable in the allocated
> >>>>>>>>> time frame:
> >>>>>>>>> - a new ranking class hierarchy, which is generic enough to allow
> >>>>>>>>>   easy implementation of new weighting schemes (at least
> >>>>>>>>>   bag-of-words ones),
> >>>>>>>>> - addition of state-of-the-art ranking methods, such as Okapi
> >>>>>>>>>   BM25, proximity and DFR models,
> >>>>>>>>> - configuration for ranking selection, with the old method as
> >>>>>>>>>   default.
> >>>>>>>>>
> >>>>>>>>> I believe all users of Lucene would profit from such a project.
> >>>>>>>>> It would provide the scientific community with an even more
> >>>>>>>>> useful research aid, while regular users could benefit from
> >>>>>>>>> superior ranking results.
> >>>>>>>>>
> >>>>>>>>> Please let me know your opinion about this proposal.
> >>>>>>>
> >>>>>>> ---------------------------------------------------------------------
> >>>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> >>>>>>> For additional commands, e-mail: dev-help@lucene.apache.org
> >
> > --------------------------
> > Grant Ingersoll
> > http://www.lucidimagination.com/
> >
> > Search the Lucene ecosystem docs using Solr/Lucene:
> > http://www.lucidimagination.com/search