lucene-dev mailing list archives

From Simon Willnauer <>
Subject Re: Participating in GSoC'11 with Lucene
Date Sun, 13 Mar 2011 12:11:06 GMT
On Sun, Mar 13, 2011 at 12:11 AM, Michael McCandless
<> wrote:
> Simon these are great summaries -- can you post them on the issues too?  Thanks!


> On Sat, Mar 12, 2011 at 4:35 PM, Simon Willnauer
> <> wrote:
>> Hey,
>> On Sat, Mar 12, 2011 at 5:32 PM, Zhijie Shen <> wrote:
>>> Hi developers,
>>> I'm a graduate student at the National University of Singapore, majoring in
>>> Computer Science. My enthusiasm for open source and information retrieval
>>> drives me to participate in GSoC'11 with your community. I first got to know
>>> Lucene when I was a software engineering intern at IBM, working on Lotus
>>> Connections.
>> Awesome and welcome to Lucene :)
>>> I've already checked out the source code and successfully built it
>>> locally. I have also begun reading through the Jira issues, and I am most
>>> interested in issues 2308, 2309 and 2621, which seem to be refactoring
>>> tasks (please correct me if I'm wrong). My feeling is that these
>>> tasks are more appropriate for a beginner. Moreover, I think that
>>> to get started with such a big project, it is more efficient to read through
>>> the discussion on Jira to understand the problem, and then dive into the
>>> related code with the problem in mind. What is your opinion? I'm looking
>>> forward to your guidance.
>> Apparently you survived the first steps of getting into Lucene and Solr!
>> Great! You also looked at JIRA, which is even better. So let me say a
>> few words about the issues you have listed.
>> LUCENE-2621 - Extend Codec to handle also stored fields and term vectors
>> This is a very interesting and at the same time much-needed
>> feature which involves API design, refactoring and an in-depth
>> understanding of how IndexWriter and its internals work. The API which
>> needs to be refactored (the Codec API) was made to consume posting lists
>> once an in-memory index segment is flushed to disk. Yet, to expose
>> stored fields to this API we need to prepare it to consume data for
>> every document while we build the in-memory segment. So there is a
>> little paradigm mismatch here which needs to be addressed.
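The mismatch described for LUCENE-2621 can be illustrated with a small, self-contained sketch. It is not Lucene's Codec API: `PostingsConsumerSketch`, `StoredFieldsConsumerSketch` and `CodecMismatchSketch` are hypothetical names invented here, and the toy "segment" is just an in-memory map. It only aims to show the two calling patterns side by side: one consumer is called once at flush time, the other once per document while the segment is being built.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; none of these types are Lucene's actual Codec API. */
public class CodecMismatchSketch {

    /** Flush-time style: called once per segment with all postings at hand. */
    public interface PostingsConsumerSketch {
        void flush(Map<String, List<Integer>> termToDocIds);
    }

    /** Per-document style: called while the in-memory segment is still growing. */
    public interface StoredFieldsConsumerSketch {
        void addDocument(int docId, Map<String, String> storedFields);
        void finish(int docCount);
    }

    /** Indexes two toy documents and records the order in which consumers are called. */
    public static List<String> run() {
        List<String> log = new ArrayList<>();

        PostingsConsumerSketch postings = terms -> log.add("flush " + terms);
        StoredFieldsConsumerSketch stored = new StoredFieldsConsumerSketch() {
            @Override
            public void addDocument(int docId, Map<String, String> storedFields) {
                log.add("doc " + docId + " " + storedFields);
            }

            @Override
            public void finish(int docCount) {
                log.add("finish " + docCount);
            }
        };

        // Postings are buffered in memory until the "flush" below.
        Map<String, List<Integer>> inMemoryPostings = new TreeMap<>();
        String[] docs = {"lucene codec", "codec api"};

        for (int docId = 0; docId < docs.length; docId++) {
            // Stored fields are handed over immediately, one document at a time.
            stored.addDocument(docId, Map.of("body", docs[docId]));
            for (String term : docs[docId].split(" ")) {
                inMemoryPostings.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
            }
        }

        // Postings are handed over in one go, after all documents were seen.
        postings.flush(inMemoryPostings);
        stored.finish(docs.length);
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In this toy version the per-document calls all happen before the single flush call, which is the ordering difference a unified API would have to accommodate.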
>> LUCENE-2309 - Fully decouple IndexWriter from analyzers
>> This is something I have been looking forward to for quite a while; it
>> would pave the way for analysis capabilities other than the ones
>> Lucene offers today. This one seems more refactoring-heavy than the
>> others but might require less knowledge of the IndexWriter (IW)
>> internals than the codec one. Still, it is a very interesting
>> issue / project to work on and fairly self-contained.
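One way to picture the decoupling that LUCENE-2309 asks for is an indexer that only ever sees a stream of terms and never the component that produced them. The sketch below is an assumption-laden illustration, not Lucene code: `IndexerSketch` and `DecoupledIndexerSketch` are hypothetical, and plain `Function<String, Iterable<String>>` values stand in for whatever analysis chain a caller might plug in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Hypothetical sketch; this is not how Lucene's IndexWriter is implemented. */
public class DecoupledIndexerSketch {

    /** Toy indexer that consumes already-produced terms and knows no analyzer. */
    public static final class IndexerSketch {
        private final Map<String, List<Integer>> postings = new TreeMap<>();
        private int nextDocId = 0;

        public int addDocument(Iterable<String> terms) {
            int docId = nextDocId++;
            for (String term : terms) {
                List<Integer> docs = postings.computeIfAbsent(term, t -> new ArrayList<>());
                // Record each document at most once per term.
                if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                    docs.add(docId);
                }
            }
            return docId;
        }

        public Map<String, List<Integer>> postings() {
            return postings;
        }
    }

    /** Feeds the same indexer from two unrelated term producers. */
    public static Map<String, List<Integer>> run() {
        // Two stand-in "analysis chains"; the indexer never sees these types.
        Function<String, Iterable<String>> whitespaceLowercase =
                text -> Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+"));
        Function<String, Iterable<String>> keyword =
                text -> Collections.singletonList(text);

        IndexerSketch indexer = new IndexerSketch();
        indexer.addDocument(whitespaceLowercase.apply("Hello Lucene hello"));
        indexer.addDocument(keyword.apply("Hello Lucene"));
        return indexer.postings();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because the indexer depends only on `Iterable<String>`, swapping in a different term producer requires no change to the indexing code, which is the property the issue is after.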
>> LUCENE-2308 - Separately specify a field's type
>> FieldType aims on the one hand to separate field properties from the
>> actual value and on the other to make Field easier to extend. Both
>> seem equally important and are far from easy to achieve. Fieldable and
>> Field are core APIs, and changes to them need to be well thought out.
>> Further, this issue can easily cause drastic performance degradation if
>> not done right. Consider this a massive change, since fields are used
>> almost everywhere in Lucene and Solr.
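The idea behind LUCENE-2308, keeping a field's properties apart from its value, can be sketched roughly as follows. The class names (`TypeSketch`, `FieldSketch`, `FieldTypeSketch`) and the three boolean properties are assumptions chosen for illustration; they are not the API proposed in the issue. The sketch shares one immutable type object across all fields of the same kind, which is one plausible way to avoid per-field overhead.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; names and properties are illustrative, not Lucene's API. */
public class FieldTypeSketch {

    /** Immutable bundle of field properties, independent of any value. */
    public static final class TypeSketch {
        public final boolean indexed;
        public final boolean stored;
        public final boolean tokenized;

        public TypeSketch(boolean indexed, boolean stored, boolean tokenized) {
            this.indexed = indexed;
            this.stored = stored;
            this.tokenized = tokenized;
        }
    }

    /** A field is just a name and a value plus a reference to a shared type. */
    public static final class FieldSketch {
        public final String name;
        public final String value;
        public final TypeSketch type;

        public FieldSketch(String name, String value, TypeSketch type) {
            this.name = name;
            this.value = value;
            this.type = type;
        }
    }

    // Created once and reused for every document, so no properties are copied per field.
    public static final TypeSketch ID_TYPE = new TypeSketch(true, true, false);
    public static final TypeSketch BODY_TYPE = new TypeSketch(true, false, true);

    public static List<FieldSketch> buildDocument(String id, String body) {
        List<FieldSketch> doc = new ArrayList<>();
        doc.add(new FieldSketch("id", id, ID_TYPE));
        doc.add(new FieldSketch("body", body, BODY_TYPE));
        return doc;
    }

    public static String describe(FieldSketch f) {
        return f.name + ": indexed=" + f.type.indexed
                + " stored=" + f.type.stored
                + " tokenized=" + f.type.tokenized;
    }

    public static void main(String[] args) {
        for (FieldSketch f : buildDocument("42", "hello world")) {
            System.out.println(describe(f));
        }
    }
}
```

New kinds of fields can then be introduced by defining another `TypeSketch` instance rather than by subclassing the field itself.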
>> I wrote those little summaries not to scare you away, not at all!
>> Rather, I tried to work out what to expect from each issue, to make it
>> easier for you to pick the one you would like to work on. I will try
>> to update the descriptions of those issues in the next couple of days
>> if they are not already clear enough (LUCENE-2621 seems kind of too
>> brief, though).
>> If you have any questions regarding those issues or any others, feel
>> free to ask here on the list or on the issue directly (you might need
>> a JIRA account; if you don't have one already, you should get one :)
>> Reading the JIRA issues might help you understand what they are
>> about, but they are usually written by core devs or long-time
>> contributors, so please ask any question you have and don't hesitate to
>> speak up if you have problems with anything.
>> Simon
>>> Regards,
>>> Zhijie
>>> --
>>> Zhijie Shen
>>> School of Computing
>>> National University of Singapore
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
> --
> Mike
