lucene-dev mailing list archives

From Michael McCandless
Subject Re: Participating in GSoC'11 with Lucene
Date Sat, 12 Mar 2011 23:11:12 GMT
Simon, these are great summaries -- can you post them on the issues too? Thanks!

On Sat, Mar 12, 2011 at 4:35 PM, Simon Willnauer wrote:
> Hey,
> On Sat, Mar 12, 2011 at 5:32 PM, Zhijie Shen wrote:
>> Hi developers,
>> I'm a graduate student at the National University of Singapore, majoring in
>> Computer Science. My enthusiasm for open source and information retrieval
>> drives me to participate in GSoC'11 with your community. I first got to know
>> Lucene when I was a software engineering intern at IBM, working on Lotus
>> Connections.
> Awesome and welcome to Lucene :)
>> I have already checked out the source code and successfully built it
>> locally. Meanwhile, I have begun reading through the Jira issues, and am most
>> interested in issues 2308, 2309 and 2621, which seem to be refactoring
>> tasks (please correct me if I'm wrong). My feeling is that these
>> tasks are more appropriate for a beginner to start with. Moreover, I think
>> that when starting on such a big project, it is more efficient to read through
>> the discussion on Jira to understand the problem, and then dive into the
>> related code with the problem in mind. What is your opinion? I'm looking
>> forward to your guidance.
> Apparently you survived the first steps of getting into Lucene and Solr!
> Great! You also looked at JIRA, which is even better. So let me say a
> few words about the issues you have listed.
> LUCENE-2621 - Extend Codec to handle also stored fields and term vectors
> This is a very interesting and, at the same time, much-needed
> feature which involves API design, refactoring and an in-depth
> understanding of how IndexWriter and its internals work. The API which
> needs to be refactored (the Codec API) was made to consume posting lists
> once an in-memory index segment is flushed to disk. Yet, to expose
> stored fields to this API we need to prepare it to consume data for
> every document while we build the in-memory segment. So there is a
> small paradigm mismatch here which needs to be addressed.
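A minimal Java sketch of the mismatch described above, under stated assumptions: every type and method name here is hypothetical and is not Lucene's actual Codec API. It only contrasts a consumer that sees the whole segment once at flush time with one that is fed data for every document as the segment is built.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// All names below are hypothetical; none of them is Lucene's actual Codec API.

/** Flush-time model: sees the complete postings of a segment in one call. */
interface PostingsConsumerSketch {
    void writeSegment(Map<String, List<Integer>> termToDocIds);
}

/** Per-document model: called once per document while the segment is built. */
interface StoredFieldsConsumerSketch {
    void writeDocument(int docId, Map<String, String> storedFields);
}

/** Toy in-memory segment that drives both consumers at different times. */
class SegmentBuilderSketch {
    private final Map<String, List<Integer>> postings = new TreeMap<>();
    private final StoredFieldsConsumerSketch storedFieldsConsumer;
    private int nextDocId = 0;

    SegmentBuilderSketch(StoredFieldsConsumerSketch storedFieldsConsumer) {
        this.storedFieldsConsumer = storedFieldsConsumer;
    }

    int addDocument(String text, Map<String, String> storedFields) {
        int docId = nextDocId++;
        // Stored fields are handed over right away, one document at a time.
        storedFieldsConsumer.writeDocument(docId, storedFields);
        // Postings are only buffered here; no consumer sees them yet.
        for (String term : text.trim().toLowerCase().split("\\s+")) {
            List<Integer> docIds = postings.computeIfAbsent(term, t -> new ArrayList<>());
            if (docIds.isEmpty() || docIds.get(docIds.size() - 1) != docId) {
                docIds.add(docId);
            }
        }
        return docId;
    }

    /** Hands the buffered postings to the consumer in a single pass. */
    void flush(PostingsConsumerSketch consumer) {
        consumer.writeSegment(postings);
    }
}

class CodecMismatchDemo {
    public static void main(String[] args) {
        SegmentBuilderSketch segment = new SegmentBuilderSketch(
            (docId, fields) -> System.out.println("stored fields for doc " + docId + ": " + fields));
        segment.addDocument("hello lucene", Collections.singletonMap("title", "first"));
        segment.addDocument("hello codec", Collections.singletonMap("title", "second"));
        segment.flush(postings -> System.out.println("postings at flush: " + postings));
    }
}
```

In this toy version the stored-fields callback fires inside addDocument, while the postings callback fires only inside flush; an API designed around the second pattern has no natural hook for the first.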
> LUCENE-2309 - Fully decouple IndexWriter from analyzers
> This one is something I have been looking forward to for quite a while; it
> would pave the way for analysis capabilities other than the ones
> Lucene offers today. This seems to be more refactoring-heavy than the
> others but might require less knowledge of the IndexWriter (IW)
> internals than the codec one. Yet, it is still a very interesting
> issue / project to work on and fairly self-contained.
> LUCENE-2308 - Separately specify a field's type
> FieldType aims on the one hand to separate field properties from the
> actual value and on the other to make Field easier to extend. Both
> seem equally important and far from easy to achieve. Fieldable and
> Field are core APIs, and changes to them need to be well thought out.
> Further, this issue can easily cause drastic performance degradation if
> not done right. Consider this a massive change, since fields are used
> almost everywhere in Lucene and Solr.
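A minimal Java sketch of the idea of separating field properties from the field value, under stated assumptions: the class names and the three flags are hypothetical illustrations, not the design proposed in LUCENE-2308 and not Lucene's API.

```java
// All names below are hypothetical; this is not the design from LUCENE-2308.

/** Immutable bundle of per-field properties that many fields can share. */
final class FieldTypeSketch {
    final boolean indexed;
    final boolean stored;
    final boolean tokenized;

    FieldTypeSketch(boolean indexed, boolean stored, boolean tokenized) {
        this.indexed = indexed;
        this.stored = stored;
        this.tokenized = tokenized;
    }
}

/** A field is then only a name, a value and a reference to its type. */
final class FieldSketch {
    final String name;
    final String value;
    final FieldTypeSketch type;

    FieldSketch(String name, String value, FieldTypeSketch type) {
        this.name = name;
        this.value = value;
        this.type = type;
    }
}

class FieldTypeDemo {
    // Shared instances: the flags are allocated once, not once per field.
    static final FieldTypeSketch TITLE_TYPE = new FieldTypeSketch(true, true, true);
    static final FieldTypeSketch ID_TYPE = new FieldTypeSketch(true, true, false);

    public static void main(String[] args) {
        FieldSketch id = new FieldSketch("id", "42", ID_TYPE);
        FieldSketch title = new FieldSketch("title", "An example title", TITLE_TYPE);
        System.out.println(id.name + " tokenized: " + id.type.tokenized);
        System.out.println(title.name + " tokenized: " + title.type.tokenized);
    }
}
```

Sharing one immutable type object across fields is one way to keep per-field overhead low, which matters given the performance concern raised above; whether the real change should work this way is an open design question.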
> I wrote those little summaries not to scare you away, not at all! Rather,
> I tried to work out what to expect from the issues, to make it
> easier for you to pick the one you would like to
> work on. I will try to update the descriptions of those issues if they
> are not already clear enough (LUCENE-2621 seems rather too brief)
> in the next couple of days.
> If you have any questions regarding those issues or any others, feel
> free to ask here on the list or on the issue directly (you might need
> a JIRA account; if you don't have one already, you should get one :)
> Reading the JIRA issues might help you understand what they are
> about, but they are usually written by core devs or long-time
> contributors, so please ask any questions you have and don't hesitate to
> ask if you have problems with anything.
> Simon
>> Regards,
>> Zhijie
>> --
>> Zhijie Shen
>> School of Computing
>> National University of Singapore
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

