lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Participating in GSoC'11 with Lucene
Date Sat, 12 Mar 2011 23:11:12 GMT
Simon, these are great summaries -- can you post them on the issues too?  Thanks!

On Sat, Mar 12, 2011 at 4:35 PM, Simon Willnauer
<simon.willnauer@googlemail.com> wrote:
> Hey,
>
> On Sat, Mar 12, 2011 at 5:32 PM, Zhijie Shen <zjshen14@gmail.com> wrote:
>> Hi developers,
>>
>> I'm a graduate student at the National University of Singapore, majoring in
>> Computer Science. My enthusiasm for open source and information retrieval
>> drives me to participate in GSoC'11 with your community. I first got to know
>> Lucene when I was a software engineering intern at IBM, working on Lotus
>> Connections.
>
> Awesome and welcome to Lucene :)
>>
>> I've already checked out the source code and successfully built it
>> locally. I have also begun reading through the JIRA issues and am most
>> interested in issues 2308, 2309 and 2621, which seem to be refactoring
>> tasks (please correct me if I'm wrong). My feeling is that these
>> tasks are more appropriate for a beginner to get started with. Moreover, I think
>> that when starting on such a big project, it is more efficient to read through the
>> JIRA discussion first to understand the problem, and then dive into the related
>> code with the problem in mind. What is your opinion? I'm looking
>> forward to your guidance.
>
> Apparently you survived the first steps of getting into Lucene and Solr!
> Great! You also looked at JIRA, which is even better. So let me say
> a few words about the issues you have listed.
>
> LUCENE-2621 - Extend Codec to handle also stored fields and term vectors
>
> This is a very interesting and, at the same time, much-needed
> feature that involves API design, refactoring and an in-depth
> understanding of how IndexWriter and its internals work. The API that
> needs to be refactored (the Codec API) was designed to consume posting lists
> once an in-memory index segment is flushed to disk. To expose
> stored fields through this API, however, we need to prepare it to consume data for
> every document while we build the in-memory segment. So there is a
> bit of a paradigm mismatch here that needs to be addressed.
>
> LUCENE-2309 - Fully decouple IndexWriter from analyzers
>
> This is something I have been looking forward to for quite a while; it
> would pave the way for analysis capabilities beyond the ones
> Lucene offers today. It seems to be more refactoring-heavy than the
> others but might require less knowledge of IndexWriter (IW)
> internals than the Codec one. Still, it is a very interesting
> and fairly self-contained issue / project to work on.
>
> LUCENE-2308 - Separately specify a field's type
>
> FieldType aims, on the one hand, to separate field properties from the
> actual value and, on the other, to make Field easier to extend. Both
> seem equally important, yet far from easy to achieve. Fieldable and
> Field are core APIs, and changes to them need to be well thought out. Further,
> this issue can easily cause drastic performance degradation if not
> done right. Consider this a massive change, since fields are used
> almost everywhere in Lucene and Solr.
>
> I wrote those little summaries not to scare you away, not at all! I
> rather tried to outline what to expect from each issue and to make it
> easier for you to pick the one you would like to work on. Over the
> next couple of days I will try to update the descriptions of those
> issues where they are not already clear enough (LUCENE-2621 seems a
> bit too brief).
>
> If you have any questions regarding those issues or any others, feel
> free to ask here on the list or on the issue directly (you might need
> a JIRA account; if you don't have one already, you should get one :)
> Reading the JIRA issues might help you understand what they are
> about, but they are usually written by core devs or long-time
> contributors, so please ask any questions you have and don't hesitate
> to ask if you have problems with anything.
>
> Simon
>>
>> Regards,
>> Zhijie
>>
>> --
>> Zhijie Shen
>> School of Computing
>> National University of Singapore
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>



-- 
Mike

http://blog.mikemccandless.com


