lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nicolas Lalevée <nicolas.lale...@anyware-tech.com>
Subject Re: Flexible indexing
Date Tue, 13 Mar 2007 09:50:04 GMT
Le Dimanche 11 Mars 2007 22:41, Michael Busch a écrit :
> Hi Grant,
>
> I certainly agree that it would be great if we could make some progress
> and commit the payloads patch soon. I think it is quite independent from
> FI. FI will introduce different posting formats (see Wiki:
> http://wiki.apache.org/lucene-java/FlexibleIndexing). Payloads will be
> part of some of those formats, but not all (i. e. per-position payloads
> only make sense if positions are stored).
>
> The only concern some people had was about the API the patch introduces.
> It extends Token and TermPositions. Doug's argument was, that if we
> introduce new APIs now but want to change them with FI, then it will be
> hard to support those APIs. I think that is a valid point, but at the
> same time it slows down progress to have to plan ahead in too many
> directions. That's why I'd vote for marking the new APIs as experimental
> so that people can try them out at own risk.
> If we could agree on that approach then I'd go ahead and submit an
> updated payloads patch in the next days, that applies cleanly on the
> current trunk and contains the additional warnings in the javadocs.
>
>
> In regard of FI and 662 however I really believe we should split it up
> and plan ahead (in a way I mentioned already), so that we have more
> isolated patches. It is really great that we have 662 already (Nicolas,
> thank you so much for your hard work, I hope you'll keep working with us
> on FI!!). We'll probably use some of that code, and it will definitely
> be helpful.

thanks ! :)

About the code split you are talking about, I definitively agree. Here is what 
will contain the three parts :
1) index format concept :
- there is an interface defining it, just for now handling the filename 
extensions.
- modify the directory abstract class and the implementations to be the 
container of the index format.
- modify the SegmentInfos class to do some check about the opened index format 
and the index format defined in the Directory class.
- modify the writer to make it check format conflits while adding raw indexes
2) extensibility of the store reader/writer :
- add to the previous interface some new entry points : a FieldsReader and a 
FieldsWriter.
- split the current FieldsReader and FieldsWriter in two parts : the part 
which will be still handled by Lucene, and the extendable ones which will be 
instanciated by a DefaultIndexFormat.
- split the implementation of Field in two parts : the Field and a FieldData, 
so the user will be able to define his custom field-data java object.
3) New: extensibility of the posting reader/writer
this is just a draft for now, but here is what was done :
- move Posting from a inner class to a public class
- make TermInfo handling a pool of "pointers" : the default implementation has 
two, the frq one and the prx one.
- extract the posting writing from DocumentWriter into a DefaultPostingWriter.

I can provide a patch for the first step.

cheers,
Nicolas

>
> Michael
>
> Grant Ingersoll wrote:
> > Hi Michael,
> >
> > This is very good.  I know 662 is different, just wasn't sure if
> > Nicolas patch was meant to be applied after 662, b/c I know we had
> > discussed this before.
> >
> > I do agree with you about planning this out, but I also know that
> > patches seem to motivate people the best and provide a certain
> > concreteness to it all.  I mostly started asking questions on these
> > two issues b/c I wanted to spur some more discussion and see if we can
> > get people motivated to move on it.
> >
> > I was hoping that I would be able to apply each patch to two different
> > checkouts so I could start seeing where the overlap is and how they
> > could fit together (I also admit I was procrastinating on my ApacheCon
> > talk...).  In the new, flexible world, the payloads implementation
> > could be a separate implementation of the indexing or it could be part
> > of the core/existing file format implementation.  Sometimes I just
> > need to get my hands on the code to get a real feel for what I feel is
> > the best way to do it.
> >
> > I agree about the XML storage for Index information.  We do that in
> > our in-house wrapper around Lucene, storing info about the language,
> > analyzer used, etc.  We may also want a binary index-level storage
> > capability.  I know most people just create a single document usually
> > to store binary info about the index, but an binary storage might be
> > good too.
> >
> > Part of me says to apply the Payloads patch now, as it provides a lot
> > of bang for the buck and I think the FI is going to take a lot longer
> > to hash out.  However, I know that it may pin us in or force us to
> > change things for FI.  Ultimately, I would love to see both these
> > features for the next release, but that isn't a requirement.  Also, on
> > FI, I would love to see two different implementations of whatever API
> > we choose before releasing it, as I always find two implementations of
> > an Interface really work out the API details.
> >
> > -Grant
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org

-- 
Nicolas LALEVÉE
Solutions & Technologies
ANYWARE TECHNOLOGIES
Tel : +33 (0)5 61 00 52 90
Fax : +33 (0)5 61 00 51 46
http://www.anyware-tech.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message