lucene-dev mailing list archives

From Nicolas Lalevée <nicolas.lale...@anyware-tech.com>
Subject Re: Payloads
Date Fri, 22 Dec 2006 14:57:56 GMT
On Wednesday 20 December 2006 20:42, Michael Busch wrote:
> Doug Cutting wrote:
> > Michael,
> >
> > This sounds like very good work.  The back-compatibility of this
> > approach is great.  But we should also consider this in the broader
> > context of index-format flexibility.
> >
> > Three general approaches have been proposed.  They are not exclusive.
> >
> > 1. Make the index format extensible by adding user-implementable
> > reader and writer interfaces for postings.
> >
> > 2. Add a richer set of standard index formats, including things like
> > compressed fields, no-positions, per-position weights, etc.
> >
> > 3. Provide hooks for including arbitrary binary data.
> >
> > Your proposal is of type (3).  LUCENE-662 is a (1).  Approaches of
> > type (2) are most friendly to non-Java implementations, since the
> > semantics of each variation are well-defined.
> >
> > I don't see a reason not to pursue all three, but in a coordinated
> > manner.  In particular, we don't want to add a feature of type (3)
> > that would make it harder to add type (1) APIs.  It would thus be best
> > if we had a rough specification of type (1) and type (2).  A proposal
> > of type (2) is at:
> >
> > http://wiki.apache.org/jakarta-lucene/FlexibleIndexing
> >
> > But I'm not sure that we yet have any proposed designs for an
> > extensible posting API.  (Is anyone aware of one?)  This payload
> > proposal can probably be easily incorporated into such a design, but I
> > would have more confidence if we had one.  I guess I should attempt one!
>
> Doug,
>
> thanks for your detailed response. I'm aware that the long-term goal is
> the flexible index format and I see the payloads patch only as a part of
> it. The patch focuses on extending the index data structures and on a
> possible payload encoding. It doesn't focus yet on a flexible API; it
> only offers the two mentioned low-level methods to add and retrieve byte
> arrays.
>
> I would love to work with you guys on the flexible index format and to
> combine my patch with your suggestions and the patch from Nicolas! I
> will look at your proposal and Nicolas' patch tomorrow (have to go now).
> I just attached my patch (LUCENE-755), so if you get a chance you could
> take a look at it.

I have just looked at it. It looks great :)
But I still don't understand why a new entry in the fieldinfo is needed.
The same is done for TermVector, and code like the following fails for no
obvious reason:

// Two values for the same field name "f1", with different term vector settings
Document doc = new Document();
doc.add(new Field("f1", "v1", Store.YES, Index.TOKENIZED, TermVector.WITH_POSITIONS_OFFSETS));
doc.add(new Field("f1", "v2", Store.YES, Index.TOKENIZED, TermVector.NO));

// Index the document into an in-memory directory
RAMDirectory ram = new RAMDirectory();
IndexWriter writer = new IndexWriter(ram, new StandardAnalyzer(), true);
writer.addDocument(doc);
writer.close();

Knowing a little about how Lucene works, I have an idea why this fails, but
can we avoid it?
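
The conflict can be pictured without Lucene at all. The sketch below is only an
illustration. It assumes that indexing options are recorded once per field name
and that a repeated name simply ORs its flags into that single record. The class
FieldFlagsSketch and its members are made up for this example; they are not
Lucene's FieldInfo API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model, not Lucene code: one flag record per field name.
class FieldFlagsSketch {

    // Made-up stand-in for a per-field metadata entry.
    static final class FieldFlags {
        boolean storeTermVector;
        boolean storePositions;
        boolean storeOffsets;
    }

    private final Map<String, FieldFlags> byName = new HashMap<String, FieldFlags>();

    // Registers one field instance. Assumption: flags of a repeated
    // field name are OR-ed into the existing record.
    FieldFlags add(String name, boolean termVector, boolean positions, boolean offsets) {
        FieldFlags flags = byName.get(name);
        if (flags == null) {
            flags = new FieldFlags();
            byName.put(name, flags);
        }
        flags.storeTermVector |= termVector;
        flags.storePositions |= positions;
        flags.storeOffsets |= offsets;
        return flags;
    }

    public static void main(String[] args) {
        FieldFlagsSketch infos = new FieldFlagsSketch();
        // Stands in for the "v1" value added with TermVector.WITH_POSITIONS_OFFSETS
        infos.add("f1", true, true, true);
        // Stands in for the "v2" value added with TermVector.NO
        FieldFlags merged = infos.add("f1", false, false, false);
        // The single record for "f1" still asks for vectors and offsets
        System.out.println(merged.storeTermVector + " " + merged.storeOffsets);
    }
}
```

Under that assumption, the one record for "f1" requests term vectors with
offsets, while the second value was added without collecting any. Whether this
is what actually goes wrong in the code above is a guess on my part.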

Nicolas

-- 
Nicolas LALEVÉE
Solutions & Technologies
ANYWARE TECHNOLOGIES
Tel : +33 (0)5 61 00 52 90
Fax : +33 (0)5 61 00 51 46
http://www.anyware-tech.com


