lucene-java-user mailing list archives
From Antony Bowesman <...@teamware.com>
Subject Fields with the same name?? - Was Re: Payloads and tokenizers
Date Mon, 18 Aug 2008 04:41:19 GMT
> I assume you already know this but just to make sure what I meant was clear
> - no tokenization but still indexing just means that the entire field's text
> becomes a single unchanged token. I believe this is exactly what
> SingleTokenTokenStream can buy you - a single token, for which you can
> preset a payload.

Yes, I was with you :)


> It is.  Field maintains its  value and it is either string/stream/etc. Once
> you set it to tokenStream the string value is lost and there's no way to
> store it.

Thanks for that - I delved a little further into FieldsWriter and see what you 
mean.


> How about adding this field in two parts, one part for indexing with the
> payload and the other part for storing, i.e. something like this:
> 
>     Token token = new Token(...);
>     token.setPayload(...);
>     SingleTokenTokenStream ts = new SingleTokenTokenStream(token);
> 
>     Field f1 = new Field("f","some-stored-content",Store.YES,Index.NO);
>     Field f2 = new Field("f", ts);

Now that got me thinking, and it has exposed a rather large misconception in my 
understanding of the Lucene internals when considering fields of the same name.

Your idea above looked like a good one.  However, I realise I am probably trying 
to use payloads wrongly.  I have the following information to store for a single 
Document:

contentId - 1 instance
ownerId   - 1..n instances
accessId  - 1..n instances

Each ownerId has one corresponding accessId for the contentId.

My search criteria are ownerId:XXX + user criteria.  When there is a hit, I need 
the contentId and the corresponding accessId (for the owner) back.  So, I wanted 
to store the accessId as a payload to the ownerId.

This is where I came unstuck.  For 'n=3' above, I used the 
SingleTokenTokenStream as you suggested with the accessId as the payload for 
ownerId.  However, at the Document level, I cannot get the payloads from the 
field so, in trying to understand fields with the same name, I discovered that 
there is a big difference between

(a)
Field f = new Field("ownerId", "OID1", Store.YES, Index.NO_NORMS);
doc.add(f);
f = new Field("ownerId", "OID2", Store.YES, Index.NO_NORMS);
doc.add(f);
f = new Field("ownerId", "OID3", Store.YES, Index.NO_NORMS);
doc.add(f);

and (b)
Field f = new Field("ownerId", "OID1 OID2 OID3", Store.YES, Index.NO_NORMS);
doc.add(f);

as Document.getFields("ownerId") returns 3 fields for (a) and only 1 for (b).
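
The difference between (a) and (b) can be illustrated without Lucene. The
sketch below is only a model: SameNameFieldsSketch is a hypothetical class
that keeps (name, value) pairs in a list, and it is an assumption that this
mirrors how a document holds repeated field names. It is not Lucene's
Document class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a document with repeated field names (not Lucene code).
public class SameNameFieldsSketch {
    // Each entry is a {name, value} pair, kept in the order it was added.
    private final List<String[]> fields = new ArrayList<>();

    public void add(String name, String value) {
        fields.add(new String[] {name, value});
    }

    // Collects the values of every field carrying the given name.
    public List<String> getValues(String name) {
        List<String> values = new ArrayList<>();
        for (String[] field : fields) {
            if (field[0].equals(name)) {
                values.add(field[1]);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Case (a): three separate fields sharing one name.
        SameNameFieldsSketch a = new SameNameFieldsSketch();
        a.add("ownerId", "OID1");
        a.add("ownerId", "OID2");
        a.add("ownerId", "OID3");

        // Case (b): one field whose value happens to contain three ids.
        SameNameFieldsSketch b = new SameNameFieldsSketch();
        b.add("ownerId", "OID1 OID2 OID3");

        System.out.println(a.getValues("ownerId").size()); // prints 3
        System.out.println(b.getValues("ownerId").size()); // prints 1
    }
}
```

In this model the three ids in (b) are just one string value, which is why
the count differs.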

My question then is, if I do

for (int i = 0; i < owners; i++)
{
     f = new Field("ownerId", oid[i], Store.YES, Index.NO_NORMS);
     doc.add(f);
     f = new Field("accessId", aid[i], Store.YES, Index.NO_NORMS);
     doc.add(f);
}

then are the Field arrays returned by

Document.getFields("ownerId")
Document.getFields("accessId")

**guaranteed** to hold their elements in the same order as they were added?
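
For illustration, this is the kind of index-based pairing that would depend
on such a guarantee. It is a plain-Java sketch, not Lucene code:
FieldOrderSketch and pairOwnersWithAccess are hypothetical names, and the
insertion-ordered list is an assumption of the model, not a statement of
what Document.getFields() promises.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a document that returns same-name fields in insertion order.
public class FieldOrderSketch {
    private final List<String[]> fields = new ArrayList<>();

    public void add(String name, String value) {
        fields.add(new String[] {name, value});
    }

    // Values for one field name, in the order they were added (assumed behaviour).
    public List<String> getValues(String name) {
        List<String> values = new ArrayList<>();
        for (String[] field : fields) {
            if (field[0].equals(name)) {
                values.add(field[1]);
            }
        }
        return values;
    }

    // Pairs the i-th ownerId with the i-th accessId; only valid if order is preserved.
    public static Map<String, String> pairOwnersWithAccess(FieldOrderSketch doc) {
        List<String> owners = doc.getValues("ownerId");
        List<String> access = doc.getValues("accessId");
        if (owners.size() != access.size()) {
            throw new IllegalStateException("ownerId/accessId counts differ");
        }
        Map<String, String> pairs = new LinkedHashMap<>();
        for (int i = 0; i < owners.size(); i++) {
            pairs.put(owners.get(i), access.get(i));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String[] oid = {"OID1", "OID2", "OID3"};
        String[] aid = {"AID1", "AID2", "AID3"};
        FieldOrderSketch doc = new FieldOrderSketch();
        for (int i = 0; i < oid.length; i++) {
            doc.add("ownerId", oid[i]);
            doc.add("accessId", aid[i]);
        }
        System.out.println(pairOwnersWithAccess(doc).get("OID2")); // prints AID2
    }
}
```

If the order were not preserved, the lookup for a given owner could silently
return another owner's accessId, which is why the guarantee matters here.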

Antony