lucene-java-user mailing list archives

From Alan Burlison <alan.burli...@gmail.com>
Subject Re: Position increment clarification?
Date Sun, 15 Sep 2013 11:01:46 GMT
On 15/09/13 11:41, Michael McCandless wrote:

> Your understanding is correct: there are two ways to affect the
> indexed position.

Thanks for the confirmation; it took me a while to figure that out :-)

> Either approach would work, but if you do the single-field approach,
> the challenge is in making a TokenFilter that knows when one chunk
> ended so it could set the position increment.

Yes, I'd have to find a way to pass some metadata into the tokenizer 
before feeding it each chunk. Kinda messy.

> I think it'd be easier to just add multiple field instances?

Yes, that's the conclusion I came to. It's easy enough to do: I'm using
JavaMail to recursively traverse the mail file so I can separate out
each mail and handle multipart mails and attachments, which I then
feed into Tika.
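
For anyone finding this thread later, here is a rough sketch of why the two
approaches end up equivalent. It is a simplified model of the position
arithmetic in plain Java, not Lucene code: the class and method names are
made up for illustration, and it ignores details such as empty field
instances. In Lucene itself the per-token increment would be set through
PositionIncrementAttribute.setPositionIncrement(int), and the gap between
instances of the same field would come from
Analyzer.getPositionIncrementGap(String). The gap value of 100 is just an
assumed example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified model of token position assignment. Hypothetical names; NOT Lucene code. */
class PositionGapSketch {

    /** Approach 1: a single token stream where each token carries its own position increment. */
    static List<Integer> positionsFromIncrements(List<Integer> increments) {
        List<Integer> positions = new ArrayList<>();
        int position = -1; // so that a first increment of 1 lands on position 0
        for (int increment : increments) {
            position += increment;
            positions.add(position);
        }
        return positions;
    }

    /** Approach 2: several instances of the same field, with a fixed gap added between instances. */
    static List<Integer> positionsFromInstances(List<List<String>> instances, int gap) {
        List<Integer> positions = new ArrayList<>();
        int position = -1;
        boolean firstInstance = true;
        for (List<String> tokens : instances) {
            if (!firstInstance) {
                position += gap; // models the position increment gap between field instances
            }
            firstInstance = false;
            for (int i = 0; i < tokens.size(); i++) {
                position += 1; // default increment of 1 per token
                positions.add(position);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // Two chunks (e.g. two mail parts) indexed as two instances of one field.
        List<List<String>> chunks = Arrays.asList(
                Arrays.asList("hello", "world"),
                Arrays.asList("goodbye", "world"));
        System.out.println(positionsFromInstances(chunks, 100)); // [0, 1, 102, 103]

        // Same positions from one stream: the first token of chunk 2 gets increment 1 + 100.
        System.out.println(positionsFromIncrements(Arrays.asList(1, 1, 101, 1))); // [0, 1, 102, 103]
    }
}
```

With a gap of 0 the second chunk would start at position 2, directly after
the first, so a phrase query could match across the chunk boundary. A large
gap keeps the chunks apart without the tokenizer needing to know where each
chunk ended, which is why adding multiple field instances is the less messy
option here.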

Thank you for the information :-)

-- 
Alan Burlison