lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: Use of tika for parsing, offsets questions
Date Thu, 03 Sep 2009 13:07:18 GMT
Hi,

On Wed, Sep 2, 2009 at 2:40 PM, David Causse<dcausse@spotter.com> wrote:
> If I use tika for parsing HTML code and inject parsed String to a lucene
> analyzer. What about the offset information for KWIC and return to text
> (like the google cache view)? how can I keep track of the offsets
> between tika parser and lucene analyzer?

Currently Tika doesn't expose that information but the Tika Parser API
was designed for such use in mind, so it will be possible to add the
offset information. Please file a Tika feature request [1] for this.

[1] https://issues.apache.org/jira/browse/TIKA

BR,

Jukka Zitting

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message