tika-dev mailing list archives

From "Jukka Zitting" <jukka.zitt...@gmail.com>
Subject Re: Customizing TikaConfig or rather getParser
Date Thu, 04 Sep 2008 11:21:44 GMT
Hi,

On Thu, Sep 4, 2008 at 11:31 AM, Michael Wechner
<michael.wechner@wyona.com> wrote:
> this seems to work for our use case, but it seems to me that the actual
> problem is just transferred one step further down.

"There are few problems in computer science that cannot be solved by
adding another level of indirection." -Tom Christiansen

> I think it would be better to separate the actual parser selection (via
> chain of responsibility) from passing in metadata.

The way I see it, an application should ideally deal with only a
single Parser instance that is smart enough to select the
appropriate parsing mechanism for each incoming document based on the
associated metadata.

The Metadata object was made a modifiable input/output parameter of
the parse() method (instead of just a return value) so that a client
application can feed extra metadata to the parsing process. In your
use case that extra metadata would be the path of the document.
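
To make the idea concrete, here is a rough sketch of what I mean. It
does not use the real Tika classes: SimpleParser, DispatchingParser,
the "path" and "parsed-by" keys and the plain Map standing in for
Metadata are all made up for illustration. It only shows a single
parser picking a delegate from the metadata the client passed in, and
the same metadata object carrying results back out.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class MetadataDispatchSketch {

    /** Hypothetical stand-in for a parser interface; NOT Tika's actual API. */
    interface SimpleParser {
        // The metadata map is both input (client hints) and output (parser results).
        String parse(InputStream in, Map<String, String> metadata);
    }

    /** The single parser the application talks to; selects a delegate by path prefix. */
    static class DispatchingParser implements SimpleParser {
        private final Map<String, SimpleParser> byPathPrefix = new LinkedHashMap<>();
        private final SimpleParser fallback;

        DispatchingParser(SimpleParser fallback) {
            this.fallback = fallback;
        }

        void register(String pathPrefix, SimpleParser parser) {
            byPathPrefix.put(pathPrefix, parser);
        }

        @Override
        public String parse(InputStream in, Map<String, String> metadata) {
            // "path" is an assumed key that the client fills in before calling parse().
            String path = metadata.getOrDefault("path", "");
            SimpleParser chosen = fallback;
            for (Map.Entry<String, SimpleParser> entry : byPathPrefix.entrySet()) {
                if (path.startsWith(entry.getKey())) {
                    chosen = entry.getValue();
                    break; // first registered prefix that matches wins
                }
            }
            return chosen.parse(in, metadata);
        }
    }

    static String readAll(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a dispatcher with one special-case delegate and a plain-text fallback. */
    static DispatchingParser demoParser() {
        SimpleParser plain = (in, md) -> {
            md.put("parsed-by", "plain");      // output metadata written by the delegate
            return "plain:" + readAll(in);
        };
        SimpleParser wiki = (in, md) -> {
            md.put("parsed-by", "wiki");
            return "wiki:" + readAll(in);
        };
        DispatchingParser dispatcher = new DispatchingParser(plain);
        dispatcher.register("/wiki/", wiki);
        return dispatcher;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put("path", "/wiki/Home");    // extra input metadata fed by the client
        InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        String text = demoParser().parse(in, metadata);
        System.out.println(text + " " + metadata);
    }
}
```

The application only ever holds the DispatchingParser; which delegate
runs is decided per document from the metadata, so selection and
metadata passing stay behind one parse() call.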

BR,

Jukka Zitting
