tika-dev mailing list archives

From Michael Wechner <michael.wech...@wyona.com>
Subject Re: Customizing TikaConfig or rather getParser
Date Thu, 04 Sep 2008 11:50:36 GMT
Jukka Zitting wrote:
> Hi,
> On Thu, Sep 4, 2008 at 11:31 AM, Michael Wechner
> <michael.wechner@wyona.com> wrote:
>> this seems to work for our use case, but it seems to me that the actual
>> problem is just transferred one step further down.
> "There are few problems in computer science that can not be solved by
> adding another level of indirection." -Tom Christiansen
>> I think it would be better to separate the actual parser selection (via
>> chain of responsibility) from passing in metadata.
> The way I see it, an application should ideally only deal with a
> single Parser instance, that would be smart enough to select the
> appropriate parsing mechanism for each incoming document based on the
> associated metadata.

I am afraid that this makes the parsers less usable, but of course we
could introduce a meta-parser and then reuse the actual data parsers.
But then again, one might ask why the MIME type should be handled as a
special case ;-)
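
A minimal sketch of the meta-parser idea, assuming hypothetical stand-in types: DataParser, PlainTextParser and MetaParser are invented for illustration and are not Tika classes, and a plain Map stands in for Tika's Metadata object. The meta-parser implements the same interface as the data parsers, so the application only ever holds one parser instance, while the individual data parsers remain usable on their own.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Entry point. All types in this sketch are hypothetical, not Tika's API.
class MetaParserSketch {
    public static void main(String[] args) throws IOException {
        Map<String, DataParser> registry = new HashMap<>();
        registry.put("text/plain", new PlainTextParser());

        // The application deals with a single parser instance only.
        DataParser parser = new MetaParser(registry);

        Map<String, String> metadata = new HashMap<>();
        metadata.put("Content-Type", "text/plain");
        InputStream in =
            new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(parser.parse(in, metadata)); // prints "hello"
    }
}

// Hypothetical parser contract: returns the extracted text.
interface DataParser {
    String parse(InputStream stream, Map<String, String> metadata)
        throws IOException;
}

// An "actual data parser" that can also be used directly.
class PlainTextParser implements DataParser {
    public String parse(InputStream stream, Map<String, String> metadata)
            throws IOException {
        return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
    }
}

// The meta-parser: selects a delegate based on the supplied metadata.
class MetaParser implements DataParser {
    private final Map<String, DataParser> parsersByType;

    MetaParser(Map<String, DataParser> parsersByType) {
        this.parsersByType = parsersByType;
    }

    public String parse(InputStream stream, Map<String, String> metadata)
            throws IOException {
        String type = metadata.get("Content-Type");
        DataParser delegate = parsersByType.get(type);
        if (delegate == null) {
            throw new IOException("No parser registered for type: " + type);
        }
        return delegate.parse(stream, metadata);
    }
}
```

In this sketch the content type is just one metadata key among others, so the selection could equally be driven by any other entry.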
> The reason for making the Metadata object a modifiable input/output
> parameter (instead of just a return value) of the parse() method was
> that a client application could feed extra metadata to the parsing
> process. In your use case that extra metadata would be the path of the
> document.

This is how we are now using it.
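
A rough sketch of that usage pattern, under stated assumptions: PathHintParser and the "resourcePath" key are hypothetical names chosen for illustration, a plain Map stands in for Tika's Metadata object, and the document is passed as a String rather than a stream to keep the example short. The client puts the document path into the metadata before parsing, and the parser writes what it determined back into the same object.

```java
import java.util.HashMap;
import java.util.Map;

// Entry point. All names here are hypothetical, not Tika's API.
class MetadataInOutSketch {
    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();

        // Input: the client feeds extra metadata to the parsing process.
        metadata.put(PathHintParser.PATH_KEY, "/reports/summary.xml");

        new PathHintParser().parse("<doc/>", metadata);

        // Output: the parser has added its findings to the same object.
        System.out.println(metadata.get("Content-Type")); // prints "application/xml"
    }
}

class PathHintParser {
    // Hypothetical metadata key for the document path.
    static final String PATH_KEY = "resourcePath";

    // Reads the path hint from the metadata and records the chosen type in it.
    void parse(String content, Map<String, String> metadata) {
        String path = metadata.get(PATH_KEY);
        boolean looksLikeXml = path != null && path.endsWith(".xml");
        metadata.put("Content-Type",
            looksLikeXml ? "application/xml" : "text/plain");
    }
}
```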


> BR,
> Jukka Zitting
