tika-dev mailing list archives

From Michael Wechner <michael.wech...@wyona.com>
Subject Re: Customizing TikaConfig or rather getParser
Date Thu, 04 Sep 2008 09:31:10 GMT
Jukka Zitting wrote:
> Hi,
>
> On Mon, Aug 25, 2008 at 9:06 AM, Michael Wechner
> <michael.wechner@wyona.com> wrote:
>   
>> I think this is where the problem is, namely the getParser(String) method.
>>
>> I would like to override this method by implementing my own chain of
>> responsibility.
>>     
>
> How about the following:
>
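>     import org.apache.tika.config.TikaConfig;
>     import org.apache.tika.exception.TikaException;
>     import org.apache.tika.metadata.Metadata;
>     import org.apache.tika.parser.CompositeParser;
>     import org.apache.tika.parser.Parser;
>
>     public class MyCustomParser extends CompositeParser {
>
>         public MyCustomParser() throws TikaException {
>             setConfig(TikaConfig.getDefaultConfig());
>             // or whatever config you want
>         }
>
>         protected Parser getParser(Metadata metadata) {
>             // Custom code to select an appropriate parser
>             // based on the input metadata (mime type,
>             // document path, whatever) passed by the client.
>             // Or fall back to:
>             return super.getParser(metadata);
>         }
>
>     }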
>
> Your client code would then look like:
>
>     Parser parser = new MyCustomParser(); // may throw TikaException
>
>     Metadata metadata = new Metadata();
>     // contentType: the document's MIME type, supplied by the client
>     metadata.set(Metadata.CONTENT_TYPE, contentType);
>     // plus whatever other metadata you need in MyCustomParser
>
>     parser.parse(stream, handler, metadata);
>
> One of my design goals for the current Parser interface was that
> you can encapsulate this sort of functionality inside it.
>   

This seems to work for our use case, but it seems to me that the actual
problem is just shifted one step further down.

I think it would be better to separate the actual parser selection (via a
chain of responsibility) from passing in metadata.
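One possible reading of this proposal, sketched under stated assumptions: parser selection lives in its own pluggable chain of responsibility, where each link either claims the document or passes it on, and a fallback applies when nobody claims it. The names SimpleParser, ParserSelector, SelectorChain, byContentType and byNameSuffix are hypothetical and are not Tika APIs; metadata is modeled as a plain Map with illustrative keys, and the code assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: SimpleParser, ParserSelector and SelectorChain are
// illustrative stand-ins, NOT Tika APIs. Metadata is modeled as a plain Map.
public class SelectorChainDemo {

    /** Stand-in for a real parser; here it only reports a name. */
    interface SimpleParser {
        String name();
    }

    /** One link in the chain: return a parser, or empty to pass the request on. */
    interface ParserSelector {
        Optional<SimpleParser> select(Map<String, String> metadata);
    }

    /** Asks each selector in order; the first non-empty answer wins. */
    static final class SelectorChain {
        private final List<ParserSelector> selectors = new ArrayList<>();
        private final SimpleParser fallback;

        SelectorChain(SimpleParser fallback) {
            this.fallback = fallback;
        }

        SelectorChain add(ParserSelector selector) {
            selectors.add(selector);
            return this; // allow fluent chaining
        }

        SimpleParser select(Map<String, String> metadata) {
            for (ParserSelector selector : selectors) {
                Optional<SimpleParser> choice = selector.select(metadata);
                if (choice.isPresent()) {
                    return choice.get();
                }
            }
            return fallback; // no link in the chain claimed the document
        }
    }

    /** Hypothetical link: matches an explicit content type from the client. */
    static ParserSelector byContentType(String type, SimpleParser parser) {
        return md -> type.equals(md.get("Content-Type"))
                ? Optional.of(parser) : Optional.empty();
    }

    /** Hypothetical link: guesses from the suffix of the resource name. */
    static ParserSelector byNameSuffix(String suffix, SimpleParser parser) {
        return md -> {
            String name = md.get("resourceName");
            return (name != null && name.endsWith(suffix))
                    ? Optional.of(parser) : Optional.empty();
        };
    }

    public static void main(String[] args) {
        SimpleParser pdf = () -> "pdf";
        SimpleParser html = () -> "html";
        SimpleParser text = () -> "text";

        SelectorChain chain = new SelectorChain(text)
                .add(byContentType("application/pdf", pdf))
                .add(byNameSuffix(".html", html));

        // prints "pdf", then "html", then "text"
        System.out.println(chain.select(
                Collections.singletonMap("Content-Type", "application/pdf")).name());
        System.out.println(chain.select(
                Collections.singletonMap("resourceName", "index.html")).name());
        System.out.println(chain.select(
                Collections.<String, String>emptyMap()).name());
    }
}
```

In this sketch a composite parser could hold such a chain and delegate its lookup to it, so the selection strategy can be reordered or replaced without subclassing the parser itself.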

Cheers

Michael
> BR,
>
> Jukka Zitting
>   

