lucene-solr-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject RE: Is Boilerpipe usable through Solr ExtractingUpdateHandler or the DIH?
Date Fri, 07 Sep 2012 08:39:57 GMT
Hi,

It should not be too hard. It looks like the current SolrContentHandler builds up the document
from SAX events, so you could wrap it: pass a new BoilerpipeContentHandler((ContentHandler) parsingHandler,
extractor), where extractor is a BoilerpipeExtractor, to the parser in ExtractingDocumentLoader.java.
It should work.
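
A minimal sketch of the wrapping idea, using only the JDK's SAX classes so that it runs without Solr or Tika on the classpath. CollectingHandler and SkipNavHandler are hypothetical stand-ins for SolrContentHandler and BoilerpipeContentHandler, not the real classes. The real Boilerpipe handler decides what is boilerplate with its own extractor; the stand-in simply drops everything inside <nav> elements, to show how a wrapping handler sits between the parser and the handler that builds the document.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class HandlerWrappingSketch {

    /** Hypothetical stand-in for SolrContentHandler: collects text from SAX events. */
    static class CollectingHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /**
     * Hypothetical stand-in for BoilerpipeContentHandler: forwards events to
     * the wrapped handler, except for anything inside a <nav> element.
     */
    static class SkipNavHandler extends DefaultHandler {
        private final ContentHandler delegate;
        private int navDepth = 0; // > 0 while inside one or more <nav> elements

        SkipNavHandler(ContentHandler delegate) {
            this.delegate = delegate;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            if ("nav".equals(qName)) navDepth++;
            if (navDepth == 0) delegate.startElement(uri, localName, qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            if (navDepth == 0) delegate.endElement(uri, localName, qName);
            if ("nav".equals(qName)) navDepth--;
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (navDepth == 0) delegate.characters(ch, start, length);
        }
    }

    /** Parses the XML with the filtering handler wrapped around the collecting one. */
    static String extract(String xml) {
        try {
            CollectingHandler sink = new CollectingHandler();
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // The parser only sees the wrapper; the wrapper decides what reaches the sink.
            reader.setContentHandler(new SkipNavHandler(sink));
            reader.parse(new InputSource(new StringReader(xml)));
            return sink.text.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // prints "Main article text."
        System.out.println(extract("<html><nav>Home | About</nav><p>Main article text.</p></html>"));
    }
}
```

In ExtractingDocumentLoader.java the same shape would apply: the parser is handed the wrapping handler instead of parsingHandler directly, and the handler that builds the Solr document stays unchanged.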

Markus

 
 
-----Original message-----
> From: Lance Norskog <goksron@gmail.com>
> Sent: Thu 06-Sep-2012 05:51
> To: solr-user@lucene.apache.org
> Subject: Is Boilerpipe usable through Solr ExtractingUpdateHandler or the DIH?
> 
> Tika integrated Boilerpipe a few releases back. Is it possible to invoke it when using
> the ExtractingUpdateHandler (simple Tika) or the DataImportHandler?
> 
> http://code.google.com/p/boilerpipe/ 
> 
> 
> 
