lucene-solr-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject RE: Is Boilerpipe usable through Solr ExtractingUpdateHandler or the DIH?
Date Fri, 07 Sep 2012 10:05:40 GMT
It works indeed:
https://issues.apache.org/jira/browse/SOLR-3808
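
The approach described in the quoted message below amounts to wrapping the handler that receives the SAX events in a second handler that filters them. What follows is a minimal sketch of that wrapping pattern using only the JDK SAX classes. It does not use Tika, Boilerpipe, or Solr. The class names are hypothetical stand-ins: CollectingHandler plays the role of SolrContentHandler and NavSkippingHandler plays the role of BoilerpipeContentHandler. Dropping everything inside <nav> is an illustrative assumption, not how Boilerpipe decides what counts as boilerplate.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class BoilerplateFilterSketch {

    /** Hypothetical stand-in for SolrContentHandler: collects all text it receives. */
    static class CollectingHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /**
     * Hypothetical stand-in for BoilerpipeContentHandler: wraps a delegate
     * and forwards SAX events only while outside a <nav> element.
     */
    static class NavSkippingHandler extends DefaultHandler {
        private final ContentHandler delegate;
        private int navDepth = 0; // > 0 while inside one or more <nav> elements

        NavSkippingHandler(ContentHandler delegate) {
            this.delegate = delegate;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            if ("nav".equals(qName)) {
                navDepth++;
            }
            if (navDepth == 0) {
                delegate.startElement(uri, localName, qName, atts);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            if (navDepth == 0) {
                delegate.endElement(uri, localName, qName);
            }
            if ("nav".equals(qName)) {
                navDepth--;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length)
                throws SAXException {
            if (navDepth == 0) {
                delegate.characters(ch, start, length);
            }
        }
    }

    /** Parses the XML string through the wrapped handler and returns the kept text. */
    static String extract(String xml) throws Exception {
        CollectingHandler sink = new CollectingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)),
                       new NavSkippingHandler(sink));
        return sink.text.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(
            extract("<html><nav>Home | About</nav><p>Real content</p></html>"));
    }
}
```

The downstream handler needs no changes in this arrangement: it only sees the events the wrapper chooses to forward, which is why the same idea can be applied where the parser is invoked in ExtractingDocumentLoader.java.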
 
 
-----Original message-----
> From:Markus Jelsma <markus.jelsma@openindex.io>
> Sent: Fri 07-Sep-2012 10:40
> To: solr-user@lucene.apache.org
> Subject: RE: Is Boilerpipe usable through Solr ExtractingUpdateHandler or the DIH?
> 
> Hi,
> 
> It should not be too hard, but it looks like the current SolrContentHandler builds up the
> document via SAX events. You could pass a BoilerpipeContentHandler((ContentHandler)parsingHandler,
> BoilerpipeExtractor) to the parser in ExtractingDocumentLoader.java. It should work.
> 
> Markus
> 
>  
>  
> -----Original message-----
> > From:Lance Norskog <goksron@gmail.com>
> > Sent: Thu 06-Sep-2012 05:51
> > To: solr-user@lucene.apache.org
> > Subject: Is Boilerpipe usable through Solr ExtractingUpdateHandler or the DIH?
> > 
> > Tika integrated Boilerpipe a few releases back. Is it possible to invoke it when
> > using the ExtractingUpdateHandler (simple Tika) or the DataImportHandler?
> > 
> > http://code.google.com/p/boilerpipe/ 
> > 
> > 
> > 
> 
