lucene-solr-dev mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Solr Cell revamped as an UpdateProcessor?
Date Mon, 07 Dec 2009 20:51:44 GMT

As someone with very little knowledge of Solr Cell and/or Tika, I find 
myself wondering if ExtractingRequestHandler would make more sense as an 
ExtractingUpdateProcessor -- where it could be configured to take 
either binary fields (or string fields containing URLs) out of the 
Documents, parse them with Tika, and add the various XPath-matching hunks 
of text back into the document as new fields.

Then ExtractingRequestHandler just becomes a handler that slurps up its 
ContentStreams, adds them as binary data fields, and adds the other 
literal params as fields.
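
A minimal sketch of the processor step described above, assuming a
document can be modelled as a plain map of field names to values. It uses
no Solr or Tika classes: ExtractingProcessorSketch, Extractor, and
process are hypothetical names for illustration, and the Extractor
interface only stands in for whatever a real Tika parse would return.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class ExtractingProcessorSketch {

    /** Hypothetical stand-in for a Tika parse: raw bytes in, extracted field name -> text out. */
    interface Extractor extends Function<byte[], Map<String, String>> {}

    /**
     * If sourceField holds binary data, takes it out of the document and adds
     * the extractor's fields in its place. Any other document is returned as
     * an unchanged copy.
     */
    static Map<String, Object> process(Map<String, Object> doc,
                                       String sourceField,
                                       Extractor extractor) {
        Map<String, Object> out = new LinkedHashMap<>(doc);
        Object raw = out.get(sourceField);
        if (raw instanceof byte[]) {
            out.remove(sourceField);                    // take the binary field out
            out.putAll(extractor.apply((byte[]) raw));  // add the extracted text as new fields
        }
        return out;
    }

    public static void main(String[] args) {
        // Fake extractor (assumption): treats the bytes as UTF-8 text for a single "content" field.
        Extractor fake = bytes -> {
            Map<String, String> fields = new LinkedHashMap<>();
            fields.put("content", new String(bytes, StandardCharsets.UTF_8));
            return fields;
        };

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "doc1");
        doc.put("raw", "hello".getBytes(StandardCharsets.UTF_8));

        System.out.println(process(doc, "raw", fake));
    }
}
```

In this shape the handler's only job would be to put the uploaded bytes
into a field such as "raw"; the same step would then apply to documents
arriving by any other update path.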

Wouldn't that make things like SOLR-1358, and using Tika with 
URLs/filepaths in XML- and CSV-based updates, fairly trivial?



-Hoss

