lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception
Date Sat, 28 Sep 2013 00:43:54 GMT
This is a rather complicated example to chew through, but try the following
two things:
*) dataField="${tika.text}"  => dataField="text" (or, less likely,
dataField="tika.text")
With ${tika.text} you might be passing the content of the field, whereas
dataField seems to expect a reference to the field, i.e. its name. This
might explain the exception.

*) It may help to be aware of
https://issues.apache.org/jira/browse/SOLR-4530 . There is a new
htmlMapper="identity" flag on Tika entities so that more of the HTML
structure passes through. By default, Tika strips out most of the HTML tags.
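
Put together, the two suggestions might look roughly like the fragment below.
This is only a sketch under assumptions, not a tested config: it uses the
entityName.fieldName form (tika.text) for dataField, it selects the inner
entity's processor with processor= rather than the type= attribute in the
quoted config, and it assumes the "fld" dataSource is a FieldReaderDataSource
and that a parent "rec" entity supplies ${rec.urlParse}.

```xml
<!-- Assumed context: a parent entity named "rec" provides ${rec.urlParse},
     and the dataSources "dataUrl" and "fld" are defined elsewhere
     ("fld" is assumed to be a FieldReaderDataSource). -->
<!-- htmlMapper="identity" (SOLR-4530) asks Tika to keep more of the HTML
     structure, so there are still elements such as h1 for XPath to match. -->
<entity name="tika" processor="TikaEntityProcessor"
        url="${rec.urlParse}" dataSource="dataUrl" onError="skip"
        format="html" htmlMapper="identity">
  <field column="text"/>

  <!-- dataField names the field to read (a reference), not ${...},
       which would be replaced by the field's content.
       Try dataField="text" if the tika.text form does not resolve. -->
  <entity name="detail" processor="XPathEntityProcessor"
          forEach="/html" dataSource="fld" dataField="tika.text"
          rootEntity="true" onError="skip">
    <field xpath="//h1" column="h_1"/>
  </entity>
</entity>
```

If the inner entity still returns nothing, it may be worth indexing the "text"
column on its own first to check what markup Tika actually hands over.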

Regards,
   Alex.

On Thu, Sep 26, 2013 at 5:17 PM, Andreas Owen <ao@conx.ch> wrote:

>                 <entity name="tika" processor="TikaEntityProcessor"
> url="${rec.urlParse}" dataSource="dataUrl" onError="skip" format="html">
>                         <field column="text"/>
>
>                         <entity name="detail" type="XPathEntityProcessor"
> forEach="/html" dataSource="fld" dataField="${tika.text}" rootEntity="true"
> onError="skip">
>                                 <field xpath="//h1" column="h_1" />
>                         </entity>
>                 </entity>
>



Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
