lucene-solr-user mailing list archives

From Alexandre Rafalovitch <>
Subject Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception
Date Sat, 28 Sep 2013 00:43:54 GMT
This is a rather complicated example to chew through, but try the following
two things:
*) dataField="${tika.text}"  => dataField="text" (or less likely htmlMapper
You might be trying to read the content of the field rather than passing a
reference to the field, which seems to be what is expected. This might explain
the null exception.

*) It may help to be aware of . There is a new
htmlMapper="identity" flag on Tika entities that lets more of the HTML structure
pass through. By default, Tika strips out most of the HTML tags.
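For reference, here is a rough sketch of the nested entities with both suggestions applied. It is not a tested config. It assumes "dataUrl" is declared as a BinURLDataSource, because TikaEntityProcessor reads binary streams, and "fld" as a FieldReaderDataSource. The entity fragment is meant to sit inside the existing outer "rec" entity, which is not shown in the quoted config. The DIH wiki's FieldReaderDataSource example names the parent column as entityName.columnName, so the sketch uses dataField="tika.text". If that does not work, try the plain dataField="text" suggested above. The sketch also uses processor="XPathEntityProcessor" in place of the quoted type="XPathEntityProcessor". DIH picks an entity's processor from the "processor" attribute and otherwise assumes SqlEntityProcessor.

```xml
<!-- Declared at the top level of data-config.xml, beside any existing data sources. -->
<!-- Assumption: dataUrl serves binary content; TikaEntityProcessor needs a binary data source. -->
<dataSource name="dataUrl" type="BinURLDataSource"/>
<!-- FieldReaderDataSource lets a child entity read a column emitted by its parent entity. -->
<dataSource name="fld" type="FieldReaderDataSource"/>

<!-- Goes inside the existing outer "rec" entity (not shown), which supplies ${rec.urlParse}. -->
<entity name="tika" processor="TikaEntityProcessor"
        url="${rec.urlParse}" dataSource="dataUrl" onError="skip"
        format="html" htmlMapper="identity">
  <!-- htmlMapper="identity" keeps the HTML markup instead of Tika's default stripping. -->
  <field column="text"/>
  <!-- processor= (not type=) chooses the entity processor. -->
  <!-- dataField names the parent column (entity.column) instead of substituting its content. -->
  <entity name="detail" processor="XPathEntityProcessor"
          forEach="/html" dataSource="fld" dataField="tika.text"
          rootEntity="true" onError="skip">
    <field xpath="//h1" column="h_1"/>
  </entity>
</entity>
```

With htmlMapper="identity", the "text" column should still contain the h1 elements when XPathEntityProcessor parses it. Under the default mapper, most of those tags are stripped.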


On Thu, Sep 26, 2013 at 5:17 PM, Andreas Owen <> wrote:

> <entity name="tika" processor="TikaEntityProcessor"
>         url="${rec.urlParse}" dataSource="dataUrl" onError="skip" format="html">
>     <field column="text"/>
>     <entity name="detail" type="XPathEntityProcessor"
>             forEach="/html" dataSource="fld" dataField="${tika.text}" rootEntity="true"
>             onError="skip">
>         <field xpath="//h1" column="h_1" />
>     </entity>
> </entity>

Personal website:
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)