lucene-solr-user mailing list archives

From karsten-s...@gmx.de
Subject DIH: Enhance XPathRecordReader to deal with //body(FLATTEN=true) and //body/h1
Date Sat, 09 Apr 2011 12:32:01 GMT
Hi Folks,

Has anyone improved the DIH XPathRecordReader so that it can handle nested XPaths?
For example, a data-config.xml with
 <entity .. processor="XPathEntityProcessor" ..
  <field column="title" xpath="//body/h1"/>
  <field column="alltext" xpath="//body" flatten="true"/>
and an XML stream containing
  /html/body/h1...
fills only the field “alltext”; the field “title” stays empty.
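
To make the expected result concrete, here is a minimal sketch outside of Solr, in Python with the standard-library xml.etree.ElementTree. It does not use DIH or XPathRecordReader; the sample document and the `extract` helper are made up for illustration. It only shows what I would like both fields to contain when one xpath is nested inside the other:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input, shaped like the /html/body/h1 stream above.
SAMPLE = (
    "<html><body>"
    "<h1>My Title</h1>"
    "<p>First <b>bold</b> paragraph.</p>"
    "</body></html>"
)


def extract(xml_text):
    """Return the desired 'title' and 'alltext' values for one document."""
    root = ET.fromstring(xml_text)   # root is the <html> element
    body = root.find("body")         # corresponds to //body in this sample
    h1 = body.find("h1")             # corresponds to //body/h1

    title = "".join(h1.itertext()) if h1 is not None else None

    # itertext() yields text nodes in document order, so the flattened
    # text keeps the original token order (unlike copyField source="*").
    alltext = " ".join(" ".join(body.itertext()).split())

    return {"title": title, "alltext": alltext}


if __name__ == "__main__":
    print(extract(SAMPLE))
```

With the sample above, "title" holds the h1 text and "alltext" holds the whole body text, with the h1 text still in first position.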

This has been a known issue since 2009 (SOLR-1437):
https://issues.apache.org/jira/browse/SOLR-1437#commentauthor_12756469_verbose

So three questions: 
1. How can I fill a “search over all” field without nested XPaths?
   (A schema.xml <copyField source="*" dest="alltext"/> will not help, because we lose the original token order.)
2. Has anyone tried to improve XPathRecordReader to handle nested XPaths?
3. Does anyone else need this feature?


Best regards
  Karsten
