lucene-solr-user mailing list archives

From Chantal Ackermann
Subject Re: Store complete XML record (DIH & XPathEntityProcessor)
Date Thu, 28 Jul 2011 11:10:46 GMT

Hi g,

have a look at the PlainTextEntityProcessor:

You will have to call the URL twice that way, but I don't think you can
get the complete document (the root element with all its structure) via
XPath, so the XPathEntityProcessor cannot help you there.
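
As a rough, untested sketch of that approach: the config below nests a
PlainTextEntityProcessor entity inside the XPath entity from the question,
so the same file is read a second time and its content is mapped from the
implicit plainText column to the fullxmlrecord field. The dataSource
definition, the outer FileListEntityProcessor entity, and the baseDir and
fileName values are assumptions made for illustration; only the "x" entity
is taken from the original message.

```xml
<dataConfig>
  <!-- Assumption: a FileReaderDataSource registered under the name used in the question -->
  <dataSource name="logfilereader" type="FileReaderDataSource" encoding="UTF-8" />
  <document>
    <!-- Assumption: baseDir and fileName are placeholders, not values from the thread -->
    <entity name="logfile" processor="FileListEntityProcessor"
            baseDir="/path/to/xml/files" fileName=".*\.xml"
            rootEntity="false" dataSource="null">
      <!-- One Solr document per /xml/myrecord element, as in the question -->
      <entity name="x" rootEntity="true" dataSource="logfilereader"
              processor="XPathEntityProcessor"
              url="${logfile.fileAbsolutePath}" stream="false"
              forEach="/xml/myrecord">
        <field column="mycol1" xpath="/xml/myrecord/@something" />
        <!-- Second read of the same file: the whole content arrives in "plainText" -->
        <entity name="raw" dataSource="logfilereader"
                processor="PlainTextEntityProcessor"
                url="${logfile.fileAbsolutePath}">
          <field column="plainText" name="fullxmlrecord" />
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Note that this stores the whole file in fullxmlrecord for every record, not
the individual record, and that the nested entity re-reads the file once per
record, so it only fits when the file-level text is acceptable and the files
are small.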

If calling the URL twice slows your indexer down unacceptably, you can
always subclass XPathEntityProcessor (knowing Java is helpful,
though...). There surely is a way to make it return what you need. Or
maybe write an entity processor that caches the content and uses the
XPath EP and the PlainText EP to accomplish your needs (not sure whether
the API allows for that).


On Thu, 2011-07-28 at 05:53 +0200, solruser@9913 wrote:
> I am trying to use DIH to import an XML based file with multiple XML records
> in it.  Each record corresponds to one document in Lucene.  I am using the
> DIH FileListEntityProcessor (to get file list) followed by the
> XPathEntityProcessor to create the entities.  
> It works perfectly and I am able to map XML elements to fields ..... however
> I also need to store the entire XML record as separate 'full text' field. 
> Is there any way the XPathEntityProcessor provides a variable like 'rawLine'
> or 'plainText' that I can map to a field.  
> I tried to use the Plain Text processor after this  - but that does not
> recognize the XML boundaries and just gives the whole XML file.
>        <entity name="x" rootEntity="true" dataSource="logfilereader"
>                processor="XPathEntityProcessor"
>                url="${logfile.fileAbsolutePath}" stream="false"
>                forEach="/xml/myrecord"
>                transformer="....">
>                  <field column="mycol1" xpath="/xml/myrecord/@something" />
> and so on ...
> This works perfectly.  However I also need something like ...
> 		<field column="fullxmlrecord"     name="plainText"  />
> Any help is much appreciated. I am a newbie and may be missing something
> obvious here
> -g
