lucene-solr-user mailing list archives

From Erik Hatcher <erik.hatc...@gmail.com>
Subject Re: SOLR DataImportHandler - Problem with XPathEntityProcessor
Date Wed, 09 Sep 2015 10:06:27 GMT
If you need additional manipulation during the update process, you can use an update processor.
There’s a script update processor that lets you do additional document processing in JavaScript.
See http://lucene.apache.org/solr/5_3_0/solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html
for more information.
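As a rough illustration, here is a sketch of a script for StatelessScriptUpdateProcessorFactory. It assumes, hypothetically, that each LINK already reaches Solr as one document carrying parallel multivalued TAG_CODE and TAG_VALUE fields (names taken from the thread). `processAdd(cmd)` and `cmd.solrDoc`, a SolrInputDocument with `getFieldValues`, `setField` and `removeField`, follow the factory's script conventions. The script turns each code/value pair into a field of its own.

```javascript
// Sketch of an update script for StatelessScriptUpdateProcessorFactory.
// Assumption: TAG_CODE and TAG_VALUE arrive as parallel multivalued fields.
function processAdd(cmd) {
  var doc = cmd.solrDoc; // org.apache.solr.common.SolrInputDocument
  var codes = doc.getFieldValues("TAG_CODE"); // Collection, or null if absent
  var vals = doc.getFieldValues("TAG_VALUE");
  if (codes != null && vals != null) {
    var c = codes.toArray();
    var v = vals.toArray();
    // Pair the i-th code with the i-th value, e.g. ABC -> ABC_VALUE.
    for (var i = 0; i < c.length && i < v.length; i++) {
      doc.setField(String(c[i]), v[i]);
    }
  }
  // Drop the helper fields once they have been pivoted.
  doc.removeField("TAG_CODE");
  doc.removeField("TAG_VALUE");
}

// Other hooks the factory may call; nothing to do for them here.
function processDelete(cmd) {}
function processMergeIndexes(cmd) {}
function processCommit(cmd) {}
function processRollback(cmd) {}
function finish() {}
```

The file would be referenced from an updateRequestProcessorChain in solrconfig.xml through the factory's `script` parameter. The pairing relies on the two fields staying aligned, i.e. every TAG has exactly one TAG_CODE and one TAG_VALUE.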

—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com




> On Sep 8, 2015, at 12:12 PM, Umang Agrawal <umang.in60@gmail.com> wrote:
> 
> Thanks Alex.
> 
> The inner entity name should be different; that was a typo in my question.
> 
> Regarding XsltUpdateRequestHandler <https://wiki.apache.org/solr/XsltUpdateRequestHandler>: it's a good solution, but I cannot use it in my application since I need to include a few more transformers and Java manipulations.
> 
> Could you please suggest how to use XPath syntax like "/RESOURCE/LINK[@ID=${testdata.id}]/TAG/TAG_VALUE" in the data config XML file?
> 
> On Tue, Sep 8, 2015 at 6:34 PM, Umang Agrawal <umang.in60@gmail.com> wrote:
> Hi All
> 
> I am facing a problem with XPathEntityProcessor.
> 
> Objective:
> When I index the Resource XML file using the DIH XPathEntityProcessor, there should be 2 Solr documents:
> 01) LINK with ID 1000, with 2 tags: ABC and DEF
> 02) LINK with ID 2000, with 3 tags: GHI, JKL and MNO
> 
> Solr Version: 4.10.2
> 
> Problem:
> I am not able to index <TAG/> data properly. 
> 
> Expected Output:
> 	{
> 		"id": "1000",
> 		"field_name": "val1",
> 		"ABC": "ABC_VALUE",
> 		"DEF": "DEF_VALUE"
> 	},
> 	{
> 		"id": "2000",
> 		"field_name": "val2",
> 		"GHI": "GHI_VALUE",
> 		"JKL": "JKL_VALUE",
> 		"MNO": "MNO_VALUE"
> 	}
> ========================================================================================================
> 
> Resource XML:
> 
> <RESOURCE>
> 	<LINK ID="1000">
> 		<FIELD>val1</FIELD>
> 		<TAG>
> 			<TAG_CODE>ABC</TAG_CODE>
> 			<TAG_VALUE>ABC_VALUE</TAG_VALUE>
> 		</TAG>
> 		<TAG>
> 			<TAG_CODE>DEF</TAG_CODE>
> 			<TAG_VALUE>DEF_VALUE</TAG_VALUE>
> 		</TAG>
> 	</LINK>
> 	<LINK ID="2000">
> 		<FIELD>val2</FIELD>
> 		<TAG>
> 			<TAG_CODE>GHI</TAG_CODE>
> 			<TAG_VALUE>GHI_VALUE</TAG_VALUE>
> 		</TAG>
> 		<TAG>
> 			<TAG_CODE>JKL</TAG_CODE>
> 			<TAG_VALUE>JKL_VALUE</TAG_VALUE>
> 		</TAG>
> 		<TAG>
> 			<TAG_CODE>MNO</TAG_CODE>
> 			<TAG_VALUE>MNO_VALUE</TAG_VALUE>
> 		</TAG>
> 	</LINK>	
> </RESOURCE>
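For reference, the target mapping can be sketched in plain JavaScript over an already-parsed copy of the Resource XML above. The object layout (`resource`, with `LINK` and `TAG` as arrays) and the helper `toDocs` are hypothetical stand-ins, not anything DIH produces; the point is that each document is built only from the TAG children of its own LINK.

```javascript
// Hypothetical parsed form of the sample Resource XML.
var resource = {
  LINK: [
    { ID: "1000", FIELD: "val1", TAG: [
        { TAG_CODE: "ABC", TAG_VALUE: "ABC_VALUE" },
        { TAG_CODE: "DEF", TAG_VALUE: "DEF_VALUE" } ] },
    { ID: "2000", FIELD: "val2", TAG: [
        { TAG_CODE: "GHI", TAG_VALUE: "GHI_VALUE" },
        { TAG_CODE: "JKL", TAG_VALUE: "JKL_VALUE" },
        { TAG_CODE: "MNO", TAG_VALUE: "MNO_VALUE" } ] }
  ]
};

// One document per LINK; tags are read from that LINK only.
function toDocs(res) {
  return res.LINK.map(function (link) {
    var doc = { id: link.ID, field_name: link.FIELD };
    link.TAG.forEach(function (tag) {
      doc[tag.TAG_CODE] = tag.TAG_VALUE; // e.g. ABC -> ABC_VALUE
    });
    return doc;
  });
}

var docs = toDocs(resource);
console.log(JSON.stringify(docs, null, 2));
```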
> 
> ========================================================================================================
> 
> DataConfig XML (TRY 1):
> <dataConfig>
> 	<script><![CDATA[
> 		function f1(row) {
> 			var code = row.get("TAG_CODE");
> 			var val = row.get("TAG_VALUE");
> 			
> 			row.put(code, val);
> 			row.remove("TAG_CODE");
> 			row.remove("TAG_VALUE");
> 			return row;
> 		}
>     ]]></script>
>     <dataSource type="URLDataSource" />
>     <document>
>         <entity name="testdata" url="http://host:port/uri"
>                 processor="XPathEntityProcessor" forEach="/RESOURCE/LINK">
> 			<field column="id" xpath="/RESOURCE/LINK/@ID" />	
>             <field column="field_name" xpath="/RESOURCE/LINK/FIELD" />
> 			<entity name="testdata" url="http://host:port/uri"
>                 processor="XPathEntityProcessor" forEach="/RESOURCE/LINK/TAG" transformer="script:f1">
> 				<field column="TAG_CODE" xpath="/RESOURCE/LINK/TAG/TAG_CODE" />
> 				<field column="TAG_VALUE" xpath="/RESOURCE/LINK/TAG/TAG_VALUE" />
> 			</entity>
>         </entity>
>     </document>
> </dataConfig>
> 
> Output:
> 	{
> 		"id": "1000",
> 		"field_name": "val1",
> 		"ABC": "ABC_VALUE",
> 		"DEF": "DEF_VALUE",
> 		"GHI": "GHI_VALUE",
> 		"JKL": "JKL_VALUE",
> 		"MNO": "MNO_VALUE"
> 	},
> 	{
> 		"id": "2000",
> 		"field_name": "val2",
> 		"ABC": "ABC_VALUE",
> 		"DEF": "DEF_VALUE",
> 		"GHI": "GHI_VALUE",
> 		"JKL": "JKL_VALUE",
> 		"MNO": "MNO_VALUE"
> 	}
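In TRY 1 the inner entity fetches the same URL again, and its forEach="/RESOURCE/LINK/TAG" selects every TAG in the file; nothing ties it to the current LINK, which matches the output above where both documents receive all five tags. One way to keep the grouping, sketched here as an untested assumption rather than a verified config, is to drop the inner entity and read the tags in the outer entity itself: keep forEach="/RESOURCE/LINK" and add TAG_CODE and TAG_VALUE fields with xpath="/RESOURCE/LINK/TAG/TAG_CODE" and xpath="/RESOURCE/LINK/TAG/TAG_VALUE", multiValued="true", plus transformer="script:f1". Each LINK row should then carry only its own tags as parallel lists. The revised `f1` below handles that; `asArray` is a hypothetical helper, and the `size()`/`get()` calls assume the lists arrive as `java.util.List`.

```javascript
// Hypothetical helper: normalise a row value to a JS array.
// A java.util.List is read via size()/get(); a single value is wrapped.
function asArray(x) {
  if (x == null) return [];
  if (typeof x.size === "function") {
    var out = [];
    for (var i = 0; i < x.size(); i++) out.push(x.get(i));
    return out;
  }
  return [x];
}

// Revised ScriptTransformer function for the single-entity layout.
function f1(row) {
  var codes = asArray(row.get("TAG_CODE"));
  var vals = asArray(row.get("TAG_VALUE"));
  // Pair the i-th code with the i-th value, e.g. ABC -> ABC_VALUE.
  for (var i = 0; i < codes.length && i < vals.length; i++) {
    row.put(String(codes[i]), vals[i]);
  }
  row.remove("TAG_CODE");
  row.remove("TAG_VALUE");
  return row;
}
```

As with any index-based pairing, this assumes every TAG has exactly one TAG_CODE and one TAG_VALUE, so the two lists stay aligned.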
> 
> ========================================================================================================
> 
> DataConfig XML (TRY 2):
> <dataConfig>
> 	<script><![CDATA[
> 		function f1(row) {
> 			var code = row.get("TAG_CODE");
> 			var val = row.get("TAG_VALUE");
> 			
> 			row.put(code, val);
> 			row.remove("TAG_CODE");
> 			row.remove("TAG_VALUE");
> 			return row;
> 		}
>     ]]></script>
>     <dataSource type="URLDataSource" />
>     <document>
>         <entity name="testdata" url="http://host:port/uri"
>                 processor="XPathEntityProcessor" forEach="/RESOURCE/LINK">
> 			<field column="id" xpath="/RESOURCE/LINK/@ID" />	
>             <field column="field_name" xpath="/RESOURCE/LINK/FIELD" />
> 			<entity name="testdata" url="http://host:port/uri"
>                 processor="XPathEntityProcessor" forEach="/RESOURCE/LINK[@ID=${testdata.id}]/TAG" transformer="script:f1">
> 				<field column="TAG_CODE" xpath="/RESOURCE/LINK/TAG/TAG_CODE" />
> 				<field column="TAG_VALUE" xpath="/RESOURCE/LINK/TAG/TAG_VALUE" />
> 			</entity>
>         </entity>
>     </document>
> </dataConfig>
> 
> Output:
> 	{
> 		"id": "1000",
> 		"field_name": "val1"		
> 	},
> 	{
> 		"id": "2000",
> 		"field_name": "val2"		
> 	}
> 
> ========================================================================================================
> 
> DataConfig XML (TRY 3):
> <dataConfig>
> 	<script><![CDATA[
> 		function f1(row) {
> 			var code = row.get("TAG_CODE");
> 			var val = row.get("TAG_VALUE");
> 			
> 			row.put(code, val);
> 			row.remove("TAG_CODE");
> 			row.remove("TAG_VALUE");
> 			return row;
> 		}
>     ]]></script>
>     <dataSource type="URLDataSource" />
>     <document>
>         <entity name="testdata" url="http://host:port/uri"
>                 processor="XPathEntityProcessor" forEach="/RESOURCE/LINK">
> 			<field column="id" xpath="/RESOURCE/LINK/@ID" />	
>             <field column="field_name" xpath="/RESOURCE/LINK/FIELD" />
> 			<entity name="testdata" url="http://host:port/uri"
>                 processor="XPathEntityProcessor" forEach="/RESOURCE/LINK[@ID=${testdata.id}]/TAG" transformer="script:f1">
> 				<field column="TAG_CODE" xpath="/RESOURCE/LINK[@ID=${testdata.id}]/TAG/TAG_CODE" />
> 				<field column="TAG_VALUE" xpath="/RESOURCE/LINK[@ID=${testdata.id}]/TAG/TAG_VALUE" />
> 			</entity>
>         </entity>
>     </document>
> </dataConfig>
> 
> Output:
> 	{
> 		"id": "1000",
> 		"field_name": "val1"		
> 	},
> 	{
> 		"id": "2000",
> 		"field_name": "val2"		
> 	}
> 
> 
> -- 
> Thanx & Regards
> Umang Agrawal

