lucene-solr-dev mailing list archives

From "Noble Paul (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-1001) using invariant request values from solrconfig.xml inside a data-config.xml regexp
Date Fri, 06 Feb 2009 04:09:59 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671000#action_12671000 ]

Noble Paul commented on SOLR-1001:
----------------------------------

bq.Would it be alright if we resolve all entity attributes in ContextImpl.getEntityAttribute?

Then components would never be able to get the actual (unresolved) string if they wish to.
Moreover, it is not backward compatible. We can add extra methods instead:
# ContextImpl.getResolvedEntityAttribute
# ContextImpl.getAllResolvedEntityFields. It is expensive to do that here, because the component
may be interested in only one variable and we would end up resolving all variables. If we can
cache the result, it may be OK.
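
To make the trade-off concrete, here is a minimal sketch of a lazily resolved, cached attribute lookup. It is not the DIH code: the class ResolvedAttributeSketch, its fields, and clearCache() are hypothetical, and the resolver is passed in as a plain function rather than the real VariableResolver. The existing getter keeps returning the raw string, so nothing changes for current callers.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a context object; not the real ContextImpl.
class ResolvedAttributeSketch {
    private final Map<String, String> rawAttributes;
    private final Function<String, String> resolver; // assumed never to return null
    private final Map<String, String> resolvedCache = new HashMap<>();
    int resolveCalls = 0; // exposed only to show how often the resolver runs

    ResolvedAttributeSketch(Map<String, String> rawAttributes,
                            Function<String, String> resolver) {
        this.rawAttributes = rawAttributes;
        this.resolver = resolver;
    }

    // Existing behaviour: callers still get the unresolved string.
    String getEntityAttribute(String name) {
        return rawAttributes.get(name);
    }

    // New method: resolves only the requested attribute, and only once.
    String getResolvedEntityAttribute(String name) {
        String raw = rawAttributes.get(name);
        if (raw == null) {
            return null;
        }
        String cached = resolvedCache.get(name);
        if (cached == null) {
            cached = resolver.apply(raw);
            resolveCalls++;
            resolvedCache.put(name, cached);
        }
        return cached;
    }

    // The cache is only valid while the variable values stay the same.
    void clearCache() {
        resolvedCache.clear();
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("regex", "${dir}(.*)");
        ResolvedAttributeSketch ctx =
            new ResolvedAttributeSketch(attrs, s -> s.replace("${dir}", "/data"));
        System.out.println(ctx.getEntityAttribute("regex"));
        System.out.println(ctx.getResolvedEntityAttribute("regex"));
    }
}
```

Attributes that reference values which change from row to row would need clearCache() to be called whenever those values change, which is why the caching question matters.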

> using invariant request values from solrconfig.xml inside a data-config.xml regexp
> ----------------------------------------------------------------------------------
>
>                 Key: SOLR-1001
>                 URL: https://issues.apache.org/jira/browse/SOLR-1001
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.3
>            Reporter: Fergus McMenemie
>             Fix For: 1.4
>
>         Attachments: SOLR-1001.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> As noted in several postings, I can define variables inside the invariants list
> section of the DIH handler in solrconfig.xml, and I can reference these variables within
> data-config.xml. This works properly for templates: the Solr field "test" is populated as
> expected. However, the variable is not substituted into the regex of my RegexTransformer
> field. Here is my data-config.xml, which gives a hint of the use case.
>    <dataConfig>
>    <dataSource name="myfilereader" type="FileDataSource"/>    
>     <document>
>        <entity name="jc"
> 	       processor="FileListEntityProcessor"
> 	       fileName="^.*\.xml$"
> 	       newerThan="'NOW-1000DAYS'"
> 	       recursive="true"
> 	       rootEntity="false"
> 	       dataSource="null"
> 	       baseDir="/Volumes/spare/ts/fords/dtd/fordsxml/data">
> 	  <entity name="x"
> 	          dataSource="myfilereader"
> 		  processor="XPathEntityProcessor"
> 		  url="${jc.fileAbsolutePath}"
> 		  stream="false"
> 		  forEach="/record"
> 		  transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer,HTMLStripTransformer">
>    <field column="fileAbsolutePath" template="${jc.fileAbsolutePath}" />
>    <field column="fileWebPath"      regex="${dataimporter.request.finstalldir}(.*)" replaceWith="$1" sourceColName="fileAbsolutePath"/>
>    <field column="test"             template="${dataimporter.request.finstalldir}" />
>    <field column="title"            xpath="/record/title" />
>    <field column="para"             xpath="/record/sect1/para" stripHTML="true" />
>    <field column="date"             xpath="/record/metadata/date[@qualifier='Date']" dateTimeFormat="yyyyMMdd"   />
>    	     </entity>
>        </entity>
>        </document>
>     </dataConfig>
> Shalin has pointed out that we are creating the regex Pattern without first resolving
> the variable. So we need to call VariableResolver.resolve on the 'regex' attribute's value
> before creating the Pattern object.
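
The order of operations described in the issue (resolve first, then compile) can be illustrated with a small self-contained sketch. This is not the RegexTransformer source: the class RegexResolveSketch and its resolve() helper are hypothetical stand-ins for VariableResolver.resolve, and the value "/Volumes/spare/ts" for finstalldir is an assumed example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not the DIH RegexTransformer.
class RegexResolveSketch {
    // Matches ${name} placeholders in an attribute value.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Stand-in for VariableResolver.resolve: substitutes known variables.
    static String resolve(String template, Map<String, String> vars) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            // In this sketch, unknown variables resolve to the empty string.
            m.appendReplacement(out, Matcher.quoteReplacement(value == null ? "" : value));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Resolve the 'regex' attribute BEFORE compiling it into a Pattern.
    // Note: the substituted value is interpreted as regex syntax here.
    static String applyRegex(String regexAttr, String replaceWith,
                             String input, Map<String, String> vars) {
        Pattern p = Pattern.compile(resolve(regexAttr, vars));
        return p.matcher(input).replaceAll(replaceWith);
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("dataimporter.request.finstalldir", "/Volumes/spare/ts"); // assumed value
        System.out.println(applyRegex("${dataimporter.request.finstalldir}(.*)", "$1",
                "/Volumes/spare/ts/fords/a.xml", vars)); // prints "/fords/a.xml"
    }
}
```

If the install directory should be matched literally rather than as a regular expression, the resolved value could be wrapped with Pattern.quote before substitution; whether that is desirable depends on how the variable is meant to be used.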

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

