lucene-solr-dev mailing list archives

From "Fergus McMenemie (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (SOLR-1033) DIH transformers cannot reuse output from previous transformations
Date Mon, 23 Feb 2009 11:35:02 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675856#action_12675856 ]

fergus edited comment on SOLR-1033 at 2/23/09 3:33 AM:
-----------------------------------------------------------------

OK, here goes. My document contains references to embedded imagery. For each image there is
the image itself along with a thumbnail and a caption. The source document contains:

  <mediaObject vurl="1043130" imageType="graphic"/>

I have a search application that searches only the captions associated with a given image.
It would be nice to populate Solr fields with the correct relative path to each image and
its thumbnail at index time. The problem is that although the thumbnail is always named:

   s${e.vurl}.jpg

The name of the image itself varies depending on the first letter of the image type (imageType).
The type can be one of 'picture', 'graphic', 'lineDrawing' or 'map', giving one of:

   p${e.vurl}.jpg
   g${e.vurl}.jpg
   l${e.vurl}.jpg
   m${e.vurl}.jpg
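
The naming rule above can be sketched in a few lines. This is only an illustration in Python, not DIH code; the function names `thumbnail_name` and `image_name` are hypothetical, and the sample values come from the mediaObject element shown earlier.

```python
def thumbnail_name(vurl: str) -> str:
    """Thumbnails always use the fixed prefix 's'."""
    return f"s{vurl}.jpg"


def image_name(vurl: str, image_type: str) -> str:
    """Full images are prefixed with the first letter of the image type."""
    return f"{image_type[0]}{vurl}.jpg"


print(thumbnail_name("1043130"))         # s1043130.jpg
print(image_name("1043130", "graphic"))  # g1043130.jpg
```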

My patch would allow the following sort of thing to be added to a data-config. I feel this
considerably increases its power and usefulness.

{code}
<entity name="x" .... transformer="TemplateTransformer,RegexTransformer">
  <field column="fileWebPath"    template="${jc.fileAbsolutePath}" regex="${dataimporter.request.contentdir}(.*)" replaceWith="/ford$1" />
  <field column="vurl"           xpath="/record/mediaBlock/mediaObject/@vurl" />
  <field column="imagetype"      xpath="/record/mediaBlock/mediaObject/@imageType" regex="^(\w).*" />
  <field column="imgWebPathICON" regex="(.*)/.*" replaceWith="$1/imagery/s${x.vurl}.jpg" sourceColName="fileWebPath" />
  <field column="imgWebPathFULL" regex="(.*)/.*" replaceWith="$1/imagery/${x.imagetype}${x.vurl}.jpg" sourceColName="fileWebPath" />
{code}
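
As a rough illustration of what the last two fields are meant to produce, the regex steps can be imitated outside Solr. This is a Python sketch, not DIH itself: the sample `fileWebPath` value and the helper `replace_file_name` are assumptions made for the example, and Python's `re` module stands in for Java's regex engine.

```python
import re

# First letter of the image type, as regex="^(\w).*" is meant to extract it.
imagetype = re.match(r"^(\w).*", "graphic").group(1)

# Hypothetical row contents once the earlier fields have been resolved.
row = {
    "fileWebPath": "/ford/manuals/doc1.xml",  # assumed sample path
    "vurl": "1043130",
    "imagetype": imagetype,
}


def replace_file_name(source: str, tail: str) -> str:
    """Imitate regex="(.*)/.*" with replaceWith="$1<tail>":
    keep everything before the last slash and append the new tail."""
    return re.sub(r"(.*)/.*", lambda m: m.group(1) + tail, source)


icon = replace_file_name(row["fileWebPath"], f"/imagery/s{row['vurl']}.jpg")
full = replace_file_name(
    row["fileWebPath"], f"/imagery/{row['imagetype']}{row['vurl']}.jpg"
)

print(icon)  # /ford/manuals/imagery/s1043130.jpg
print(full)  # /ford/manuals/imagery/g1043130.jpg
```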


> DIH transformers cannot reuse output from previous transformations
> ------------------------------------------------------------------
>
>                 Key: SOLR-1033
>                 URL: https://issues.apache.org/jira/browse/SOLR-1033
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>         Environment: All operating systems and software platforms
>            Reporter: Fergus McMenemie
>             Fix For: 1.4
>
>         Attachments: SOLR-1033.patch, SOLR-1033.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> It can be very useful to reuse the output from a DIH template in other templates and/or
> regex transformers. Currently this cannot be done: the resolver is initialized at the start
> of the transformer run with whatever values exist for each column name at that instant. As the
> transformer executes, it may define new values for column names. My change is intended to update
> the hash used by the resolver after each successful transformation.
> This only applies to the template and regex transformers.
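
The difference between a resolver that works from a start-of-run snapshot and one that is updated after each transformation can be sketched as below. This is a simplified Python model, not the DIH implementation or the attached patch: `resolve`, `run_templates` and the column names `icon` and `iconPath` are hypothetical, and rendering an unresolved placeholder as an empty string is an assumption of the sketch rather than documented DIH behaviour.

```python
import re

# Matches placeholders of the form ${entity.column}.
PLACEHOLDER = re.compile(r"\$\{(\w+)\.(\w+)\}")


def resolve(template, entity, values):
    """Replace ${entity.column} with values[column].
    Assumption: unknown names are rendered as an empty string."""
    def lookup(match):
        if match.group(1) != entity:
            return ""
        return str(values.get(match.group(2), ""))
    return PLACEHOLDER.sub(lookup, template)


def run_templates(entity, row, fields, update_after_each):
    """Apply (column, template) pairs in order.

    update_after_each=False: the resolver only sees a snapshot taken at the start.
    update_after_each=True:  the resolver sees each newly produced column.
    """
    row = dict(row)
    snapshot = dict(row)
    for column, template in fields:
        visible = row if update_after_each else snapshot
        row[column] = resolve(template, entity, visible)
    return row


fields = [
    ("icon", "s${x.vurl}.jpg"),
    ("iconPath", "/imagery/${x.icon}"),  # reuses the output of the previous field
]
start = {"vurl": "1043130"}

before = run_templates("x", start, fields, update_after_each=False)
after = run_templates("x", start, fields, update_after_each=True)

print(before["iconPath"])  # /imagery/
print(after["iconPath"])   # /imagery/s1043130.jpg
```

In the snapshot case the second template cannot see `icon`, because that column did not exist when the run started; refreshing the visible values after each field is what lets later fields build on earlier ones.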

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

