lucene-solr-user mailing list archives

From "David Smiley @MITRE.org" <DSMI...@mitre.org>
Subject Re: XSLT transform before update?
Date Sun, 20 Apr 2008 18:37:16 GMT

Thanks Shalin.

The particular XSLT processor isn't relevant; XSLT is a spec.  Just use
the standard Java APIs (JAXP).  If I want a particular processor, I can
select it with a system property, and/or you could offer a configuration
option that names the factory class of the processor of my choice.
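
To make "just use the standard Java APIs" concrete, here is a minimal sketch using JAXP. The class name, the stylesheet and the sample record are made up for illustration; only the javax.xml.transform calls are the standard API. As far as I know, TransformerFactory.newInstance() consults the javax.xml.transform.TransformerFactory system property, so the processor can be swapped without touching the code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class JaxpXsltSketch {
    // Hypothetical stylesheet: joins first-name and last-name into one "name" field.
    public static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/record'>"
      + "<doc><field name='name'>"
      + "<xsl:value-of select=\"concat(first-name, ' ', last-name)\"/>"
      + "</field></doc>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the given stylesheet to the given XML and returns the result as a string. */
    public static String transform(String xslt, String xml) throws TransformerException {
        // Standard JAXP lookup: whichever processor is configured (or the JDK default) is used.
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
            factory.newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Made-up input record, mirroring the first-name/last-name example in this thread.
        String xml = "<record><first-name>John</first-name>"
                   + "<last-name>Smith</last-name></record>";
        System.out.println(transform(XSLT, xml));
    }
}
```

Running it with -Djavax.xml.transform.TransformerFactory=<factory class> should pick a different processor, provided that processor is on the classpath. A configuration option could do the same by passing the class name to TransformerFactory.newInstance(String, ClassLoader).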

~ David


Shalin Shekhar Mangar wrote:
> 
> Hi David,
> Actually, you can concatenate values, but you'll have to write a bit of
> code. You can write this in JavaScript (if you're using Java 6) or in
> Java.
> 
> Basically, you need to write a Transformer to do it. Look at
> http://wiki.apache.org/solr/DataImportHandler#head-a6916b30b5d7605a990fb03c4ff461b3736496a9
> 
> For example, let's say you get fields first-name and last-name in the XML.
> But in the schema.xml you have a field called "name" in which you need to
> concatenate the values of first-name and last-name (with a space in
> between). Create a Java class:
> 
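> package com.yourpackage;
>
> import java.util.Map;
>
> public class ConcatenateTransformer {
>   // Called for each row; adds a "name" field built from the two parts.
>   public Object transformRow(Map<String, Object> row) {
>     // row.get() returns Object, so a cast is needed.
>     String firstName = (String) row.get("first-name");
>     String lastName = (String) row.get("last-name");
>     row.put("name", firstName + " " + lastName);
>     return row;
>   }
> }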
> 
> Add this class to Solr's classpath by putting its jar in solr/WEB-INF/lib.
> 
> The data-config.xml should look like this:
> <entity name="myEntity" processor="XPathEntityProcessor"
>         url="http://myurl/example.xml"
>         transformer="com.yourpackage.ConcatenateTransformer">
>   <field column="first-name" xpath="/record/first-name" />
>   <field column="last-name" xpath="/record/last-name" />
>   <field column="name" />
> </entity>
> 
> This calls the ConcatenateTransformer.transformRow method for each row, so
> you can concatenate any field with any other field (or a constant). Note
> that the Solr document keeps only those fields that are in schema.xml; the
> rest are thrown away.
> 
> If you don't want to write this in Java, you can use JavaScript via the
> built-in ScriptTransformer; for an example, look at
> http://wiki.apache.org/solr/DataImportHandler#head-27fcc2794bd71f7d727104ffc6b99e194bdb6ff9
> 
> However, I'm beginning to realize that XSLT is a common need. Let me see
> how best we can accommodate it in DataImportHandler. Which XSLT processor
> would you prefer?
> 
> On Sat, Apr 19, 2008 at 12:13 AM, David Smiley @MITRE.org
> <DSMILEY@mitre.org> wrote:
> 
>>
>> I'm in the same situation as you, Daniel.  The DataImportHandler is pretty
>> awesome but I'd also prefer it had the power of XSLT.  The XPath support in
>> it doesn't suffice for me.  And I can't do very basic things like
>> concatenate one value with another, say a constant even.  It's too bad
>> there isn't a mode XSLT can be put into so that it doesn't build the whole
>> file in memory to do the transform.  I've been looking into this and have
>> turned up nothing.  It would be neat if there were a StAX to multi-document
>> adapter, at which point XSLT could be applied to the smaller fixed-size
>> documents instead of the entire data stream.  I haven't found anything like
>> this so it'd need to be built.  For now my documents aren't too big to XSLT
>> in-memory.
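
A rough sketch of what such a StAX to multi-document adapter might look like, using only the standard javax.xml.stream API. The class name, the split method and the callback shape are all hypothetical, and this is just one way to do it: the input is streamed event by event, each element with the given name is buffered on its own, and the small serialized document is handed to a callback. Only one record is held in memory at a time.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.function.Consumer;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class StaxRecordSplitter {
    /**
     * Streams through the input and passes each outermost element named
     * recordName to the sink as a small, separately serialized document.
     */
    public static void split(Reader xml, String recordName, Consumer<String> sink)
            throws XMLStreamException {
        XMLEventReader in = XMLInputFactory.newInstance().createXMLEventReader(xml);
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        StringWriter buffer = null;
        XMLEventWriter out = null;   // non-null only while inside a record
        int depth = 0;               // element nesting depth within the current record
        while (in.hasNext()) {
            XMLEvent event = in.nextEvent();
            if (out == null && event.isStartElement()
                    && recordName.equals(event.asStartElement().getName().getLocalPart())) {
                // A new record starts: begin a fresh buffer for it.
                buffer = new StringWriter();
                out = outFactory.createXMLEventWriter(buffer);
                depth = 0;
            }
            if (out == null) {
                continue;            // outside any record: skip the event
            }
            out.add(event);
            if (event.isStartElement()) {
                depth++;
            } else if (event.isEndElement() && --depth == 0) {
                // The record's own end tag was just written: emit it and reset.
                out.flush();
                out.close();
                sink.accept(buffer.toString());
                out = null;
            }
        }
        in.close();
    }

    public static void main(String[] args) throws XMLStreamException {
        // Made-up input with two small records.
        String xml = "<root><rec><id>1</id></rec><rec><id>2</id></rec></root>";
        split(new StringReader(xml), "rec", System.out::println);
    }
}
```

Each string passed to the sink could then be wrapped in a StreamSource and run through an ordinary XSLT Transformer, so the stylesheet only ever sees one record instead of the whole stream.
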
>>
>> ~ David
>>
>>
>> Daniel Papasian wrote:
>> >
>> > Shalin Shekhar Mangar wrote:
>> >> Hi Daniel,
>> >>
>> >> Maybe if you can give us a sample of what your XML looks like, we can
>> >> suggest how to use SOLR-469 (Data Import Handler) to index it. Most of
>> >> the use-cases we have encountered so far are solvable using the
>> >> XPathEntityProcessor in DataImportHandler without using XSLT; for
>> >> details look at
>> >> http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476
>> >
>> > I think even if it is possible to use SOLR-469 for my needs, I'd still
>> > prefer the XSLT approach, because it's going to be a bit of
>> > configuration either way, and I'd rather it be an XSLT stylesheet than
>> > solrconfig.xml.  In addition, I haven't yet decided whether I want to
>> > apply any patches to the version that we will deploy, but if I do go
>> > down the route of the XSLT transform patch, if I end up having to back
>> > it out the amount of work that it would be for me to do the transform
>> > at the XML source would be negligible, where it would be quite a bit of
>> > work ahead of me to go from using the DataImportHandler to not using it
>> > at all.
>> >
>> > Because both the solr instance and the XML source are in house, I have
>> > the ability to apply the XSLT at the source instead of at solr.
>> > However, there are different teams of people that control the XML
>> > source and solr, so it would require a bit more office coordination to
>> > do it on the backend.
>> >
>> > The data is a filemaker XML export (DTD fmresultset) and it looks
>> > roughly like this:
>> > <fmresultset>
>> >    <resultset>
>> >      <field name="ID"><data>125</data></field>
>> >      <field name="organization"><data>Ford Foundation</data></field>
>> >      ...
>> >      <relatedset table="Employees">
>> >        <record>
>> >          <field name="ID"><data>Y5-A</data></field>
>> >          <field name="Name"><data>John Smith</data></field>
>> >        </record>
>> >        <record>
>> >          <field name="ID"><data>Y5-B</data></field>
>> >          <field name="Name"><data>Jane Doe</data></field>
>> >        </record>
>> >      </relatedset>
>> >    </resultset>
>> > </fmresultset>
>> >
>> > I'm taking the product of the resultset and the relatedset, using both
>> > IDs concatenated as a unique identifier, like so:
>> >
>> > <doc>
>> > <field name="ID">125Y5-A</field>
>> > <field name="organization">Ford Foundation</field>
>> > <field name="Name">John Smith</field>
>> > </doc>
>> > <doc>
>> > <field name="ID">125Y5-B</field>
>> > <field name="organization">Ford Foundation</field>
>> > <field name="Name">Jane Doe</field>
>> > </doc>
>> >
>> > I can do the transform pretty simply with XSLT.  I suppose it is
>> > possible to get the DataImportHandler to do this, but I'm not yet
>> > convinced that it's easier.
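
For reference, a sketch of how that product could be expressed as an XSLT 1.0 stylesheet and run through the standard JAXP API. It assumes the simplified sample above (with resultset closed before the end of fmresultset), not the real FileMaker DTD. The class name is hypothetical, and the surrounding add element is an assumption based on Solr's XML update format; the stylesheet is a guess at the approach, not Daniel's actual one.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class FmResultsetToSolrDocs {
    // One <doc> per related record; the ID is the parent ID followed by the record ID.
    public static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<add>"
      + "<xsl:for-each select='fmresultset/resultset/relatedset/record'>"
      + "<xsl:variable name='parent' select='../..'/>"   // the enclosing resultset
      + "<doc>"
      + "<field name='ID'><xsl:value-of select=\"concat("
      + "$parent/field[@name='ID']/data, field[@name='ID']/data)\"/></field>"
      + "<field name='organization'><xsl:value-of "
      + "select=\"$parent/field[@name='organization']/data\"/></field>"
      + "<field name='Name'><xsl:value-of "
      + "select=\"field[@name='Name']/data\"/></field>"
      + "</doc>"
      + "</xsl:for-each>"
      + "</add>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // The simplified sample from this thread, with the elided fields left out.
    public static final String SAMPLE =
        "<fmresultset><resultset>"
      + "<field name='ID'><data>125</data></field>"
      + "<field name='organization'><data>Ford Foundation</data></field>"
      + "<relatedset table='Employees'>"
      + "<record><field name='ID'><data>Y5-A</data></field>"
      + "<field name='Name'><data>John Smith</data></field></record>"
      + "<record><field name='ID'><data>Y5-B</data></field>"
      + "<field name='Name'><data>Jane Doe</data></field></record>"
      + "</relatedset></resultset></fmresultset>";

    /** Runs the stylesheet over the given XML and returns the serialized result. */
    public static String toSolrDocs(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(toSolrDocs(SAMPLE));
    }
}
```
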
>> >
>> > Daniel
>> >
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/XSLT-transform-before-update--tp16738227p16764009.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 

-- 
View this message in context: http://www.nabble.com/XSLT-transform-before-update--tp16738227p16796900.html
Sent from the Solr - User mailing list archive at Nabble.com.

