lucene-solr-user mailing list archives

From "Noble Paul നോബിള്‍ नोब्ळ्" <noble.p...@gmail.com>
Subject Re: XSLT transform before update?
Date Tue, 22 Apr 2008 11:16:05 GMT
Hi,

There is a new patch that implements these features. I will update
the wiki with the documentation.

I don't think we need to worry much about memory consumption. A few
MB of memory should be fine (unless you are using a file that is
tens of MB in size). Consider using XPathEntityProcessor if possible;
it uses StAX and is pretty efficient.
Thanks for your support.
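
To show why a StAX-based reader stays cheap on memory, here is a minimal sketch that uses only the standard javax.xml.stream API. It is not the XPathEntityProcessor code: the class name StaxRecordReader, the readRecords method and the <record> element name are assumptions made for the illustration, and it assumes every child of <record> contains text only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not Solr code: streams <record> elements one at a time
// instead of building a tree for the whole document.
final class StaxRecordReader {

    // Calls sink once per <record>, passing its child elements as a map.
    // Assumes each child of <record> holds text only (no nested elements).
    static void readRecords(String xml, Consumer<Map<String, String>> sink) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            Map<String, String> row = null;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if ("record".equals(name)) {
                        row = new LinkedHashMap<>();            // start a new row
                    } else if (row != null) {
                        // getElementText() reads up to this child's end tag
                        row.put(name, reader.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "record".equals(reader.getLocalName())) {
                    sink.accept(row);                           // hand the row off
                    row = null;                                 // and forget it
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<records>"
            + "<record><first-name>John</first-name><last-name>Smith</last-name></record>"
            + "<record><first-name>Jane</first-name><last-name>Doe</last-name></record>"
            + "</records>";
        List<Map<String, String>> rows = new ArrayList<>();
        readRecords(xml, rows::add);
        System.out.println(rows);
    }
}
```

Only one row is held by the reader at any time; the main method collects the rows into a list purely to print them, and a real consumer would process each row as it arrives.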

--Noble


On Mon, Apr 21, 2008 at 5:57 PM, David Smiley @MITRE.org
<DSMILEY@mitre.org> wrote:
>
>  Cool.  So you're saying that this xslt file will operate on the entire XML
>  document that was fetched from the URL and just pass it on to solr?  Thanks
>  for supporting this.  The XML files I have coming from my data source
>  are big, but not so big as to risk an out-of-memory error.  And I've found
>  xslt to perform fast for me.  I like your proposed TemplateTransformer
>  too... I'm tempted to use that in place of XSLT.  Great job Paul.
>
>  It'd be neat to have an XSLT transformer for your framework that operates on
>  a single entity (that addresses the memory usage problem).  I know your
>  entities are HashMap based instead of XML, however.
>
>  ~ David
>
>
>
>
>  Noble Paul നോബിള്‍ नोब्ळ् wrote:
>  >
>  > We are planning to incorporate both your requests in the next patch.
>  > The implementation is going to be as follows. Mention the XSL file
>  > location like this:
>  > <entity processor="XPathEntityProcessor" xslt="file:/c:/my-own.xsl">
>  > ....
>  > </entity>
>  > So the processing will be done after the XSL transformation. If your
>  > XSL transformation produces a valid 'add' document, not even the
>  > fields are necessary. Otherwise you will need to write all the fields
>  > and their xpaths as for any other XML.
>  >
>  > <entity processor="XPathEntityProcessor" xslt="file:/c:/my-own.xsl"
>  > useSolrAddXml="true"/>
>  >
>  > So it will assume that the schema is the same as that of the add XML
>  > and does the needful.
>  >
>  > Another feature is going to be a TemplateTransformer  which takes in a
>  > Template as follows
>  >
>  > <entity name="e" transformer="TemplateTransformer" ....>
>  > <field column="field1_2"  template="${e.field1} ${e.field2}"/>
>  > </entity>
>  >
>  > Please let us know what you think about this.
>  >
>  > And keep giving us these great use-cases so that we can make the tool
>  > better.
>  > --Noble
>  >
>  >
>  >
>  > On Mon, Apr 21, 2008 at 12:07 AM, David Smiley @MITRE.org
>  > <DSMILEY@mitre.org> wrote:
>  >>
>  >>  Thanks Shalin.
>  >>
>  >>  The particular XSLT processor used is not relevant; it's a spec.  Just use
>  >>  the standard Java APIs.  If I want a particular processor, then I can get
>  >>  that to happen by using a system property and/or you could offer a
>  >>  configuration input for the standard factory class implementation for a
>  >>  processor of my choice.
>  >>
>  >>  ~ David
>  >>
>  >>
>  >>
>  >>
>  >>  Shalin Shekhar Mangar wrote:
>  >>  >
>  >>  > Hi David,
>  >>  > Actually you can concatenate values, however you'll have to write a bit of
>  >>  > code. You can write this in JavaScript (if you're using Java 6) or in
>  >>  > Java.
>  >>  >
>  >>  > Basically, you need to write a Transformer to do it. Look at
>  >>  > http://wiki.apache.org/solr/DataImportHandler#head-a6916b30b5d7605a990fb03c4ff461b3736496a9
>  >>  >
>  >>  > For example, let's say you get fields first-name and last-name in the XML.
>  >>  > But in the schema.xml you have a field called "name" in which you need to
>  >>  > concatenate the values of first-name and last-name (with a space in
>  >>  > between). Create a Java class:
>  >>  >
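>  >>  > import java.util.Map;
>  >>  >
>  >>  > public class ConcatenateTransformer {
>  >>  >   // Called once per row; adds a "name" field built from two others.
>  >>  >   public Object transformRow(Map<String, Object> row) {
>  >>  >     // Map values are Objects, so cast before concatenating.
>  >>  >     String firstName = (String) row.get("first-name");
>  >>  >     String lastName = (String) row.get("last-name");
>  >>  >     row.put("name", firstName + " " + lastName);
>  >>  >     return row;
>  >>  >   }
>  >>  > }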
>  >>  >
>  >>  > Add this class to solr's classpath by putting its jar in solr/WEB-INF/lib
>  >>  >
>  >>  > The data-config.xml should look like this:
>  >>  > <entity name="myEntity" processor="XPathEntityProcessor"
>  >>  >         url="http://myurl/example.xml"
>  >>  >         transformer="com.yourpackage.ConcatenateTransformer">
>  >>  >   <field column="first-name" xpath="/record/first-name" />
>  >>  >   <field column="last-name" xpath="/record/last-name" />
>  >>  >   <field column="name" />
>  >>  > </entity>
>  >>  >
>  >>  > This will call the ConcatenateTransformer.transformRow method for each row
>  >>  > and you can concatenate any field with any field (or constant). Note that
>  >>  > the solr document will keep only those fields which are in the schema.xml;
>  >>  > the rest are thrown away.
>  >>  >
>  >>  > If you don't want to write this in Java, you can use JavaScript by using
>  >>  > the built-in ScriptTransformer, for an example look at
>  >>  > http://wiki.apache.org/solr/DataImportHandler#head-27fcc2794bd71f7d727104ffc6b99e194bdb6ff9
>  >>  >
>  >>  > However, I'm beginning to realize that XSLT is a common need, let me see
>  >>  > how best we can accommodate it in DataImportHandler. Which XSLT processor
>  >>  > will you prefer?
>  >>  >
>  >>  > On Sat, Apr 19, 2008 at 12:13 AM, David Smiley @MITRE.org
>  >>  > <DSMILEY@mitre.org> wrote:
>  >>  >
>  >>  >>
>  >>  >> I'm in the same situation as you Daniel.  The DataImportHandler is pretty
>  >>  >> awesome but I'd also prefer it had the power of XSLT.  The XPath support in
>  >>  >> it doesn't suffice for me.  And I can't do very basic things like
>  >>  >> concatenate one value with another, say a constant even.  It's too bad there
>  >>  >> isn't a mode that XSLT can be put into so that it doesn't build the whole
>  >>  >> file in memory to do the transform.  I've been looking into this and have
>  >>  >> turned up nothing.  It would be neat if there was a StAX to multi-document
>  >>  >> adapter, at which point XSLT could be applied to the smaller fixed-size
>  >>  >> documents instead of the entire data stream.  I haven't found anything like
>  >>  >> this so it'd need to be built.  For now my documents aren't too big to XSLT
>  >>  >> in-memory.
>  >>  >>
>  >>  >> ~ David
>  >>  >>
>  >>  >>
>  >>  >> Daniel Papasian wrote:
>  >>  >> >
>  >>  >> > Shalin Shekhar Mangar wrote:
>  >>  >> >> Hi Daniel,
>  >>  >> >>
>  >>  >> >> Maybe if you can give us a sample of what your XML looks like, we can
>  >>  >> >> suggest how to use SOLR-469 (Data Import Handler) to index it. Most of the
>  >>  >> >> use-cases we have encountered so far are solvable using the
>  >>  >> >> XPathEntityProcessor in DataImportHandler without using XSLT, for details
>  >>  >> >> look at
>  >>  >> >> http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476
>  >>  >> >
>  >>  >> > I think even if it is possible to use SOLR-469 for my needs, I'd still
>  >>  >> > prefer the XSLT approach, because it's going to be a bit of
>  >>  >> > configuration either way, and I'd rather it be an XSLT stylesheet than
>  >>  >> > solrconfig.xml.  In addition, I haven't yet decided whether I want to
>  >>  >> > apply any patches to the version that we will deploy, but if I do go
>  >>  >> > down the route of the XSLT transform patch, if I end up having to back
>  >>  >> > it out the amount of work that it would be for me to do the transform at
>  >>  >> > the XML source would be negligible, where it would be quite a bit of
>  >>  >> > work ahead of me to go from using the DataImportHandler to not using it
>  >>  >> > at all.
>  >>  >> >
>  >>  >> > Because both the solr instance and the XML source are in house, I have
>  >>  >> > the ability to apply the XSLT at the source instead of at solr.
>  >>  >> > However, there are different teams of people that control the XML source
>  >>  >> > and solr, so it would require a bit more office coordination to do it on
>  >>  >> > the backend.
>  >>  >> >
>  >>  >> > The data is a FileMaker XML export (DTD fmresultset) and it looks
>  >>  >> > roughly like this:
>  >>  >> > <fmresultset>
>  >>  >> >    <resultset>
>  >>  >> >      <field name="ID"><data>125</data></field>
>  >>  >> >      <field name="organization"><data>Ford Foundation</data></field>
>  >>  >> >      ...
>  >>  >> >      <relatedset table="Employees">
>  >>  >> >        <record>
>  >>  >> >          <field name="ID"><data>Y5-A</data></field>
>  >>  >> >          <field name="Name"><data>John Smith</data></field>
>  >>  >> >        </record>
>  >>  >> >        <record>
>  >>  >> >          <field name="ID"><data>Y5-B</data></field>
>  >>  >> >          <field name="Name"><data>Jane Doe</data></field>
>  >>  >> >        </record>
>  >>  >> >      </relatedset>
>  >>  >> >    </resultset>
>  >>  >> > </fmresultset>
>  >>  >> >
>  >>  >> > I'm taking the product of the resultset and the relatedset, using both
>  >>  >> > IDs concatenated as a unique identifier, like so:
>  >>  >> >
>  >>  >> > <doc>
>  >>  >> > <field name="ID">125Y5-A</field>
>  >>  >> > <field name="organization">Ford Foundation</field>
>  >>  >> > <field name="Name">John Smith</field>
>  >>  >> > </doc>
>  >>  >> > <doc>
>  >>  >> > <field name="ID">125Y5-B</field>
>  >>  >> > <field name="organization">Ford Foundation</field>
>  >>  >> > <field name="Name">Jane Doe</field>
>  >>  >> > </doc>
>  >>  >> >
>  >>  >> > I can do the transform pretty simply with XSLT.  I suppose it is
>  >>  >> > possible to get the DataImportHandler to do this, but I'm not yet
>  >>  >> > convinced that it's easier.
>  >>  >> >
>  >>  >> > Daniel
>  >>  >> >
>  >>  >> >
>  >>  >>
>  >>  >> --
>  >>  >> View this message in context:
>  >>  >> http://www.nabble.com/XSLT-transform-before-update--tp16738227p16764009.html
>  >>  >> Sent from the Solr - User mailing list archive at Nabble.com.
>  >>  >>
>  >>  >>
>  >>  >
>  >>  >
>  >>  > --
>  >>  > Regards,
>  >>  > Shalin Shekhar Mangar.
>  >>  >
>  >>  >
>  >>
>  >>  --
>  >>  View this message in context:
>  >>  http://www.nabble.com/XSLT-transform-before-update--tp16738227p16796900.html
>  >>  Sent from the Solr - User mailing list archive at Nabble.com.
>  >>
>  >>
>  >
>  >
>
>  --
>  View this message in context: http://www.nabble.com/XSLT-transform-before-update--tp16738227p16807488.html
>  Sent from the Solr - User mailing list archive at Nabble.com.
>
>
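
As a rough illustration of the template idea quoted above (template="${e.field1} ${e.field2}"), the substitution could look something like the sketch below. This is not the patch's code: the class TemplateSketch, the fill method, and the choice to replace unknown placeholders with an empty string are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the TemplateTransformer from the patch.
final class TemplateSketch {

    // Matches ${...} and captures the text between the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces each ${entity.column} in the template with the value of that
    // column in the row. Unknown placeholders become "" (an assumption).
    static String fill(String template, String entityName, Map<String, Object> row) {
        String prefix = entityName + ".";
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = m.group(1);
            Object value = key.startsWith(prefix)
                ? row.get(key.substring(prefix.length()))
                : null;
            // quoteReplacement keeps '$' and '\' in values from being
            // interpreted as group references.
            m.appendReplacement(out,
                Matcher.quoteReplacement(value == null ? "" : value.toString()));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("field1", "John");
        row.put("field2", "Smith");
        System.out.println(fill("${e.field1} ${e.field2}", "e", row));
    }
}
```

Passing the entity name separately mirrors the name="e" attribute in the quoted config; a real implementation might resolve variables from other entities as well.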



-- 
--Noble Paul