xml-xalan-dev mailing list archives

From "David Bertoni (JIRA)" <xalan-...@xml.apache.org>
Subject [jira] Commented: (XALANC-593) Poor performance with a complex XSL stylesheet and large XML file
Date Thu, 22 Dec 2005 20:08:31 GMT
    [ http://issues.apache.org/jira/browse/XALANC-593?page=comments#action_12361134 ] 

David Bertoni commented on XALANC-593:
--------------------------------------

This stylesheet has constructs that cannot scale.  For example:

<xsl:for-each select="//UPN[ ( .  =   preceding::UPN ) and not( . = following::UPN )  ]">
    <xsl:for-each select="//PupilIdentifiers[ UPN = current() ] ">
        <xsl:call-template name="Error">
            <xsl:with-param name="err_num" select="60"/>
            <xsl:with-param name="data" select="concat('UPN:', UPN, '|Surname:', Surname,
                '|Forename:', Forename, '|Gender:', Gender, '|DOB:', DOB )"/>
        </xsl:call-template>
    </xsl:for-each>
</xsl:for-each>

The first problem is using "//" in XPath expressions.  "//" forces the processor to search
the _entire_ document for UPN elements.  That means the processor has to look at every element
in the source tree, and the inner xsl:for-each repeats that full scan of the tree for every UPN
the outer one selects.

It looks to me like the element UPN only appears in the following paths:

/Message/Pupils/PupilsOnRoll/PupilOnRoll/PupilIdentifiers/UPN
/Message/Pupils/PupilsNoLongerOnRoll/PupilNoLongerOnRoll/PupilIdentifiers/UPN

So you could re-write this as a union of those two paths:

(/Message/Pupils/PupilsOnRoll/PupilOnRoll/PupilIdentifiers/UPN | /Message/Pupils/PupilsNoLongerOnRoll/PupilNoLongerOnRoll/PupilIdentifiers/UPN)
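
As a rough sketch of how the explicit paths could replace both "//" selects (untested against
your files), the stylesheet below binds the PupilIdentifiers elements to a top-level variable
once and reuses it in both loops.  The variable name pupil-ids and the plain-text output are
placeholders of mine, not something from your stylesheet, and the predicate on the outer
select is deliberately left out here.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical variable: evaluated once, visiting only the two known locations. -->
  <xsl:variable name="pupil-ids"
      select="/Message/Pupils/PupilsOnRoll/PupilOnRoll/PupilIdentifiers
            | /Message/Pupils/PupilsNoLongerOnRoll/PupilNoLongerOnRoll/PupilIdentifiers"/>

  <xsl:template match="/">
    <!-- Stands in for select="//UPN[...]" (predicate omitted in this sketch). -->
    <xsl:for-each select="$pupil-ids/UPN">
      <!-- Stands in for select="//PupilIdentifiers[ UPN = current() ]". -->
      <xsl:for-each select="$pupil-ids[UPN = current()]">
        <!-- Placeholder output; the real stylesheet calls its Error template here. -->
        <xsl:value-of select="concat('UPN:', UPN, '&#10;')"/>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

This only avoids walking the whole tree on each select.  The inner loop still compares every
PupilIdentifiers element against the current UPN, so it does not make the lookup itself cheap.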

However, the predicate for this XPath expression is an even bigger problem:

[ ( .  =   preceding::UPN ) and not( . = following::UPN )  ]

The preceding and following axes won't scale, because their cost is not linear.  For every
candidate UPN, the processor has to look at all the nodes preceding and following it to find
other UPN elements, so the total work grows roughly with the square of the document size.  I
think you should look at xsl:key and modify your stylesheet to use keys for these identity
constraint cases.  Without spending too much time trying to understand the semantics of your
stylesheet, I suspect you are using brute-force lookup to find duplicate UPN elements.  This is
trivial to do with keys, and is much faster because the processor builds lookup tables for each key.
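
A minimal sketch of the key-based approach, assuming each PupilIdentifiers element has at most
one UPN child and that UPN only occurs in the two paths listed above.  The key name
pupils-by-upn and the stub Error template are placeholders of mine (your stylesheet already
has its own Error template), and I have not run this against your attachments.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical key: index every PupilIdentifiers element by its UPN value. -->
  <xsl:key name="pupils-by-upn" match="PupilIdentifiers" use="UPN"/>

  <xsl:template match="/">
    <!-- Select one representative per duplicated UPN value: the element must share
         its UPN with at least one other element, and must be the last element in
         document order carrying that value (mirroring the original predicate). -->
    <xsl:for-each
        select="(/Message/Pupils/PupilsOnRoll/PupilOnRoll/PupilIdentifiers
               | /Message/Pupils/PupilsNoLongerOnRoll/PupilNoLongerOnRoll/PupilIdentifiers)
                [count(key('pupils-by-upn', UPN)) &gt; 1
                 and generate-id() = generate-id(key('pupils-by-upn', UPN)[last()])]">
      <!-- Report every element that carries the duplicated value, via key lookup. -->
      <xsl:for-each select="key('pupils-by-upn', UPN)">
        <xsl:call-template name="Error">
          <xsl:with-param name="err_num" select="60"/>
          <xsl:with-param name="data"
              select="concat('UPN:', UPN, '|Surname:', Surname, '|Forename:', Forename,
                             '|Gender:', Gender, '|DOB:', DOB)"/>
        </xsl:call-template>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>

  <!-- Stub standing in for the real Error template; it just prints its parameters. -->
  <xsl:template name="Error">
    <xsl:param name="err_num"/>
    <xsl:param name="data"/>
    <xsl:value-of select="concat($err_num, ': ', $data, '&#10;')"/>
  </xsl:template>
</xsl:stylesheet>
```

Each key() call is a table lookup rather than a walk along the preceding or following axis,
so the cost per PupilIdentifiers element no longer depends on how many other elements are in
the document.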

As a final comment, I would also like to point out that much of the work you're doing with
your stylesheet is validating the content of the document, which would be much better done
with an XML schema that validates the instance document while it's being parsed.  You can,
of course, write a stylesheet to do this, but as you're seeing, the performance is
not really optimal.

> Poor performance with a complex XSL stylesheet and large XML file
> -----------------------------------------------------------------
>
>          Key: XALANC-593
>          URL: http://issues.apache.org/jira/browse/XALANC-593
>      Project: XalanC
>         Type: Bug
>   Components: XalanC
>     Versions: 1.9, 1.10
>  Environment: Platform: Windows XP Professional
> Processor: 2GHz
> RAM: 1Gb
>     Reporter: srguard2000-general@yahoo.co.uk
>  Attachments: SchoolCensus06-ErrorList-v1.4.xsl, SchoolCensus06-ValidationRules-v1.4.xsl,
>               TEMP_XI_Y1219154918.xml, TEMP_XI_Y1219154918_halved.xml
>
> Xalan is performing poorly for a complex  XSL transform on a large XML file.
> I have the details below, and I am attaching files for XML input and the XSL files.
> There are 2 problems - one is that a 1.5MB XML file takes about 2 minutes to be transformed.
> This could be solved by changing the XSL? - any suggestions welcome!
> The second problem is that the performance does not 'scale'  with the size of the XML
> input - I took the same XML file and halved the size, and the performance more than doubled.
> So it looks like performance worsens with the size of the XML input.
> Performance in 1_10 is slightly worse than 1_9.
> ===============================
> Xalan-C_1_9_0-win32-msvc_60
> Xalan -t:
> 1.5MB XML:
> Source tree parsing time: 340.398336 milliseconds.
> Stylesheet compilation time: 133.1826288 milliseconds.
> Transformation time: 119932.1820512 milliseconds.
> 733Kb XML:
> Source tree parsing time: 158.737142 milliseconds.
> Stylesheet compilation time: 67.1794638 milliseconds.
> Transformation time: 36380.30150 milliseconds.
> ===============================
> 1.5MB XML:
> Xalan-C_1_10_0-win32-msvc_60
> Xalan -t:
> Source tree parsing time: 255.852040 milliseconds.
> Stylesheet compilation time: 68.236948 milliseconds.
> Transformation time: 134556.299906 milliseconds.
> 733Kb XML:
> Source tree parsing time: 142.380952 milliseconds.
> Stylesheet compilation time: 68.1692120 milliseconds.
> Transformation time: 41232.867330 milliseconds.
> ===============================

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: xalan-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xalan-dev-help@xml.apache.org

