xmlgraphics-batik-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Thomas DeWeese <Thomas.DeWe...@Kodak.com>
Subject Re: Performance of Batik.dom.svg.SVGOMDocument.getById()
Date Thu, 26 Aug 2004 17:27:40 GMT
Hi Fritz,

    Thanks for raising this.  This is an issue I had been considering
addressing for a _long_ time.  It is unfortunately not a really
trivial thing to 'do right'.  There are three things that jump out
at me as being potentially problematic with your current
implementation.

1) getElementById I believe works on a subtree of the document.
    So unless the 'globally' registered element is a child of the
    element getElementById is called on the result should be null.

2) Adding/removing elements/subtree's.  Someone can add a whole
    subtree to a document and all those nodes need to have their
    id's added, also when a subtree is removed there id's need to
    be removed (which might reveal a hidden duplicate id...).

3) For things like document fragments or simply disconnected
node tree's they should probably have an independent id space.

    The first is fairly easily solved by simply ensuring that
'this' (in getElementById) is an ancestor of the element found in the
hash table.  If it isn't then return null.

    The second is trickier.  A few ideas, detecting that an old
element has been removed from the tree should be fairly easy
given the 'parent' check above (it will fail).  Although it points out
that the hash entries should be soft references so that elements that
were once part of the tree with an id can still be GC'ed.  There
is a significant issue about what should happen if the hashed node
isn't listed (see next section).

    Adding nodes, the simple thing would be to just walk the new
subtree and add it's nodes to the parent document.  One issue will
be the handling of duplicate Id's.  They may be duplicates because
the old element was recently removed, or they may be duplicates
because for the next instant both nodes will be in the tree (they
add then remove for example).  In the second case the "hard ass"
response is throw an illegal state exception... This will not win
very many friends however it is probably the 'correct' thing to do.
If you don't do this, then you run the risk that when you remove a
subtree there were element's that had the same id that were shadowed
which would be a pain to track.

    Another option would be to be slightly lazy about it, so when
subtree's are added mark them as 'dirty', when someone asks for an id
- check the hash table if it is present and still part of the document
great return it.  Otherwise start walking the 'dirty' subtree's adding
id's as you go, until you get to the one you want (or return null).
This is nice from the point of view of not making 'add' overly
expensive.

    Anyway I'm kinda rambling but I hope this helps somewhat!


Sieker, Fritz wrote:

> Hi:
> 
> I have a number of SVG files produced by Visio that take a long time (>
> 20 minutes) to load. I used Jprobe and found that the biggest time sink
> was in Batik.dom.svg.SVGOMDocument.getById(). As an experiment, I
> replaced the scan of
> "protected static Element getById(String id, Node node)" by a hash
> table.
> 
> Specifically, I added the following in SVGOMDocument.java
> 
>     private HashMap elementsWithIDs = new HashMap();
> 
>     public void registerElement (String id, Element ele) {
>       elementsWithIDs.put(id, ele);
>     }
> 
> And modified the getElementById() call to:
> 
>    public Element getElementById(String elementId) {
> 	return (Element) elementsWithIDs.get(elementId);
>    }
> 
> Then to actually, load the hash table, I modified
> org.apache.batik.dom.AbstractElement.java to register elements when the
> "id" was set. For example, in "public void setAttributeNS()", at the end
> I added code
> 
>       if (qualifiedName.equals("id")) {
> 	  ((org.apache.batik.dom.svg.SVGOMDocument)getOwnerDocument())
> 	    .registerElement(value, this);
> 	}
> 
> I similarly modified "public void setAttribute()"
> 
> The results were dramatic, with the > 20 minute load dropping to just a
> few seconds.
> 
> What other changes would be necessary to make this change "correct"?
> 
> Thanks
> 
> Fritz Sieker
> 
> fritz.sieker@hp.com
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: batik-dev-unsubscribe@xml.apache.org
> For additional commands, e-mail: batik-dev-help@xml.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: batik-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: batik-dev-help@xml.apache.org


Mime
View raw message