xmlgraphics-batik-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Cameron McCormack <cam-batik-...@aka.mcc.id.au>
Subject Re: Performance of Batik.dom.svg.SVGOMDocument.getById()
Date Fri, 27 Aug 2004 01:10:45 GMT
Hi people.

Thomas DeWeese:
>    Thanks for raising this.  This is an issue I had been considering
> addressing for a _long_ time.  It is unfortunately not a really
> trivial thing to 'do right'.  There are three things that jump out
> at me as being potentially problematic with your current
> implementation.

Yes, I've had some code to do id hashing sitting around for a couple of
weeks too, but only just was reminded of it by Fritz' post.

> 1) getElementById I believe works on a subtree of the document.
>    So unless the 'globally' registered element is a child of the
>    element getElementById is called on the result should be null.

Well you can only call getElementById on a Document, so it should find
any element with the right id anywhere in the document tree.


Ah I see, the SVGSVGElement.getElementById (which I hadn't noticed
existed) restricts to the subtree.

> 2) Adding/removing elements/subtree's.  Someone can add a whole
>    subtree to a document and all those nodes need to have their
>    id's added, also when a subtree is removed there id's need to
>    be removed (which might reveal a hidden duplicate id...).

Yes, I found I had to track whether certain elements were in the
document tree or not so that the ids would only be hashed when the
subtree was inserted.

> 3) For things like document fragments or simply disconnected
> node tree's they should probably have an independent id space.

Hmm, document fragments was one that I'd dismissed since there is no
getElementById method on a DocumentFragment.  

>    The first is fairly easily solved by simply ensuring that
> 'this' (in getElementById) is an ancestor of the element found in the
> hash table.  If it isn't then return null.

But yeah, that would work.

>    The second is trickier.  A few ideas, detecting that an old
> element has been removed from the tree should be fairly easy
> given the 'parent' check above (it will fail).  Although it points out
> that the hash entries should be soft references so that elements that
> were once part of the tree with an id can still be GC'ed.  There
> is a significant issue about what should happen if the hashed node
> isn't listed (see next section).

Or you could just make sure that the references are removed from the
hash table when the subtree is removed from the document tree.

>    Adding nodes, the simple thing would be to just walk the new
> subtree and add it's nodes to the parent document.  One issue will
> be the handling of duplicate Id's.  They may be duplicates because
> the old element was recently removed, or they may be duplicates
> because for the next instant both nodes will be in the tree (they
> add then remove for example).  In the second case the "hard ass"
> response is throw an illegal state exception... This will not win
> very many friends however it is probably the 'correct' thing to do.
> If you don't do this, then you run the risk that when you remove a
> subtree there were element's that had the same id that were shadowed
> which would be a pain to track.

I have a hash table of linked lists holding all of the elements with
that id, so that when a duplicate id element is removed, leaving a
single element with the id, getElementById will work again.

>    Another option would be to be slightly lazy about it, so when
> subtree's are added mark them as 'dirty', when someone asks for an id
> - check the hash table if it is present and still part of the document
> great return it.  Otherwise start walking the 'dirty' subtree's adding
> id's as you go, until you get to the one you want (or return null).
> This is nice from the point of view of not making 'add' overly
> expensive.

This is not a bad idea, but I think won't increase performance for cases
like this:

  <g id="g1"/>
  <g id="g2"/>
  <g id="g3"/>
  <g id="g4"/>

and your script looks up each of these IDs in order.


Cameron McCormack
|  Web: http://mcc.id.au/
|  ICQ: 26955922

To unsubscribe, e-mail: batik-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: batik-dev-help@xml.apache.org

View raw message