xmlgraphics-batik-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Thomas DeWeese <Thomas.DeWe...@Kodak.com>
Subject Re: Performance of Batik.dom.svg.SVGOMDocument.getById()
Date Sun, 29 Aug 2004 00:38:37 GMT
Hi Cameron,

    One other thing occured to me on this point.  You need to
look out for is SVGOMCSSImportedElementRoot this is used for
the 'use' element where we clone the referenced content into
the referencing document, obviously the id's from these
cloned subtrees shouldn't be counted.  Just trying to
make it interesting ;)


Thomas DeWeese wrote:

> Cameron McCormack wrote:
> 
>> Ah I see, the SVGSVGElement.getElementById (which I hadn't noticed
>> existed) restricts to the subtree.
>>
>>> 2) Adding/removing elements/subtree's.  Someone can add a whole
>>>   subtree to a document and all those nodes need to have their
>>>   id's added, also when a subtree is removed there id's need to
>>>   be removed (which might reveal a hidden duplicate id...).
>>
>>
>> Yes, I found I had to track whether certain elements were in the
>> document tree or not so that the ids would only be hashed when the
>> subtree was inserted.
>>
>>> 3) For things like document fragments or simply disconnected
>>> node tree's they should probably have an independent id space.
>>
>>
>> Hmm, document fragments was one that I'd dismissed since there is no
>> getElementById method on a DocumentFragment. 
>>
>>>   The first is fairly easily solved by simply ensuring that
>>> 'this' (in getElementById) is an ancestor of the element found in the
>>> hash table.  If it isn't then return null.
>>
>>
>> But yeah, that would work.
>>
>>>   The second is trickier.  A few ideas, detecting that an old
>>> element has been removed from the tree should be fairly easy
>>> given the 'parent' check above (it will fail).  Although it points out
>>> that the hash entries should be soft references so that elements that
>>> were once part of the tree with an id can still be GC'ed.  There
>>> is a significant issue about what should happen if the hashed node
>>> isn't listed (see next section).
>>
>>
>> Or you could just make sure that the references are removed from the
>> hash table when the subtree is removed from the document tree.
> 
> 
>    Well, I would really rather avoid making add/remove heavy weight
> in order to make getElementById light weight.  This is exactly why
> we have SoftReferences and they are really pretty simple to use.
> 
>>>   Adding nodes, the simple thing would be to just walk the new
>>> subtree and add it's nodes to the parent document.  One issue will
>>> be the handling of duplicate Id's.  They may be duplicates because
>>> the old element was recently removed, or they may be duplicates
>>> because for the next instant both nodes will be in the tree (they
>>> add then remove for example).  In the second case the "hard ass"
>>> response is throw an illegal state exception... This will not win
>>> very many friends however it is probably the 'correct' thing to do.
>>> If you don't do this, then you run the risk that when you remove a
>>> subtree there were element's that had the same id that were shadowed
>>> which would be a pain to track.
>>
>>
>> I have a hash table of linked lists holding all of the elements with
>> that id, so that when a duplicate id element is removed, leaving a
>> single element with the id, getElementById will work again.
> 
> 
>    Yah, I considered this option, but it didn't feel right to
> bend over backwards to support invalid content.  A small suggestion
> for a case like this that I have seen done is to normally store
> a reference to the Element, and only when you get a 'conflict'
> do you create the list and store the list (using an instanceof
> check).  This means that in the usual case you avoid the list
> overhead.
> 
>>>   Another option would be to be slightly lazy about it, so when
>>> subtree's are added mark them as 'dirty', when someone asks for an id
>>> - check the hash table if it is present and still part of the document
>>> great return it.  Otherwise start walking the 'dirty' subtree's adding
>>> id's as you go, until you get to the one you want (or return null).
>>> This is nice from the point of view of not making 'add' overly
>>> expensive.
>>
>>
>> This is not a bad idea, but I think won't increase performance for cases
>> like this:
>>
>>   <g id="g1"/>
>>   <g id="g2"/>
>>   <g id="g3"/>
>>   <g id="g4"/>
>>   ...
>>
>> and your script looks up each of these IDs in order.
> 
> 
>   True it won't help in this case but it won't hurt either.  The win
> from my point of view is that it pushes most of the cost of hashing
> Id's into getElementById so you only pay for it if/when you use it.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: batik-dev-unsubscribe@xml.apache.org
> For additional commands, e-mail: batik-dev-help@xml.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: batik-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: batik-dev-help@xml.apache.org


Mime
View raw message