xerces-j-dev mailing list archives

From "Dave Brosius" <dbros...@mebigfatguy.com>
Subject Re: Interning strategy
Date Thu, 10 Jan 2008 04:11:56 GMT
So then if i am to understand you correctly, the SymbolTable is used 
primarily to

1) avoid the synchronization cost of intern.

and secondarily to

2) allow for possible alternative hash algorithms.

The SoftReferenceSymbolTable in addition

allows for the releasing of temporary symbol table buttressing, but does not 
do anything for intern bloat.
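
For what it's worth, the "intern once, then cache" idea can be sketched roughly as below. This is only an illustration and not the Xerces source: MiniSymbolTable, its fixed bucket count, and its hash function are all made-up names and choices. The point is that String.intern() runs only when a new entry is created, and later lookups compare the parser's character buffer against the cached symbol.

```java
// Illustrative sketch only: MiniSymbolTable is a hypothetical class, not
// org.apache.xerces.util.SymbolTable. Bucket count and hash are arbitrary.
final class MiniSymbolTable {

    private static final class Entry {
        final String symbol;
        final Entry next;

        Entry(String symbol, Entry next) {
            // String.intern() is called once per distinct name, here.
            this.symbol = symbol.intern();
            this.next = next;
        }
    }

    private final Entry[] buckets = new Entry[101];

    /** Returns the canonical String for the given slice of a character buffer. */
    String addSymbol(char[] buf, int offset, int length) {
        int index = (hash(buf, offset, length) & 0x7FFFFFFF) % buckets.length;
        // Walk the chain for this bucket looking for an equal symbol.
        for (Entry e = buckets[index]; e != null; e = e.next) {
            if (matches(e.symbol, buf, offset, length)) {
                return e.symbol; // cache hit: no new String, no intern() call
            }
        }
        // Not found: create the String, intern it once, and prepend to the chain.
        Entry added = new Entry(new String(buf, offset, length), buckets[index]);
        buckets[index] = added;
        return added.symbol;
    }

    private static int hash(char[] buf, int offset, int length) {
        int h = 0;
        for (int i = 0; i < length; i++) {
            h = 31 * h + buf[offset + i];
        }
        return h;
    }

    private static boolean matches(String s, char[] buf, int offset, int length) {
        if (s.length() != length) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (s.charAt(i) != buf[offset + i]) {
                return false;
            }
        }
        return true;
    }
}
```

Because the returned strings are interned, a handler can compare them by reference against its own string literals.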

It's a shame that xerces doesn't instead just allow for the installation of a 
custom interning manager, such as

public SAXParser(Interner interner);

With a default implementation (among others) just being a simple identity 
hash map, with the user code needing then to reference that interner

public class MyContentManager extends DefaultHandler {
    String lookForNode = interner.intern("TheNodeNameIWant");
    public void startElement(String uri, String localName, String qName,
            Attributes atts) {
        if (localName == lookForNode)
            dosomethinguseful();
    }
}
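
To make that concrete, here is a rough, self-contained sketch of what such a pluggable interner could look like. Everything in it is hypothetical: neither Xerces nor SAX has an Interner interface or a SAXParser(Interner) constructor, and MapInterner and MyContentHandler are names invented for the example. One deliberate difference from the description above: the default uses a plain HashMap keyed by equals(), because an identity-based map would never unify two distinct String objects that have the same contents.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical interface: no such type exists in Xerces or SAX.
interface Interner {
    String intern(String s);
}

// Canonicalizes strings by equals() in a private pool, without String.intern().
// Not thread-safe; a parser instance is assumed to be used by one thread.
final class MapInterner implements Interner {
    private final Map<String, String> pool = new HashMap<>();

    @Override
    public String intern(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }
}

// The handler asks the same interner for its constant, so reference
// comparison works as long as the parser used that interner too.
final class MyContentHandler extends DefaultHandler {
    private final String lookForNode;
    int matches = 0;

    MyContentHandler(Interner interner) {
        this.lookForNode = interner.intern("TheNodeNameIWant");
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes atts) {
        if (localName == lookForNode) {
            matches++;
        }
    }
}
```

The catch with this design is that the reference comparison is only valid when the parser and the handler share one interner instance; a name that did not pass through it will never match, even if it is equal.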

But ok, thanks, I think i understand now.
dave

----- Original Message ----- 
From: "Michael Glavassevich" <mrglavas@ca.ibm.com>
To: <j-dev@xerces.apache.org>
Sent: Wednesday, January 09, 2008 10:52 PM
Subject: Re: Interning strategy


> Hi Dave,
>
> The strings didn't need to be interned for Xerces' internals to work
> correctly (though the code has since evolved to depend on that now). It's
> just cheaper to do the intern once and cache it in the SymbolTable than to
> do it later, possibly multiple times at the API layer. Some history here
> [1] if you're interested.
>
> Thanks.
>
> [1] http://issues.apache.org/jira/browse/XERCESJ-6
>
> Michael Glavassevich
> XML Parser Development
> IBM Toronto Lab
> E-mail: mrglavas@ca.ibm.com
> E-mail: mrglavas@apache.org
>
> "Dave Brosius" <dbrosius@apache.org> wrote on 01/09/2008 10:27:20 PM:
>
>> Clearly based on your response, and the fact that the Soft referenced table
>> also interns, i completely misunderstood (and still do) what the SymbolTable
>> class is used for.
>>
>> I guess i'll have to take another attempt at understanding what it is being
>> used for.
>>
>>
>> ----- Original Message -----
>> From: "Michael Glavassevich" <mrglavas@ca.ibm.com>
>> To: <j-dev@xerces.apache.org>
>> Sent: Wednesday, January 09, 2008 4:16 PM
>> Subject: Re: Interning strategy
>>
>>
>> > Hi Dave,
>> >
>> > It's being interned for the application. Allows your SAX content handler
>> > to compare the names of elements, attributes, etc... using reference
>> > comparison [1] instead of equals for better performance. There's an
>> > alternate implementation of the SymbolTable [2] which is more sensitive to
>> > memory usage. It allows interned strings to be garbage collected if
>> > they're only reachable through the SymbolTable.
>> >
>> > Thanks.
>> >
>> > [1] http://xerces.apache.org/xerces2-j/features.html#string-interning
>> > [2] http://xerces.apache.org/xerces2-j/javadocs/xerces2/org/apache/xerces/util/SoftReferenceSymbolTable.html
>> >
>> > Michael Glavassevich
>> > XML Parser Development
>> > IBM Toronto Lab
>> > E-mail: mrglavas@ca.ibm.com
>> > E-mail: mrglavas@apache.org
>> >
>> > "Dave Brosius" <dbrosius@mebigfatguy.com> wrote on 01/09/2008 01:01:06 AM:
>> >
>> >> Greetings, i was perusing old mailing list emails, and stumbled onto the
>> >> following email sent some time ago :)
>> >>
>> >> Luckily, from a quick perusal of the code, it appears that the email still
>> >> applies.
>> >>
>> >> I have a question about the implementation of SymbolTable
>> >>
>> >> As expected, it appears to me that it does hashing to find a bucket, then
>> >> walks the chain of pointers from the bucket to find a string that is
>> >> 'equals'
>> >>
>> >> Only if it doesn't exist is a new one added. All of this makes sense.
>> >>
>> >> The question i have then, is why when you add an entry
>> >>
>> >> public Entry(String symbol, Entry next) {
>> >>     this.symbol = symbol.intern();
>> >>     characters = new char[symbol.length()];
>> >>     symbol.getChars(0, characters.length, characters, 0);
>> >>     this.next = next;
>> >> }
>> >>
>> >> does the code intern the string? Isn't the point of this class to stop
>> >> pollution of the constant pool and perm gen? (besides allowing for alternate
>> >> hashing?)
>> >> Given that the one String that lives in the SymbolTable is returned, i would
>> >> think intern is redundant.
>> >>
>> >> thanks,
>> >> dave
>> >>
>> >> ----- Original Message -----
>> >> From: "Michael Glavassevich" <mrglavas@ca.ibm.com>
>> >> To: <j-dev@xerces.apache.org>
>> >> Sent: Sunday, July 24, 2005 11:57 AM
>> >> Subject: Re: Interning strategy
>> >>
>> >>
>> >> Elliotte Harold <elharo@metalab.unc.edu> wrote on 07/22/2005 09:35:02 PM:
>> >>
>> >> > Suppose I turn on interning in the parser by setting the SAX feature
>> >> > http://xml.org/sax/features/string-interning to true. Will Xerces simply
>> >> > invoke the String.intern() method on the strings it creates or does it
>> >> > do something fancier like maintaining its own pool of string constants
>> >> > and reuse those?
>> >>
>> >> It maintains a pool. See org.apache.xerces.util.SymbolTable, specifically
>> >> the addSymbol() methods.
>> >>
>> >> > --
>> >> > Elliotte Rusty Harold  elharo@metalab.unc.edu
>> >> > XML in a Nutshell 3rd Edition Just Published!
>> >> > http://www.cafeconleche.org/books/xian3/
>> >> > http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim
>> >> >
>> >> > ---------------------------------------------------------------------
>> >> > To unsubscribe, e-mail: j-dev-unsubscribe@xerces.apache.org
>> >> > For additional commands, e-mail: j-dev-help@xerces.apache.org
>> >> >
>> >>
>> >> Michael Glavassevich
>> >> XML Parser Development
>> >> IBM Toronto Lab
>> >> E-mail: mrglavas@ca.ibm.com
>> >> E-mail: mrglavas@apache.org