commons-dev mailing list archives

From Simon Kitching <skitch...@apache.org>
Subject Re: [Digester] Keeping inner elements as text
Date Tue, 12 Jul 2005 23:05:43 GMT
On Tue, 2005-07-12 at 17:35 -0400, Farzad Kohantorabi wrote:
> Hi all,
> 
> Somewhere in the XML I'm parsing there is an <abstract> element which
> I want to store in a property, I mean with all of its content. For
> example, the element might look like:
> 
> <abstract>
> Kringles are autonomous structural domains, found throughout the blood
> clotting and fibrinolytic proteins.
> Kringle domains are believed to play a role in binding mediators
> (e.g., membranes,
> other proteins or phospholipids), and in the regulation of proteolytic activity
> <cite idref="PUB00002414"/>, <cite idref="PUB00001541"/>, <cite
> idref="PUB00003257"/>.
> Kringle domains <cite idref="PUB00003400"/>, <cite
> idref="PUB00000803"/>, <cite idref="PUB00001620"/> are characterised
> by a triple loop, 3-disulphide bridge structure, whose  conformation
> is defined by a number of hydrogen bonds and small pieces of 
> anti-parallel beta-sheet. They are found in a varying number  of 
> copies  in some plasma proteins including prothrombin and
> urokinase-type plasminogen activator, which are serine proteases
> belonging to MEROPS peptidase family S1A.
> </abstract>
> 
> and I am using the following code to set <abstract> content in the
> corresponding property:
> 
>         digester.addCallMethod("*/interpro/abstract", "setAbstractDesc", 0);
> 
> This code simply ignores the <cite ...../> elements because I have not
> set any rule to translate them. However, what I want to do is to set the
> whole text, including the inner tags under <abstract/>, in the
> abstractDesc property. Please guide me on how to do this with Digester.

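This won't be easy to do. The input is processed by an XML parser, so
Digester only ever receives a stream of SAX events. Your addCallMethod
rule with a parameter count of 0 passes just the body text of <abstract>,
which is why the <cite> elements disappear. There is no way at all to
tell an XML parser to stop generating events and provide the raw text.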

If your input were:
<abstract>
<![CDATA[
any text at all
]]>
</abstract>

then that would work. An XML parser treats anything within a CDATA
section as a simple text string, so any markup inside it arrives as
ordinary character data.
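
To illustrate the CDATA point, here is a minimal sketch using only the
plain SAX API from the JDK (no Digester involved). The class name
CdataTextDemo and the method extractText are made up for this example.
It collects whatever the parser reports through characters() and shows
that markup inside a CDATA section comes through as literal text.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of Digester.
final class CdataTextDemo {

    /** Returns all character data the SAX parser reports for the given XML. */
    static String extractText(String xml) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per element, so accumulate.
                text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<abstract><![CDATA[Kringle domains "
                + "<cite idref=\"PUB00003400\"/> are characterised by a triple loop]]>"
                + "</abstract>";
        // The <cite .../> markup is printed as text, not parsed as an element.
        System.out.println(extractText(xml));
    }
}
```

The catch is that whoever produces the XML has to wrap the content in
CDATA; you cannot get this effect from the parsing side alone.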

Otherwise you will probably need to write a custom rule that accepts the
data provided by the XML parser and serialises the various XML elements
back into text. That's possible, but not trivial.
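
As a rough idea of what that serialisation involves, here is a sketch
written against the plain SAX ContentHandler callbacks rather than the
Digester Rule API. A custom Digester rule would have to do something
similar, but how it hooks in is not shown here. The class name
InnerXmlCapture and its capture method are invented for this example.
Note that the output is equivalent XML, not the original bytes: empty
elements come out as <cite ...></cite>, and comments, CDATA markers and
the original quoting are lost.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: rebuilds the inner XML of one element from SAX events.
final class InnerXmlCapture extends DefaultHandler {
    private final String target;                 // element whose content we want
    private final StringBuilder out = new StringBuilder();
    private int depth = 0;                       // 0 = outside target, 1 = directly inside

    InnerXmlCapture(String target) { this.target = target; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (depth > 0) {
            // Nested element: write its start tag and attributes back out.
            out.append('<').append(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                out.append(' ').append(atts.getQName(i)).append("=\"")
                   .append(escape(atts.getValue(i), true)).append('"');
            }
            out.append('>');
            depth++;
        } else if (qName.equals(target)) {
            depth = 1;                           // entering the target element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth > 1) {
            out.append("</").append(qName).append('>');
        }
        if (depth > 0) {
            depth--;                             // reaches 0 when the target closes
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth > 0) {
            out.append(escape(new String(ch, start, length), false));
        }
    }

    // Re-escape characters that the parser has already decoded.
    private static String escape(String s, boolean inAttribute) {
        String r = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        return inAttribute ? r.replace("\"", "&quot;") : r;
    }

    /** Parses xml and returns the serialised content of every target element found. */
    static String capture(String xml, String target) throws Exception {
        InnerXmlCapture handler = new InnerXmlCapture(target);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<interpro><abstract>Kringle domains "
                + "<cite idref=\"PUB00003400\"/> are characterised by a triple loop"
                + "</abstract></interpro>";
        System.out.println(capture(xml, "abstract"));
    }
}
```

This assumes the parser is not namespace-aware (the JAXP default), so
qName is the plain element name. With namespaces you would also have to
re-emit the prefix declarations, which is part of why this gets fiddly.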


In the future, please send questions such as this to the user list
rather than the development list.


Regards,

Simon


