commons-user mailing list archives

From "Balazs Somogyi" <balazs.somo...@FATHOMTECHNOLOGY.com>
Subject RE: digester + DOM
Date Wed, 26 Feb 2003 15:18:02 GMT
> > Is it possible to feed digester with an already parsed XML (actually
> > XHTML). I'm using JTidy to parse HTML and would like to extract some
> > of its elements but don't want to traverse manually the tree.
>
> You could address the elements you want with XPath. This is likely to
> be a better approach than serializing the XHTML object tree and having
> Digester act on that.

Janek,

I'm using XPath now and it's working properly.

However, the HTML looks something like this:

<table><tr><td> <!-- bulk entities come here -->
	garbage
	<b>entity #1 name</b>
	garbage
	<a>entity #2 ref</a>
	garbage
	<b>entity #1 name</b>
	garbage
	<a>entity #2 ref</a>
	garbage
</td></tr></table>

My idea was to use Digester so that I don't need "while" loops in the
code to skip the garbage, and to let Digester trigger my code whenever
it meets a <b> or an <a> element. I hope the example makes this easier
to understand.

I also tried XPath with the string "/table/tr/td/b | /table/tr/td/a",
but I guess I'm doing something wrong because the resulting node list
was really strange. Did I misunderstand the spec? Or should it work
this way, and I have "only" made a coding mistake?
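
For reference, here is a minimal sketch of evaluating that union
expression against a DOM tree. It rests on several assumptions: it uses
the standard javax.xml.xpath API rather than whatever XPath library was
used above, the class name EntityExtractor and its methods are
hypothetical, and <table> is taken to be the document root, as in the
sample. A tree produced by JTidy would most likely have <html> and
<body> above the table, in which case the absolute paths would need
adjusting.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Digester or JTidy.
final class EntityExtractor {

    // The union expression from the message: every <b> or <a> directly under the cell.
    static final String ENTITY_PATH = "/table/tr/td/b | /table/tr/td/a";

    // Parses a well-formed XML string into a DOM document (stand-in for the JTidy output).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Evaluates the union and returns one "name: text" entry per matched element.
    static List<String> extract(Document doc) throws Exception {
        NodeList nodes = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(ENTITY_PATH, doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            result.add(node.getNodeName() + ": " + node.getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<table><tr><td>garbage<b>name 1</b>garbage<a>ref 1</a>"
                   + "garbage<b>name 2</b>garbage<a>ref 2</a>garbage</td></tr></table>";
        for (String entry : extract(parse(xml))) {
            System.out.println(entry);
        }
    }
}
```

The "garbage" text nodes never appear in the result because the
expression selects only element nodes named b or a. XPath 1.0 defines a
node-set as unordered, so if the pairing of each <b> with the <a> that
follows it matters, it may be safer not to rely on the order in which
an implementation returns the union.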

Note that this is not a critical issue; I would just like to do it in
the most elegant way :)

Balazs
