httpd-modules-dev mailing list archives

From MK
Subject Re: how to parse html content in handler
Date Fri, 25 Mar 2011 13:28:01 GMT
On Thu, 24 Mar 2011 20:10:46 +0800 (CST)
Whut Jia wrote:
> Hi,all
> I want to parse a html content and withdraw some element in myself
> apache handler.Please ask how to do it. Thanks,
> Jia

I think right now the only public C library for parsing HTML is in the
venerable and long-unmaintained libwww.

However, I wrote a quick and simple event-driven parser library a few
months ago. I have been meaning to open source it on CCAN or
somewhere but have not gotten around to it, so if you are interested
you can send me a message directly; I have some basic scraper demos,
etc. It is not on the scale of libwww -- it is just a low-level HTML
parser -- but I am sure it could do what you want, and you can either
compile it in or link to it from an Apache module (it has no further
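
To illustrate what "event-driven" means here, the following is a minimal
sketch in C. It is not the library mentioned above; every name in it
(html_scan, html_cb, html_event, count_links) is hypothetical. It assumes
the whole document is already in one buffer and ignores comments,
scripts, and '>' characters inside quoted attribute values.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event types; not taken from any real library. */
typedef enum { EV_OPEN_TAG, EV_CLOSE_TAG, EV_TEXT } html_event;

/* Called once per event. data/len is a slice of the input buffer and is
 * not NUL-terminated. For tags it holds only the tag name. */
typedef void (*html_cb)(html_event ev, const char *data, size_t len,
                        void *user);

/* Walk buf and report tags and text runs through cb.
 * Simplification: an unterminated '<' ends the scan. */
static void html_scan(const char *buf, size_t n, html_cb cb, void *user)
{
    size_t i = 0;

    while (i < n) {
        if (buf[i] == '<') {
            const char *gt = memchr(buf + i, '>', n - i);
            size_t start, stop, j;
            html_event ev = EV_OPEN_TAG;

            if (gt == NULL)
                break;
            start = i + 1;
            stop = (size_t)(gt - buf);
            if (start < stop && buf[start] == '/') {
                ev = EV_CLOSE_TAG;
                start++;
            }
            /* The tag name ends at whitespace, '/', or the closing '>'. */
            j = start;
            while (j < stop && !isspace((unsigned char)buf[j]) &&
                   buf[j] != '/')
                j++;
            cb(ev, buf + start, j - start, user);
            i = stop + 1;
        } else {
            const char *lt = memchr(buf + i, '<', n - i);
            size_t stop = lt ? (size_t)(lt - buf) : n;

            cb(EV_TEXT, buf + i, stop - i, user);
            i = stop;
        }
    }
}

/* Example callback: count opening <a> tags, case-insensitively. */
struct link_count { int n; };

static void count_links(html_event ev, const char *data, size_t len,
                        void *user)
{
    struct link_count *c = user;

    if (ev == EV_OPEN_TAG && len == 1 &&
        tolower((unsigned char)data[0]) == 'a')
        c->n++;
}
```

A caller would pass its buffer, the buffer length, a callback, and a
pointer to its own state, for example
html_scan(doc, strlen(doc), count_links, &c) with a zeroed
struct link_count c. Inside a handler, the buffer would be whatever HTML
content the module has already collected.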

"Enthusiasm is not the enemy of the intellect." (said of Irving Howe)
"The angel of history[...]is turned toward the past." (Walter Benjamin)
