httpd-modules-dev mailing list archives

From MK ...@cognitivedissonance.ca>
Subject Re: how to parse html content in handler
Date Fri, 25 Mar 2011 13:28:01 GMT
On Thu, 24 Mar 2011 20:10:46 +0800 (CST)
Whut  Jia <whut_jia@163.com> wrote:
> Hi all,
> I want to parse HTML content and extract some elements in my own
> Apache handler. Could you tell me how to do it? Thanks,
> Jia

I think right now the only public C library for parsing HTML is in the
venerable and long-unmaintained libwww.

However, I wrote a quick and simple event-driven parser library a few
months ago. I have been meaning to open-source it on CCAN or
somewhere but have not gotten around to it, so if you are interested
you can send me a message directly; I have some basic scraper demos,
etc. It is not on the scale of libwww (it is just a low-level HTML
parser), but I am sure it could do what you want, and you can either
compile it into an Apache module or link the module against it (it has
no further dependencies).
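
To give a rough idea of what "event-driven" means here, the following is
a minimal C sketch of the general technique. It is not the library
described above: the names scan_start_tags, tag_cb and count_anchors are
hypothetical and made up for illustration. The scanner walks a buffer
and fires a callback for each start tag it finds; it is a toy that
ignores attributes containing '>', script/style content, and malformed
markup.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: invoked once per start tag found.
 * `name` is NOT NUL-terminated; use `len`. */
typedef void (*tag_cb)(const char *name, size_t len, void *user);

/* Scan `html` and report each start tag name through `on_tag`.
 * Returns the number of start tags reported. Comments, end tags and
 * declarations such as <!DOCTYPE> are skipped. */
size_t scan_start_tags(const char *html, tag_cb on_tag, void *user)
{
    assert(html != NULL && on_tag != NULL);

    size_t count = 0;
    const char *p = html;

    while ((p = strchr(p, '<')) != NULL) {
        p++;                                  /* step past '<' */
        if (strncmp(p, "!--", 3) == 0) {      /* comment: skip to "-->" */
            const char *end = strstr(p + 3, "-->");
            if (end == NULL)
                break;                        /* unterminated comment */
            p = end + 3;
            continue;
        }
        if (!isalpha((unsigned char)*p))      /* end tag, doctype, stray '<' */
            continue;

        const char *start = p;
        while (isalnum((unsigned char)*p))    /* consume the tag name */
            p++;
        on_tag(start, (size_t)(p - start), user);
        count++;
    }
    return count;
}

/* Example event handler: counts <a> tags, case-insensitively.
 * `user` must point to a size_t counter. */
void count_anchors(const char *name, size_t len, void *user)
{
    if (len == 1 && tolower((unsigned char)name[0]) == 'a')
        (*(size_t *)user)++;
}
```

In a handler, the buffer passed to scan_start_tags would be whatever
HTML the module has already collected, and the callback would record
the elements of interest instead of just counting them.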


-- 
"Enthusiasm is not the enemy of the intellect." (said of Irving Howe)
"The angel of history[...]is turned toward the past." (Walter Benjamin)

