tomcat-users mailing list archives

From Reynir Hübner
Subject RE: retrieving remote web content
Date Sat, 09 Nov 2002 22:57:04 GMT

I haven't written a servlet to do this, but I have written a JSP tag that does it.

If you don't want to relay the images through your own server (from Google to yours), as
a proxy would, then you must parse the HTML and rewrite all the URLs (CSS, img, href,
JavaScript and a lot more) so that they are "fully qualified" URLs, with scheme and host
included, and not just relative paths such as /images/logo.gif.
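
As a rough illustration of the rewriting step, here is a minimal sketch. It uses the
JDK's java.util.regex and java.net.URI rather than jakarta-oro, so it is not the code
from my tag. The class name UrlQualifier, the method qualify and the www.example.com
base URL are made up for the example, and it only handles quoted src and href
attributes; CSS url(...) references and URLs built inside JavaScript would need extra
handling.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites quoted src/href attribute values to absolute URLs.
class UrlQualifier {
    // Matches src="..." or href='...' (case-insensitive); group 2 is the quote character.
    private static final Pattern ATTR =
        Pattern.compile("(?i)\\b(src|href)\\s*=\\s*([\"'])(.*?)\\2");

    static String qualify(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        Matcher m = ATTR.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String original = m.group(3);
            String resolved;
            try {
                // Relative references are resolved against the page URL;
                // absolute URLs come back unchanged.
                resolved = base.resolve(original.trim()).toString();
            } catch (IllegalArgumentException e) {
                resolved = original; // leave values the URI parser rejects untouched
            }
            String replacement = m.group(1) + "=" + m.group(2) + resolved + m.group(2);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed base URL, for illustration only.
        String html = "<img src=\"/images/logo.gif\"> <a href='about.html'>About</a>";
        System.out.println(qualify(html, "http://www.example.com/docs/index.html"));
    }
}
```

A regular expression like this is easy to fool (unquoted attributes, markup inside
comments or scripts), which is part of why the job gets tricky in practice.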

This is usually not very complicated, but it can be a little tricky, especially with
JavaScript.
I used regular expressions to do this, more specifically the jakarta-oro package. I still
recommend some server-side caching of the parsed pages, as the parsing can be quite
processing-intensive.
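
The caching idea could look roughly like the sketch below. This is only an assumption
about one simple way to do it: the class name ParsedPageCache and the loader callback
are hypothetical, entries expire after a fixed age, and nothing ever evicts old
entries, so a real deployment would want a size limit as well.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical in-memory cache for parsed (URL-rewritten) pages, keyed by page URL.
class ParsedPageCache {
    private static final class Entry {
        final String html;
        final long storedAt;

        Entry(String html, long storedAt) {
            this.html = html;
            this.storedAt = storedAt;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long maxAgeMillis;

    ParsedPageCache(long maxAgeMillis) {
        this.maxAgeMillis = maxAgeMillis;
    }

    // Returns the cached page if it is still fresh; otherwise calls the loader
    // (fetch + parse + rewrite) and stores the result.
    String get(String url, Function<String, String> loader) {
        long now = System.currentTimeMillis();
        Entry e = entries.get(url);
        if (e == null || now - e.storedAt > maxAgeMillis) {
            e = new Entry(loader.apply(url), now);
            entries.put(url, e);
        }
        return e.html;
    }
}
```

Two requests arriving at the same moment may both run the loader for the same URL;
for a cache of this kind that only costs a duplicate parse.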

If you find some library to do this, please tell us about it.

There are some libraries that might help doing the http-requests, so check that one out, its

Hope it helps, 

> -----Original Message-----
> From: Jason Novotny
> Sent: 9 November 2002 22:44
> To: Tomcat Users List; Jetspeed Developers List
> Subject: retrieving remote web content
>     Hi,
>     I'm trying to develop a servlet that can act as a proxy for another
> web site -- let's say I'm trying to provide the content of
> It seems I can retrieve and cache the HTML using a URLConnection, but
> what about the resources used by the HTML, like GIFs and JPGs? Somehow
> I need to parse the HTML and get those separately? Is there a library
> out there for doing what I describe? Maybe I'm missing something
> really simple...
>     Thanks, Jason
> --
> To unsubscribe, e-mail:   
> <mailto:tomcat-user->>
> For 
> additional commands, 
> e-mail: <>
