tomcat-users mailing list archives

From "Bill Barker" <>
Subject Re: retrieving remote web content
Date Sun, 10 Nov 2002 08:51:44 GMT
There is also the Jakarta HttpClient.

"Reynir Hübner" <> wrote in message

I haven't written a servlet to do this, but I have written a JSP tag that does it.

If you don't want to move the images from one server to another (from Google
to yours), as a proxy would, then you must parse the HTML and change all the
URLs for CSS, img, href, JavaScript and a lot more so that they are "fully
qualified" URLs (scheme and host included), not just relative paths such as
/images/logo.gif.
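As a rough illustration of what "fully qualified" means here, the sketch below resolves references against the URL of the page they came from, using the JDK's java.net.URI. The class name, method name and example.com addresses are made up for the example; they are not from the JSP tag described above.

```java
import java.net.URI;

final class UrlResolver {
    /**
     * Resolves a reference found in a page (relative or absolute) against
     * the URL of that page and returns the fully qualified form.
     * Throws IllegalArgumentException if either string is not a valid URI.
     */
    static String absolutize(String pageUrl, String reference) {
        return URI.create(pageUrl).resolve(reference).toString();
    }

    public static void main(String[] args) {
        // Hypothetical page address, used only for illustration.
        String page = "http://www.example.com/news/index.html";
        System.out.println(absolutize(page, "/images/logo.gif"));       // root-relative
        System.out.println(absolutize(page, "style.css"));              // document-relative
        System.out.println(absolutize(page, "http://other.example.org/a.js")); // already absolute
    }
}
```

References that are already absolute pass through unchanged, so the same call can be applied to every URL found in the page.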

This is usually not very complicated, but it can be a little tricky,
especially with JavaScript and the like.
I used regular expressions to do this, more specifically the jakarta-oro
package. I still recommend some server-side caching of the parsed pages, as
the parsing can be quite a processing-intensive procedure.
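A minimal sketch of the regex-plus-cache idea follows. It uses java.util.regex from the JDK rather than jakarta-oro, so it is not the code from the JSP tag; the class name, the pattern and the unbounded in-memory cache are all assumptions made for illustration. The pattern only handles quoted src and href attributes and will miss URLs inside scripts, stylesheets and unquoted attributes.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LinkRewriter {
    // Matches src="..." or href='...' (either quote style, any letter case).
    // Deliberately simple: unquoted attributes and URLs in scripts are ignored.
    private static final Pattern ATTR =
            Pattern.compile("(?i)\\b(src|href)\\s*=\\s*([\"'])(.*?)\\2");

    // Hypothetical cache of rewritten pages keyed by page URL.
    // It never expires entries; a real proxy would need an eviction policy.
    private static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    /** Rewrites every matched src/href value in html to a fully qualified URL. */
    static String rewrite(String pageUrl, String html) {
        URI base = URI.create(pageUrl);
        Matcher m = ATTR.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = m.group(3);
            String resolved;
            try {
                resolved = base.resolve(value.trim()).toString();
            } catch (IllegalArgumentException e) {
                resolved = value; // leave values that are not valid URIs untouched
            }
            String replacement = m.group(1) + "=" + m.group(2) + resolved + m.group(2);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Returns the cached rewrite for pageUrl, computing it on first use. */
    static String rewriteCached(String pageUrl, String html) {
        return CACHE.computeIfAbsent(pageUrl, url -> rewrite(url, html));
    }
}
```

Keying the cache on the page URL means the regex pass runs once per page rather than once per request, which is the cost the caching advice is aimed at.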

If you find some library to do this, please tell us about it.

There are some libraries that can help with making the HTTP requests; one
worth checking out is HTTPClient.
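For the fetching step, here is a hedged sketch using java.net.http.HttpClient, which ships with the JDK from Java 11 onward. It is a different library from the HTTPClient and Jakarta HttpClient packages named in this thread and is swapped in only because it needs no extra dependency. The class name, timeouts and User-Agent string are assumptions.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

final class PageFetcher {
    // One shared client; it follows ordinary redirects and limits connect time.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    /** Builds a GET request for the given URL without sending it. */
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(20))
                .header("User-Agent", "example-proxy/0.1") // hypothetical agent string
                .GET()
                .build();
    }

    /** Fetches the page body as text, failing on any status other than 200. */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpResponse<String> response =
                CLIENT.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status " + response.statusCode() + " for " + url);
        }
        return response.body();
    }
}
```

The HTML returned by fetch would then go through the URL rewriting step before being served or cached.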

Hope it helps,

> -----Original Message-----
> From: Jason Novotny []
> Sent: 9. nóvember 2002 22:44
> To: Tomcat Users List; Jetspeed Developers List
> Subject: retrieving remote web content
> Hi,
>
> I'm trying to develop a servlet that can act as a proxy for another
> web site -- let's say I'm trying to provide the content of
> It seems I can retrieve and cache the HTML using a URLConnection, but
> what about the resources used by the HTML, like GIFs and JPGs? Do I
> somehow need to parse the HTML and get those separately? Is there a
> library out there for doing what I describe? Maybe I'm missing
> something really simple...
>
> Thanks, Jason
