nutch-user mailing list archives

From BELLINI ADAM <mbel...@msn.com>
Subject RE: Content of redirected urls empty
Date Mon, 08 Mar 2010 17:08:06 GMT

I'm sorry, I just checked twice: in my index I have the original URL, which is the
HTTP one with the empty content, but it doesn't index the HTTPS one. I'm using the Solr index.
thx
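
A side note on the question quoted below about response.sendRedirect(…): in the Servlet API that call sends a temporary redirect (HTTP 302 with a Location header), so it is a protocol-level redirect, and such a response usually carries little or no body. That would be consistent with the HTTP URL having empty content. Below is a minimal sketch of how the two redirect kinds mentioned in the thread could be told apart from a fetched response. RedirectCheck, Kind and classify are hypothetical names made up for illustration, not part of Nutch, and JavaScript redirects are not detected.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: classifies a fetched response as a
// protocol-level redirect, a content-level (meta refresh) redirect, or neither.
final class RedirectCheck {

    enum Kind { PROTOCOL, CONTENT, NONE }

    // Matches <meta ... http-equiv="refresh" ...> in any letter case.
    private static final Pattern META_REFRESH = Pattern.compile(
            "<meta[^>]*http-equiv\\s*=\\s*[\"']?refresh",
            Pattern.CASE_INSENSITIVE);

    static Kind classify(int status, String locationHeader, String body) {
        // Protocol redirect: a 3xx status together with a Location header.
        boolean is3xx = status >= 300 && status < 400;
        if (is3xx && locationHeader != null && !locationHeader.isEmpty()) {
            return Kind.PROTOCOL;
        }
        // Content redirect: the page itself asks the client to go elsewhere.
        if (body != null && META_REFRESH.matcher(body).find()) {
            return Kind.CONTENT;
        }
        return Kind.NONE;
    }

    public static void main(String[] args) {
        // What a sendRedirect(...) response typically looks like: 302, empty body.
        System.out.println(classify(302, "https://example.com/login", "")); // prints PROTOCOL
    }
}
```

If the HTTP URL classifies as PROTOCOL, its own record having no content is expected, and the text to index should come from the HTTPS destination.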



> From: mbellil@msn.com
> To: nutch-user@lucene.apache.org
> Subject: RE: Content of redirected urls empty
> Date: Mon, 8 Mar 2010 17:01:34 +0000
> 
> 
> 
> 
> Hi, I've just dumped my segments and found that I have both URLs: the original one (HTTP)
> with empty content, and the redirected-to (destination) URL (HTTPS) with non-empty content!
> 
> But in my search I found only the HTTPS URL, with empty content!! Logically, the content
> of the HTTPS URL is not empty!
> It's just mixing the HTTPS URL with the content of the HTTP one.
> 
> 
> Our redirect is done by Java code, response.sendRedirect(…), so it seems to be an HTTP
> redirect, right?
> 
> thx for helping me :)
> 
> 
> > Date: Mon, 8 Mar 2010 15:51:34 +0100
> > From: ab@getopt.org
> > To: nutch-user@lucene.apache.org
> > Subject: Re: Content of redirected urls empty
> > 
> > On 2010-03-08 14:55, BELLINI ADAM wrote:
> > >
> > >
> > > Does anyone have any ideas, guys?
> > >
> > >
> > >> From: mbellil@msn.com
> > >> To: nutch-user@lucene.apache.org
> > >> Subject: Content of redirected urls empty
> > >> Date: Fri, 5 Mar 2010 22:01:05 +0000
> > >>
> > >>
> > >>
> > >> Hi,
> > >> The content of my redirected URLs is empty, but they still have the other metadata.
> > >> I have an HTTP URL that is redirected to HTTPS.
> > >> In my index I find the HTTP URL, but with empty content.
> > >> Could you explain it, please?
> > 
> > There are two ways to redirect - one is with protocol, and the other is 
> > with content (either meta refresh, or javascript).
> > 
> > When you dump the segment, is there really no content for the redirected 
> > url?
> > 
> > 
> > -- 
> > Best regards,
> > Andrzej Bialecki     <><
> >   ___. ___ ___ ___ _ _   __________________________________
> > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> > ___|||__||  \|  ||  |  Embedded Unix, System Integration
> > http://www.sigram.com  Contact: info at sigram dot com
> > 