nutch-user mailing list archives

From "Fuad Efendi" <f...@efendi.ca>
Subject RE: Nutch truncating URL to 318 Chars
Date Tue, 01 Sep 2009 22:16:35 GMT
What does it truncate: 'http://', 'sId=386', or something inside the URL?


Just inject http://business.verizon.net/ ... Nutch should find the rest...

I believe Nutch doesn't have any limit on URL length, although some Web
servers are limited to 4000...


> http://business.verizon.net/SMBPortalWeb/appmanager/SMBPortal/smb?_pageLabel=SMBPortal_page_main_marketplace&_nfpb=true&_windowLabel=MarketPlacePFController_1&MarketPlacePFController_1_actionOverride=%252Fpageflows%252Fverizon%252Fsmb%252Fportal%252FmarketPlacePF%252FgetProductDetails&MarketPlacePFController_1productsId=386
> 
> Thanks/Regards,
> Parvez
> 
> 
> 
> On Tue, Sep 1, 2009 at 4:43 PM, Fuad Efendi <fuad@efendi.ca> wrote:
> 
> > > I opened the part-00000 file in the dump folder and there is only ONE
> > > URL, and it has been truncated to 318 chars.
> > > How can I make Nutch consider URLs longer than 318 chars?
> >
> > Please provide an original (untruncated) sample of such a URL.
> > Thanks
> >
> >
> >
> >
> >
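To see which part of the quoted URL would fall past a 318-character cutoff, a quick check along these lines may help. This is only a sketch: the 318 figure is the one reported in this thread, the names `url` and `LIMIT` are illustrative, and the script inspects the string itself rather than reproducing anything Nutch does internally.

```python
# Rejoin the sample URL from the quoted message (it was wrapped by the mailer).
url = (
    "http://business.verizon.net/SMBPortalWeb/appmanager/SMBPortal/smb"
    "?_pageLabel=SMBPortal_page_main_marketplace"
    "&_nfpb=true"
    "&_windowLabel=MarketPlacePFController_1"
    "&MarketPlacePFController_1_actionOverride="
    "%252Fpageflows%252Fverizon%252Fsmb%252Fportal"
    "%252FmarketPlacePF%252FgetProductDetails"
    "&MarketPlacePFController_1productsId=386"
)

# Cutoff reported in the thread (an observation, not a known Nutch setting).
LIMIT = 318

print("full length:", len(url))
print("kept       :", url[:LIMIT])
print("dropped    :", url[LIMIT:])
```

Comparing the "kept" line with the URL in the part-00000 dump should show whether the cut really happens at a fixed character count or at some particular character in the query string.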


