From: Mohamed Parvez
Date: Tue, 1 Sep 2009 16:55:27 -0500
Subject: Re: Nutch truncating URL to 318 Chars
To: nutch-user@lucene.apache.org
In-Reply-To: <01b001ca2b4d$43f92cc0$cbeb8640$@ca>

http://business.verizon.net/SMBPortalWeb/appmanager/SMBPortal/smb?_pageLabel=SMBPortal_page_main_marketplace&_nfpb=true&_windowLabel=MarketPlacePFController_1&MarketPlacePFController_1_actionOverride=%252Fpageflows%252Fverizon%252Fsmb%252Fportal%252FmarketPlacePF%252FgetProductDetails&MarketPlacePFController_1productsId=386

Thanks/Regards,
Parvez

On Tue, Sep 1, 2009 at 4:43 PM, Fuad Efendi wrote:

> > I opened the part-00000 file in the dump folder and there, is only ONE url
> > and it has been truncated to 318 chars
> > How make Nutch consider URLs with length more than 318 chars
>
> Please provide original (before truncating) sample of such URL
> Thanks