nutch-dev mailing list archives

From "Radim Kolar (Commented) (JIRA)" <>
Subject [jira] [Commented] (NUTCH-1098) better url-normalizer basic
Date Thu, 03 Nov 2011 00:01:33 GMT


Radim Kolar commented on NUTCH-1098:

a/ Please direct your complaints about the quality of git-generated patches to the git mailing list.
I am not going to generate patches for you manually by running diff -Naur.

b/ If you used something better than SVN (hg, git, bzr), you could cherry-pick changes from my branch,
create new branches for every subtask, attach the branches to JIRA reports, and then
discuss them separately.

c/ It is more efficient if I don't spend 10x more time in pointless discussions than on coding.
> better url-normalizer basic
> ---------------------------
>                 Key: NUTCH-1098
>                 URL:
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.3
>         Environment: Any
>            Reporter: Radim Kolar
>            Assignee: Markus Jelsma
>              Labels: encoding, url
>             Fix For: 1.5
>         Attachments: patch-urlnormalizer.diff
>   Original Estimate: 4h
>  Remaining Estimate: 4h
> The basic URL normalizer lacks 2 important features:
> Encode spaces in URLs as %20, to unbreak httpclient and possibly other clients that do not expect a space inside a URL.
> Ability to decode %33-style encoding in URLs. This is important for avoiding duplicates.
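
The two requested behaviours can be sketched roughly as follows. This is a minimal illustration in Python, not the attached patch and not Nutch's actual normalizer code: the function name `normalize_url` and the choice to decode only RFC 3986 "unreserved" characters are assumptions made for the example.

```python
import re

# Unreserved characters per RFC 3986, section 2.3. Decoding a percent-escape
# of one of these does not change which resource the URL refers to.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

# Matches one percent-escape such as %33 or %2F.
PERCENT_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def _decode_if_unreserved(match):
    """Return the decoded character if it is unreserved, else the escape unchanged."""
    char = chr(int(match.group(1), 16))
    return char if char in UNRESERVED else match.group(0)


def normalize_url(url):
    """Hypothetical helper: decode harmless escapes, then encode raw spaces."""
    # Feature 2: "%33" becomes "3", so both spellings map to one URL.
    url = PERCENT_ESCAPE.sub(_decode_if_unreserved, url)
    # Feature 1: a literal space becomes "%20".
    return url.replace(" ", "%20")
```

Under these assumptions, `http://example.com/%33` and `http://example.com/3` normalize to the same string, which is the duplicate-avoidance case described in the issue. Escapes of reserved characters such as `%2F` are left alone, because decoding them could change the meaning of the URL.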
