nutch-user mailing list archives

From Mohamed Parvez <>
Subject URL with Space
Date Thu, 03 Sep 2009 18:26:47 GMT
I am trying to crawl a URL that has a space in it.

NUTCH-661 suggests that this can be fixed with a urlnormalizer plugin.

I am using the urlnormalizer plugins (urlnormalizer-(pass|regex|basic)) and I
put the rule below in the conf/regex-normalize.xml file:


But the URL with the space is still not getting crawled.

Any hints as to what needs to be added to the conf/regex-normalize.xml
file to make Nutch crawl URLs with spaces?
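
For reference, a rule of this kind in conf/regex-normalize.xml generally takes
the form sketched below. This is an illustrative assumption, not the rule from
this message: the pattern \s and the substitution %20 are picked here only as
an example, and whether such a rule alone makes the URL crawlable is not
confirmed in this thread.

```xml
<?xml version="1.0"?>
<!-- Hypothetical sketch: one rule for the urlnormalizer-regex plugin. -->
<regex-normalize>
  <regex>
    <!-- Assumed pattern: \s matches any whitespace character, including a space. -->
    <pattern>\s</pattern>
    <!-- Assumed substitution: the percent-encoded form of a space. -->
    <substitution>%20</substitution>
  </regex>
</regex-normalize>
```

The rule only takes effect if urlnormalizer-regex is among the plugins enabled
through the plugin.includes property.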

