nutch-user mailing list archives

From BlackIce <>
Subject removing "\n"... Nutch 1.14
Date Mon, 26 Feb 2018 15:17:43 GMT

I ran into a problem with Nutch 1.14 that I don't recall having in
previous versions.

I'm finding a lot of "\n" (newline?) sequences in the content of crawled sites.

I've tried different configurations/combinations of the HTML parser and
Tika, and Tika alone, to no avail.

All the info I can find on this is about older versions of Nutch...
like, ancient versions...

Did something change so that there is now an extra configuration step?


