nutch-user mailing list archives

From BlackIce <blackice...@gmail.com>
Subject removing "\n"... Nutch 1.14
Date Mon, 26 Feb 2018 15:17:43 GMT
Hi,

I ran into a problem with Nutch 1.14 that I don't recall having in
previous versions.

I'm finding a lot of "\n" (newline?) in the content of my crawled sites.

I've tried different configurations/combinations of the HTML parser and
Tika, and Tika on its own, to no avail.

All the info I can find on this is about older versions of Nutch...
like, ancient versions.

Did something change so that an extra configuration step is now
required?

Greetz

RRK
