nutch-user mailing list archives

From sidbatra <siddharthaba...@gmail.com>
Subject RE: ParseSegment taking a long time to finish
Date Mon, 02 Jul 2012 20:43:19 GMT
I'll run more experiments on that segment. My regex-urlfilter.txt removes
URLs that are 350 characters or longer:

-^.{350,}$

Any recommendations for a maximum URL length? Or is there any other
hypothesis that I can test to confirm the problem?
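
To double-check where the cutoff of that rule falls, here is a rough sketch
using Python's re module rather than Nutch's Java regex engine (for a pattern
this simple the two should behave the same, but that is an assumption). The
helper name is_rejected and the example.com URLs are made up for illustration.

```python
import re

# The pattern from the filter rule, without the leading "-".
# In regex-urlfilter.txt the "-" marks a reject rule; it is not part of the regex.
TOO_LONG = re.compile(r"^.{350,}$")

def is_rejected(url: str) -> bool:
    """Return True if the URL matches the length rule (350 or more characters)."""
    return TOO_LONG.search(url) is not None

def make_url(length: int) -> str:
    """Build a dummy URL of exactly `length` characters (hypothetical host)."""
    base = "http://example.com/"
    return base + "a" * (length - len(base))

for n in (349, 350, 351):
    print(n, is_rejected(make_url(n)))
```

The quantifier {350,} means "350 or more", so a URL of exactly 350 characters
is already dropped; {351,} would keep it.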

--
View this message in context: http://lucene.472066.n3.nabble.com/ParseSegment-taking-a-long-time-to-finish-tp3758053p3992601.html
Sent from the Nutch - User mailing list archive at Nabble.com.
