nutch-dev mailing list archives

From Ken Krugler
Subject Re: Redirects and alias handling (LONG)
Date Wed, 15 Aug 2007 20:01:29 GMT
>Ken Krugler wrote:
>>>>common case. Thus it could be somewhat computationally expensive 
>>>>(e.g. a winnowing ala 
>>>Interesting paper, thanks for the pointer - I always wondered what 
>>>criteria to use to reduce the number of shingles, and this 
>>>winnowing is a simple enough recipe for creating page signatures. 
>>>I may be tempted to implement it ;)
>>I took a quick scan through the public code and didn't find 
>>anything that looked appropriate for this. One more potentially 
>>useful paper is here:
>This URL looks similar to the one you mentioned before ... probably 
>a case of near-duplicate *chuckle* ...

Sorry about that - I can't really claim I was checking your manual 
dedup support. The real URL is:

-- Ken
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"If you can't find it, you can't fix it"
