nutch-dev mailing list archives

From Ken Krugler <kkrugler_li...@transpac.com>
Subject Re: Redirects and alias handling (LONG)
Date Wed, 15 Aug 2007 20:01:29 GMT
>Ken Krugler wrote:
>
>>>>common case. Thus it could be somewhat computationally expensive 
>>>>(e.g. a winnowing ala 
>>>>http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf).
>>>
>>>Interesting paper, thanks for the pointer - I always wondered what 
>>>criteria to use to reduce the number of shingles, and this 
>>>winnowing is a simple enough recipe for creating page signatures. 
>>>I may be tempted to implement it ;)
>>
>>I took a quick scan through the public code and didn't find 
>>anything that looked appropriate for this. One more potentially 
>>useful paper is here:
>>
>>http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf
>
>This URL looks similar to the one you mentioned before ... probably 
>a case of near-duplicate *chuckle* ...

Sorry about that - I can't really claim I was checking your manual 
dedup support. The real URL is:

http://www1.cs.columbia.edu/~cs6998/final_reports/ca2269-report.pdf

-- Ken
-- 
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"If you can't find it, you can't fix it"
