nutch-user mailing list archives

From Shashanka Balakuntala <shbalakunt...@gmail.com>
Subject Re: Reconfiguring scoring plugin
Date Thu, 23 Jul 2020 12:37:55 GMT
Hi Patrick,

Yes, the approach you suggested would work, but I should mention that it
might only affect the next iteration. So you can just remove the parse data
from the last segment, parse it again, and run updatedb with the plugins
activated; that should be enough.
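
As a rough sketch of those steps (not a tested recipe): the Python helper
below only builds a plan and executes nothing. It assumes the usual segment
layout, where the parse step writes crawl_parse, parse_data and parse_text,
and that segment directory names are timestamps. The paths and the function
names (latest_segment, reparse_plan) are made up for illustration, and it
assumes the similarity plugin has already been enabled in your plugin
configuration.

```python
from pathlib import Path

# Sub-directories the parse step writes into a segment (assumed standard layout).
PARSE_DIRS = ("crawl_parse", "parse_data", "parse_text")


def latest_segment(segments_dir):
    """Return the newest segment, assuming segment names are sortable timestamps."""
    segments = [p for p in Path(segments_dir).iterdir() if p.is_dir()]
    if not segments:
        raise FileNotFoundError(f"no segments under {segments_dir}")
    return max(segments, key=lambda p: p.name)


def reparse_plan(crawldb, segments_dir, nutch="bin/nutch"):
    """Build (dirs_to_delete, commands) for re-parsing the last segment.

    Nothing is deleted or run here; the caller decides what to do with the plan.
    """
    segment = latest_segment(segments_dir)
    # Only list parse output that is actually present in the segment.
    to_delete = [segment / d for d in PARSE_DIRS if (segment / d).exists()]
    commands = [
        [nutch, "parse", str(segment)],
        [nutch, "updatedb", str(crawldb), str(segment)],
    ]
    return to_delete, commands


# Hypothetical usage (paths are assumptions):
#   to_delete, commands = reparse_plan("crawl/crawldb", "crawl/segments")
#   Review both lists, remove the directories, then run the commands in order.
```

Returning a plan instead of deleting anything keeps the sketch safe to try
against a real crawl directory before you commit to it.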

Deleting the parse data from all segments might not work as intended,
because a URL with a score below the threshold will not be generated or
fetched, so none of its outlinks will be fetched either. If you just delete
the parse data and redo the process, the segments that have already been
fetched will not be affected. It will update the scores, though, so if you
just need the scores for something else, please go ahead with this.

Let's see if anyone has anything else to add or clarify here.

*Regards*
  Shashanka Balakuntala Srinivasa



On Thu, Jul 23, 2020 at 2:40 PM Patrick Mézard <patrick@mezard.eu> wrote:

> Hello,
>
> I have crawled a first document set using a combination of depth and opic
> scoring plugins. I would like to add the similarity scoring plugin but
> obviously the crawldb scores should be updated for it and following
> "generate" phases to be effective. Is there a recommended approach to
> achieve this?
>
> My current understanding is since the similarity plugin operates in parse
> phase, I would have to remove all parsed data from segments, re-parse them
> and updatedb? Would that work? Is there anything smarter?
>
> Thanks,
> --
> Patrick Mézard
>
