manifoldcf-dev mailing list archives

From Gustavo Beneitez <gustavo.benei...@gmail.com>
Subject Create a new ACTIVITY_FETCH from a transformation
Date Wed, 25 Jul 2018 21:57:26 GMT
Hi all,

I need to extract and analyse the crawled URLs, because they may contain
parameters such as "?redirectURL=" that point to new documents to be
fetched and indexed.
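
To make the extraction step concrete, here is a minimal sketch in plain Java.
It assumes the target is carried URL-encoded in a query parameter named
"redirectURL"; the class and method names (RedirectParamExtractor,
extractRedirect) are hypothetical and are not part of ManifoldCF.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.Optional;

// Hypothetical helper for illustration only; not a ManifoldCF class.
public class RedirectParamExtractor {

    // Assumption: the parameter name is exactly "redirectURL".
    private static final String PARAM = "redirectURL";

    // Returns the decoded redirect target, or empty if the URL has no
    // usable "redirectURL" query parameter or cannot be parsed.
    public static Optional<String> extractRedirect(String url) {
        try {
            String query = URI.create(url).getRawQuery();
            if (query == null) {
                return Optional.empty();
            }
            for (String pair : query.split("&")) {
                int eq = pair.indexOf('=');
                if (eq < 0 || !pair.substring(0, eq).equals(PARAM)) {
                    continue;
                }
                String value = URLDecoder.decode(pair.substring(eq + 1), "UTF-8");
                return value.isEmpty() ? Optional.empty() : Optional.of(value);
            }
            return Optional.empty();
        } catch (IllegalArgumentException | UnsupportedEncodingException e) {
            // Malformed URL or bad percent-encoding: treat as "no redirect".
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String crawled =
            "http://example.com/page?id=1&redirectURL=http%3A%2F%2Fexample.com%2Fdoc.pdf";
        System.out.println(extractRedirect(crawled).orElse("(none)"));
    }
}
```

This only covers finding the candidate URL; it does not by itself cause
ManifoldCF to fetch anything.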

My first attempt was to create a class that extends
BaseTransformationConnector,

public class RedirectExtractor extends
org.apache.manifoldcf.agents.transformation.BaseTransformationConnector

and to add a "RedirectExtractor" transformation step to the fetch process in
ManifoldCF. However, a transformation only lets me modify the current
document; it does not let me trigger a new fetch for the extracted parameter.

I have been looking through the ManifoldCF source code and found something
that may be useful:

activities.recordActivity(null,ACTIVITY_FETCH,
                null,urlValue,Integer.toString(-2),"Robots exclusion",null);

from the IProcessActivity interface, which is used by the connectors. I
would rather not write a new connector, since that is fairly complex. Do you
see an alternative, or is this the only way?

Thanks in advance.
