manifoldcf-user mailing list archives

From Olivier Tavard <olivier.tav...@francelabs.com>
Subject web connector : links extraction issues
Date Mon, 29 Oct 2018 17:39:27 GMT
Hi,

Regarding the web connector, I noticed that on some websites, JavaScript code can
prevent the web connector from correctly fetching all the links present on the page. Specifically,
this affects websites that include a deprecated version of the New Relic browser agent, such as
http://js-agent.newrelic.com/nr-1071.min.js.
After downloading the page locally and removing the reference to the New Relic browser agent,
the web connector correctly fetched the links in the page. So it seems that the JavaScript
injected by the New Relic agent was the reason the links were not fetched.
This case is rare and concerns only old versions of the New Relic agent. More generally,
would it be possible to block JavaScript injection at the connector level during
indexing?
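
To make the manual workaround above concrete, here is a minimal Python sketch of the same idea: remove the script elements from a downloaded page, then collect the links. It is not ManifoldCF code and does not reflect how the web connector parses pages; the function names (strip_scripts, extract_links) and the "newrelic" marker are assumptions chosen for illustration, and a regular expression is only a rough way to cut script elements out of HTML.

```python
import re
from html.parser import HTMLParser

# Matches a whole <script ...>...</script> element, case-insensitively.
# Regex-based HTML editing is fragile; this is only an illustration.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def strip_scripts(html, marker=None):
    """Remove script elements from html.

    If marker is given, only script elements whose text contains it
    (case-insensitive) are removed; otherwise all of them are removed.
    """
    def replace(match):
        block = match.group(0)
        if marker is None or marker.lower() in block.lower():
            return ""
        return block

    return SCRIPT_RE.sub(replace, html)


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the href values found in html, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

For a locally saved page, something like extract_links(strip_scripts(page, marker="newrelic")) would mirror the test I did by hand: only the script elements mentioning the agent are dropped before the links are collected.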
 
Thanks,
Best regards,
Olivier 


