manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1392) Add option for Web connector to ignore robots instructions in meta tags and rel attributes
Date Mon, 27 Feb 2017 21:35:45 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886565#comment-15886565 ]

Karl Wright commented on CONNECTORS-1392:
-----------------------------------------

Hi [~schuch], I think it is likely that people who are breaking the rules will break some
of them but not *all* of them.  The meta and rel rules are currently hardwired because a
crawler really shouldn't be clicking "execution" buttons of any kind in a UI.

There's also the problem that you will *absolutely* need to maintain backwards compatibility.
If you fold this change of functionality together with the robots processing, there is no
way to maintain it.  So I encourage you to make separate controls/switches for *each* rule
you want to be able to break.
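
As a rough illustration of what separate switches could look like, here is a minimal Java sketch. Everything in it is an assumption made for the example: the class name RobotsTagPolicy, the configuration keys "obeyMetaRobots" and "obeyRelNofollow", and the plain string map standing in for the job configuration are hypothetical and are not part of the ManifoldCF API. The point it shows is that each rule gets its own flag, and that a missing key defaults to "obey", so a job saved before the switches existed keeps today's hardwired behavior.

```java
import java.util.Map;

/** Hypothetical settings holder, for illustration only; not ManifoldCF code. */
final class RobotsTagPolicy {
    // One independent switch per rule, kept separate from the robots.txt option.
    final boolean obeyMetaRobots;
    final boolean obeyRelNofollow;

    RobotsTagPolicy(boolean obeyMetaRobots, boolean obeyRelNofollow) {
        this.obeyMetaRobots = obeyMetaRobots;
        this.obeyRelNofollow = obeyRelNofollow;
    }

    /**
     * Builds the policy from a key/value configuration (hypothetical keys).
     * A missing key means "obey", so configurations written before these
     * switches existed behave exactly as they did before.
     */
    static RobotsTagPolicy fromConfig(Map<String, String> config) {
        return new RobotsTagPolicy(
            flag(config, "obeyMetaRobots"),
            flag(config, "obeyRelNofollow"));
    }

    private static boolean flag(Map<String, String> config, String key) {
        // Only an explicit "false" turns a rule off.
        return Boolean.parseBoolean(config.getOrDefault(key, "true"));
    }
}
```

With this shape, an old configuration (no keys at all) yields a policy that obeys both rules, while a configuration that sets only "obeyRelNofollow" to "false" stops honoring rel="nofollow" and still honors meta robots tags.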


> Add option for Web connector to ignore robots instructions in meta tags and rel attributes
> ------------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1392
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1392
>             Project: ManifoldCF
>          Issue Type: New Feature
>          Components: Web connector
>            Reporter: Markus Schuch
>
> The Web connector already allows robots.txt to be ignored via an option.
> With this ticket, another option is added to allow the connector to ignore robots instructions
> in {{<meta name="robots ...}} tags and {{<a ... rel="nofollow" ...}} attributes.
> *First proposal (to be discussed)*
> Reuse the existing "Robots.txt usage" option in the "Robots" tab. Rename the existing
> options:
> # Don't look at robots.txt, meta robots and rel attributes
> # Obey robots.txt, meta robots tags and rel attributes for data fetches only
> # Obey robots.txt, meta robots tags and rel attributes _(the default)_
> The end user doc needs to be updated.
> Google resources on robot instructions in HTML pages:
> [0] https://support.google.com/webmasters/answer/79812?hl=en&ctx=cb&src=cb&cbid=tnnsjq5jcodt&cbrank=4
> [1] https://support.google.com/webmasters/answer/96569?hl=en&ctx=cb&src=cb&cbid=-5rmggrfsp2rq&cbrank=3
> Thread on the mailing list
> [2] https://www.mail-archive.com/user@manifoldcf.apache.org/msg03258.html



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
