manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1392) Add option for Web connector to ignore robots instructions in meta tags
Date Tue, 28 Feb 2017 22:38:45 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15888994#comment-15888994 ]

Karl Wright commented on CONNECTORS-1392:
-----------------------------------------

I've had a look at this patch.  There's a minor formatting problem: the form is 4 columns
wide, but it looks like the patch makes part of the form 2 columns wide and the rest 4:

{code}
@@ -3300,8 +3330,12 @@
 "  <tr>\n"+
 "    <td class=\"description\" colspan=\"1\"><nobr>"+Messages.getBodyString(locale,"WebcrawlerConnector.EmailAddress")+"</nobr></td>\n"+
 "    <td class=\"value\" colspan=\"1\">"+Encoder.bodyEscape(email)+"</td>\n"+
+"  </tr>\n"+
+"  <tr>\n"+
 "    <td class=\"description\" colspan=\"1\"><nobr>"+Messages.getBodyString(locale,"WebcrawlerConnector.RobotsUsage")+"</nobr></td>\n"+
 "    <td class=\"value\" colspan=\"1\"><nobr>"+Encoder.bodyEscape(robots)+"</nobr></td>\n"+
+"    <td class=\"description\" colspan=\"1\"><nobr>"+Messages.getBodyString(locale,"WebcrawlerConnector.MetaRobotsTagsUsage")+"</nobr></td>\n"+
+"    <td class=\"value\" colspan=\"1\">"+Encoder.bodyEscape(metaRobotsTagsUsage)+"</td>\n"+
 "  </tr>\n"+
 "  <tr>\n"+
 "    <td class=\"description\"><nobr>" + Messages.getBodyString(locale,"WebcrawlerConnector.ProxyHostColon") + "</nobr></td>\n"+
{code}
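
The width mismatch can be made visible by summing the colspan values in each row. The following is only a sketch: {{RowWidthSketch}}, {{columnsPerRow}}, {{patchedRows}} and {{adjustedRows}} are hypothetical names that are not part of ManifoldCF, the cell contents are placeholders, and widening the e-mail value cell to colspan 3 is just one possible way to bring every row back to 4 columns, not necessarily the fix the patch should adopt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RowWidthSketch {
  // Hypothetical helper (not ManifoldCF code): sums the colspan values of the
  // <td> cells inside each <tr>, so rows of unequal width are easy to spot.
  public static List<Integer> columnsPerRow(String html) {
    List<Integer> widths = new ArrayList<>();
    Matcher row = Pattern.compile("<tr>(.*?)</tr>", Pattern.DOTALL).matcher(html);
    Pattern cell = Pattern.compile("<td\\b([^>]*)>");
    Pattern span = Pattern.compile("colspan=\"(\\d+)\"");
    while (row.find()) {
      int width = 0;
      Matcher c = cell.matcher(row.group(1));
      while (c.find()) {
        Matcher s = span.matcher(c.group(1));
        // A cell without an explicit colspan counts as one column.
        width += s.find() ? Integer.parseInt(s.group(1)) : 1;
      }
      widths.add(width);
    }
    return widths;
  }

  // Row layout as the patch produces it: a 2-column row followed by a 4-column row.
  public static String patchedRows() {
    return
      "  <tr>\n"+
      "    <td class=\"description\" colspan=\"1\">email label</td>\n"+
      "    <td class=\"value\" colspan=\"1\">email value</td>\n"+
      "  </tr>\n"+
      "  <tr>\n"+
      "    <td class=\"description\" colspan=\"1\">robots label</td>\n"+
      "    <td class=\"value\" colspan=\"1\">robots value</td>\n"+
      "    <td class=\"description\" colspan=\"1\">meta robots label</td>\n"+
      "    <td class=\"value\" colspan=\"1\">meta robots value</td>\n"+
      "  </tr>\n";
  }

  // Assumed adjustment: the e-mail value cell spans 3 columns, so both rows are 4 wide.
  public static String adjustedRows() {
    return patchedRows().replaceFirst(
      "<td class=\"value\" colspan=\"1\">email value",
      "<td class=\"value\" colspan=\"3\">email value");
  }

  public static void main(String[] args) {
    System.out.println(columnsPerRow(patchedRows()));   // prints [2, 4]
    System.out.println(columnsPerRow(adjustedRows()));  // prints [4, 4]
  }
}
```

Moving the e-mail cells back into a 4-cell row would work equally well; the only point is that every row should add up to the same number of columns.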




> Add option for Web connector to ignore robots instructions in meta tags
> -----------------------------------------------------------------------
>
>                 Key: CONNECTORS-1392
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1392
>             Project: ManifoldCF
>          Issue Type: New Feature
>          Components: Web connector
>            Reporter: Markus Schuch
>         Attachments: CONNECTORS-1392.patch
>
>
> The Web connector already has an option to ignore robots.txt.
> With this ticket, another option is added that lets the connector ignore robots instructions in {{<meta name="robots ...}} tags.
> *Proposal (to be discussed)*
> Add a new option list "Page level robots instructions" to the "Robots" Tab. List entries:
> # Obey meta robots tags (the default)
> # Don't look at meta robots tags
> The end user doc needs to be updated.
> Google resources on robot instructions in HTML pages:
> [0] https://support.google.com/webmasters/answer/79812?hl=en&ctx=cb&src=cb&cbid=tnnsjq5jcodt&cbrank=4
> [1] https://support.google.com/webmasters/answer/96569?hl=en&ctx=cb&src=cb&cbid=-5rmggrfsp2rq&cbrank=3
> [2] https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag?csw=1
> Thread on the mailing list
> [3] https://www.mail-archive.com/user@manifoldcf.apache.org/msg03258.html
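
The behaviour of the two proposed list entries can be illustrated with a small sketch. It is not the connector's implementation: {{MetaRobotsSketch}}, {{MetaRobotsUsage}}, {{mayIndex}} and {{mayFollow}} are hypothetical names, and the sketch assumes the content attribute of the meta robots tag has already been extracted from the page. It handles only the {{noindex}}, {{nofollow}} and {{none}} directives.

```java
import java.util.Locale;

public class MetaRobotsSketch {
  // Hypothetical option values mirroring the two proposed list entries.
  public enum MetaRobotsUsage { OBEY, IGNORE }

  // True if the page may be indexed under the chosen option.
  public static boolean mayIndex(String metaRobotsContent, MetaRobotsUsage usage) {
    return !hasDirective(metaRobotsContent, usage, "noindex");
  }

  // True if links on the page may be followed under the chosen option.
  public static boolean mayFollow(String metaRobotsContent, MetaRobotsUsage usage) {
    return !hasDirective(metaRobotsContent, usage, "nofollow");
  }

  // Checks a comma-separated directive list; "none" stands for "noindex, nofollow".
  private static boolean hasDirective(String content, MetaRobotsUsage usage, String wanted) {
    if (usage == MetaRobotsUsage.IGNORE || content == null) {
      return false; // tags are not consulted at all, or the page has none
    }
    for (String directive : content.toLowerCase(Locale.ROOT).split(",")) {
      String d = directive.trim();
      if (d.equals(wanted) || d.equals("none")) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    String content = "NOINDEX, follow"; // example content attribute value
    System.out.println(mayIndex(content, MetaRobotsUsage.OBEY));   // prints false
    System.out.println(mayIndex(content, MetaRobotsUsage.IGNORE)); // prints true
    System.out.println(mayFollow(content, MetaRobotsUsage.OBEY));  // prints true
  }
}
```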



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
