Date: Wed, 22 Oct 2014 13:24:33 +0000 (UTC)
From: "Andrey Kutuzov (JIRA)"
To: dev@any23.apache.org
Reply-To: dev@any23.apache.org
Mailing-List: contact dev-help@any23.apache.org; run by ezmlm
Subject: [jira] [Updated] (ANY23-240) Option to process html tags as spaces in Microdata

     [ https://issues.apache.org/jira/browse/ANY23-240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrey Kutuzov updated ANY23-240:
---------------------------------
    Description:
When extracting Microdata from html pages, any23 silently drops all html tags inside predicates' values. See, for example, http://schema.org/Recipe/ingredients at http://kuking.net/3_2070.htm.
The problem is that on this page (and many others) ingredients are separated from each other only with '<br>' tag. After any23 drops it, the content becomes mixed and unintelligible. At the same time, Google Structured Data Testing Tool separates them properly with spaces.
Is it possible to implement this behavior (replacing <br> tags with spaces) in any23 as an option?

  was:
When extracting Microdata from html pages, any23 silently drops all html tags inside predicates' values. See, for example, http://schema.org/Recipe/ingredients at http://kuking.net/3_2070.htm.
The problem is that on this page (and many others) ingredients are separated from each other only with '<br>' tag. After any23 drops it, the content becomes mixed and unintelligible. At the same time, Google Structured Data Testing Tool separates them properly with spaces.
Is it possible to implement this behavior (replacing <br> tags with spaces) in any23 as option?

> Option to process html tags as spaces in Microdata
> --------------------------------------------------
>
>                 Key: ANY23-240
>                 URL: https://issues.apache.org/jira/browse/ANY23-240
>             Project: Apache Any23
>          Issue Type: Improvement
>          Components: extractors, microdata
>            Reporter: Andrey Kutuzov
>
> When extracting Microdata from html pages, any23 silently drops all html tags inside predicates' values. See, for example, http://schema.org/Recipe/ingredients at http://kuking.net/3_2070.htm.
> The problem is that on this page (and many others) ingredients are separated from each other only with '<br>' tag. After any23 drops it, the content becomes mixed and unintelligible. At the same time, Google Structured Data Testing Tool separates them properly with spaces.
> Is it possible to implement this behavior (replacing <br> tags with spaces) in any23 as an option?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)