any23-dev mailing list archives

From "Andrey Kutuzov (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ANY23-240) Option to process <br>s as spaces in Microdata
Date Wed, 22 Oct 2014 13:23:34 GMT
Andrey Kutuzov created ANY23-240:
------------------------------------

             Summary: Option to process <br>s as spaces in Microdata
                 Key: ANY23-240
                 URL: https://issues.apache.org/jira/browse/ANY23-240
             Project: Apache Any23
          Issue Type: Improvement
          Components: extractors, microdata
            Reporter: Andrey Kutuzov


When extracting Microdata from HTML pages, Any23 silently drops all HTML tags inside predicate
values. See, for example, http://schema.org/Recipe/ingredients at http://kuking.net/3_2070.htm.
On this page (and many others), the ingredients are separated from each other only by '<br>'
tags. Once Any23 drops those tags, the ingredients run together and the value becomes
unintelligible. By contrast, the Google Structured Data Testing Tool separates them properly
with spaces. Would it be possible to implement this behavior (replacing <br> tags with spaces)
in Any23 as an option?
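
To illustrate the requested behavior, here is a minimal, language-agnostic sketch written in Python with only the standard library. It is not Any23 code and does not reflect how the Any23 Microdata extractor is implemented; the names `PropertyTextExtractor`, `property_text`, and the `br_as_space` flag are hypothetical, and the sample ingredient string is made up for the example.

```python
from html.parser import HTMLParser


class PropertyTextExtractor(HTMLParser):
    """Collects the text of an HTML fragment, optionally turning <br> into a space.

    Hypothetical helper for illustration only; not part of Any23.
    """

    def __init__(self, br_as_space=True):
        super().__init__(convert_charrefs=True)
        self.br_as_space = br_as_space
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # Called for <br> and, via the default handle_startendtag, for <br/>.
        if tag == "br" and self.br_as_space:
            self._chunks.append(" ")

    def handle_data(self, data):
        # Text nodes are kept; every tag other than <br> is simply dropped.
        self._chunks.append(data)

    def text(self):
        # Collapse whitespace runs so consecutive <br> tags yield one space.
        return " ".join("".join(self._chunks).split())


def property_text(html_fragment, br_as_space=True):
    """Return the text value of a property, with <br> handled per the flag."""
    parser = PropertyTextExtractor(br_as_space=br_as_space)
    parser.feed(html_fragment)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()


if __name__ == "__main__":
    fragment = "2 eggs<br>1 cup flour<br/>a pinch of salt"
    print(property_text(fragment, br_as_space=False))
    # 2 eggs1 cup floura pinch of salt
    print(property_text(fragment, br_as_space=True))
    # 2 eggs 1 cup flour a pinch of salt
```

With the flag off, the output reproduces the run-together value described above; with it on, each ingredient is separated by a single space. Keeping the behavior behind a flag leaves the existing output unchanged for anyone who depends on it.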



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
