any23-user mailing list archives

From Stéphane Corlosquet <scorlosq...@gmail.com>
Subject Re: opengraph not being extracted
Date Sun, 03 Aug 2014 02:18:31 GMT
On Sat, Aug 2, 2014 at 10:17 PM, Stéphane Corlosquet <scorlosquet@gmail.com>
wrote:

>
>
>
> On Thu, Jul 24, 2014 at 9:19 PM, Lewis John Mcgibbney <
> lewis.mcgibbney@gmail.com> wrote:
>
>> Hi Hadar,
>>
>> On Thu, Jul 24, 2014 at 3:27 AM, <user-digest-help@any23.apache.org>
>> wrote:
>>
>>>
>>> I'm trying to use Any23 1.0 to extract OpenGraph data.
>>> I'm simply creating the Any23 class and running extract.
>>> It works fine on schema.org but it doesn't extract og tags.
>>> Does anything special need to be done?
>>>
>>>
>> OK, I found the issue here. Basically, Any23 does recognize the og: markup
>> within the <meta> tags, as follows:
>>
>> <meta property="fb:app_id" content="192959324047861" />
>> <meta property="og:title" content="Led Zeppelin" />
>> <meta property="og:url" content="http://www.last.fm/music/Led+Zeppelin" />
>> <meta property="og:image" content="http://userserve-ak.last.fm/serve/126/378064.jpg" />
>>
>> However, there is an issue with the way that last.fm actually publishes
>> its data on the web.
>> For example, when I run my Any23 master branch code over the webpage, the
>> validation report notifies me of the following:
>>
>> <validationReport>
>>   <errors>
>>   </errors>
>>   <ruleActivations>
>>     <ruleActivation>
>>       <ruleStr>missing-opengraph-namespace-rule</ruleStr>
>>     </ruleActivation>
>>   </ruleActivations>
>>   <issues>
>>     <issue>
>>       <origin>[HTML: null]</origin>
>>       <message>Missing OpenGraph namespace declaration.</message>
>>     </issue>
>>   </issues>
>> </validationReport>
>>
>> Basically, there is no namespace declared to accompany the og:
>> markup...
>>
>> The question for Any23 is whether or not we should acknowledge the
>> absence of the namespace declaration and provide one anyway in an effort to
>> continue with extraction.
>>
>> Do you think this would be valuable? If it is, then I can write the
>> implementation and post a patch for you to try out.
>>
>
> No, I think this would be a bad idea because RDFa already provides such
> functionality. The RDFa Core Initial Context
> <http://www.w3.org/2011/rdfa-context/rdfa-1.1> includes og and therefore
> all parsers should recognize it. That means the prefix declaration for og
> can be omitted (that's why semargl and other RDFa parsers have no problem
> extracting data from that page). The problem doesn't come from the RDFa
> parser, but from the HTML parser. I want to make sure you've seen my
> comment in Jira which includes more info:
>
Here is the link:
https://issues.apache.org/jira/browse/ANY23-227?focusedCommentId=14083838&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14083838
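
To illustrate the point about the initial context, here is a minimal Python
sketch of prefix resolution. It is not Any23 or semargl code: the function
name expand_curie and the three-entry table are assumptions made for
illustration, and the table is only a small subset of the prefixes listed in
the RDFa Core Initial Context.

```python
# Illustrative subset of the RDFa 1.1 Core Initial Context (assumption:
# a real parser would load the full list from the W3C document).
INITIAL_CONTEXT = {
    "og": "http://ogp.me/ns#",
    "schema": "http://schema.org/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand_curie(curie, declared=None):
    """Expand 'prefix:reference' to a full IRI (hypothetical helper).

    Prefixes declared on the page override the initial context.
    Returns None when the prefix is unknown in both places.
    """
    prefixes = dict(INITIAL_CONTEXT)
    prefixes.update(declared or {})
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        return None
    return prefixes[prefix] + reference


# No prefix declaration on the page, yet og: still resolves.
print(expand_curie("og:title"))
# fb is not in this table, so it resolves only if the page declares it.
print(expand_curie("fb:app_id"))
```

With this kind of lookup, og:title resolves even though the page declares no
og prefix, while fb:app_id stays unresolved unless the page declares fb
itself.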





>
>
>> Thanks
>> Lewis
>>
>
>
>
> --
> Steph.
>



-- 
Steph.
