any23-user mailing list archives

From Hadar Rottenberg <had...@gmail.com>
Subject Re: opengraph not being extracted
Date Sun, 27 Jul 2014 07:20:36 GMT
I think there's already a rule and a fix for adding the OpenGraph namespace...


On Fri, Jul 25, 2014 at 4:19 AM, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:

> Hi Hadar,
>
> On Thu, Jul 24, 2014 at 3:27 AM, <user-digest-help@any23.apache.org>
> wrote:
>
>>
>> I'm trying to use any23 1.0 to extract OpenGraph data.
>> I'm simply creating the Any23 class and running extract.
>> It works fine on schema.org but it doesn't extract og tags.
>> Does anything special need to be done?
>>
>
> OK, I found the issue here. Basically, Any23 does recognize the og: markup
> within the <meta> tags, as follows:
>
> <meta property="fb:app_id" content="192959324047861" />
> <meta property="og:title" content="Led Zeppelin" />
> <meta property="og:url" content="http://www.last.fm/music/Led+Zeppelin" />
> <meta property="og:image" content="http://userserve-ak.last.fm/serve/126/378064.jpg" />
>
> However, there is an issue with the way that last.fm actually publishes
> its data on the web.
> For example, when I run my Any23 master branch code over the webpage, the
> validation report notifies me of the following:
>
> <validationReport>
>   <errors></errors>
>   <ruleActivations>
>     <ruleActivation><ruleStr>missing-opengraph-namespace-rule</ruleStr></ruleActivation>
>   </ruleActivations>
>   <issues>
>     <issue>
>       <origin>[HTML: null]</origin>
>       <message>Missing OpenGraph namespace declaration.</message>
>     </issue>
>   </issues>
> </validationReport>
>
> Basically, there is no namespace declared to accompany the og:
> markup...
>
> The question for Any23 is whether or not we should acknowledge the absence
> of the namespace declaration and provide one anyway in an effort to
> continue with extraction.
>
> Do you think this would be valuable? If it is, then I can write the
> implementation and post a patch for you to try out.
> Thanks
> Lewis
>
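
For anyone who wants to check a page for this condition before running it through Any23, here is a rough sketch in plain Python (standard library only). It does not use or reproduce Any23's own rule; the class name, function name, and the choice to accept either an xmlns:og attribute or an RDFa prefix attribute on <html> are my assumptions for illustration.

```python
from html.parser import HTMLParser

# Namespace commonly used for Open Graph; listed here only for reference.
OG_NAMESPACE = "http://ogp.me/ns#"


class OpenGraphNamespaceCheck(HTMLParser):
    """Hypothetical helper: records og: usage and og namespace declarations."""

    def __init__(self):
        super().__init__()
        self.uses_og = False      # a <meta property="og:..."> was seen
        self.declares_og = False  # <html> declares an og prefix or namespace

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html":
            xmlns = attrs.get("xmlns:og") or ""
            prefix_tokens = (attrs.get("prefix") or "").split()
            if xmlns or "og:" in prefix_tokens:
                self.declares_og = True
        elif tag == "meta":
            if (attrs.get("property") or "").startswith("og:"):
                self.uses_og = True


def missing_og_namespace(html):
    """Return True when og: properties appear without a declaration."""
    checker = OpenGraphNamespaceCheck()
    checker.feed(html)
    return checker.uses_og and not checker.declares_og
```

Calling missing_og_namespace() on markup like the last.fm snippet quoted above, wrapped in a bare <html> element, should return True; adding xmlns:og="http://ogp.me/ns#" to the <html> element should make it return False.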



-- 
Hadar Rottenberg
050-7319093
