any23-user mailing list archives

From Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Subject Re: opengraph not being extracted
Date Fri, 25 Jul 2014 01:19:35 GMT
Hi Hadar,

On Thu, Jul 24, 2014 at 3:27 AM, <user-digest-help@any23.apache.org> wrote:

>
> I'm trying to use Any23 1.0 to extract OpenGraph data.
> I'm simply creating the Any23 class and running extract.
> It works fine on schema.org but it doesn't extract og tags.
> Is there anything special that needs to be done?
>
>
OK, I found the issue here. Basically, Any23 does recognize the og: markup
within the <meta> tags, as follows:

<meta property="fb:app_id" content="192959324047861" />
<meta property="og:title" content="Led Zeppelin" />
<meta property="og:url" content="http://www.last.fm/music/Led+Zeppelin" />
<meta property="og:image" content="http://userserve-ak.last.fm/serve/126/378064.jpg" />

However, there is an issue with the way that last.fm actually publishes its
data on the web.
For example, when I run my Any23 master branch code over the webpage, the
validation report notifies me of the following:

<validationReport>
  <errors>
  </errors>
  <ruleActivations>
    <ruleActivation><ruleStr>missing-opengraph-namespace-rule</ruleStr></ruleActivation>
  </ruleActivations>
  <issues>
    <issue>
      <origin>[HTML: null]</origin>
      <message>Missing OpenGraph namespace declaration.</message>
    </issue>
  </issues>
</validationReport>

Basically, there is no namespace declared to accompany the og: markup.
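
To make the condition concrete, here is a minimal sketch of the kind of check
that the report describes: og: properties are present in <meta> tags, but no og
prefix is declared on the <html> or <head> element. It is written in Python
with the standard library only and is not Any23's actual rule implementation.
The names OpenGraphCheck and missing_namespace are hypothetical, and the
assumption is that a declaration appears either as an RDFa-style prefix
attribute or as an xmlns:og attribute.

```python
from html.parser import HTMLParser


class OpenGraphCheck(HTMLParser):
    """Records og: meta properties and whether an og prefix is declared."""

    def __init__(self):
        super().__init__()
        self.og_properties = {}
        self.namespace_declared = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("html", "head"):
            # Assumed declaration styles:
            #   prefix="og: http://ogp.me/ns#"   (RDFa 1.1 style)
            #   xmlns:og="..."                   (older XML namespace style)
            prefix = attrs.get("prefix") or ""
            if "og:" in prefix.split() or "xmlns:og" in attrs:
                self.namespace_declared = True
        elif tag == "meta":
            prop = attrs.get("property") or ""
            if prop.startswith("og:"):
                self.og_properties[prop] = attrs.get("content")


def missing_namespace(html):
    """True when og: properties are used without any og prefix declaration."""
    check = OpenGraphCheck()
    check.feed(html)
    return bool(check.og_properties) and not check.namespace_declared
```

A page shaped like the last.fm markup above (og: meta tags, bare <html>
element) would make missing_namespace return True.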

The question for Any23 is whether or not we should acknowledge the absence
of the namespace declaration and provide one anyway in an effort to
continue with extraction.
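
As a rough illustration of what "providing one anyway" could look like, the
sketch below adds a default xmlns:og declaration to the <html> element when
og: meta tags are present and no og prefix is declared. This is only a
Python sketch under stated assumptions, not the proposed Any23 patch: the
function name ensure_og_namespace is hypothetical, the default namespace is
assumed to be http://ogp.me/ns# (the one published by the Open Graph
protocol), and the regular expressions only handle a straightforward <html>
start tag.

```python
import re

# Assumed default: the namespace published by the Open Graph protocol.
OG_NAMESPACE = "http://ogp.me/ns#"

HTML_TAG = re.compile(r"<html\b([^>]*)>", re.IGNORECASE)
OG_META = re.compile(r"<meta\b[^>]*\bproperty\s*=\s*[\"']og:", re.IGNORECASE)
OG_DECLARED = re.compile(
    r"\bxmlns:og\s*=|\bprefix\s*=\s*[\"'][^\"']*\bog:", re.IGNORECASE
)


def ensure_og_namespace(html):
    """Return html with a default og namespace added if it is missing."""
    match = HTML_TAG.search(html)
    # Nothing to do without an <html> tag or without any og: meta tags.
    if match is None or not OG_META.search(html):
        return html
    # Leave the document alone if the <html> tag already declares og.
    if OG_DECLARED.search(match.group(0)):
        return html
    # Keep the existing attributes and append the default declaration.
    patched = '<html%s xmlns:og="%s">' % (match.group(1).rstrip(), OG_NAMESPACE)
    return html[:match.start()] + patched + html[match.end():]
```

Running the patched markup through the function a second time leaves it
unchanged, so the fallback could be applied before extraction without
affecting pages that already declare the namespace on the <html> element.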

Do you think this would be valuable? If it is, then I can write the
implementation and post a patch for you to try out.
Thanks
Lewis
