any23-user mailing list archives

From Hadar Rottenberg <had...@gmail.com>
Subject Re: opengraph not being extracted
Date Sun, 27 Jul 2014 10:42:04 GMT
I tried to find out where the OGP vocab is referenced, but I couldn't find
anything except in a test class.
How would a new vocab be added to Any23?
Hadar
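
For reference, the fallback discussed in the quoted thread below can be sketched roughly as follows. This is not Any23 code and does not use the Any23 API; it is a minimal, standalone Python illustration (standard library only) of the idea: collect the og: <meta> properties, report an issue when no og namespace is declared on <html>, and optionally assume the namespace anyway so extraction can continue. The names OpenGraphScanner, extract_opengraph and assume_namespace are hypothetical, and the namespace URI http://ogp.me/ns# is taken from the Open Graph protocol documentation, not from this thread.

```python
from html.parser import HTMLParser

# Assumption: namespace URI as published by the Open Graph protocol.
OG_NAMESPACE = "http://ogp.me/ns#"


class OpenGraphScanner(HTMLParser):
    """Collects og: <meta> properties and notes whether an og prefix is declared."""

    def __init__(self):
        super().__init__()
        self.namespace_declared = False
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "html":
            # Accept either an xmlns:og attribute or an RDFa-style prefix attribute.
            prefix = attributes.get("prefix") or ""
            if "xmlns:og" in attributes or "og:" in prefix.split():
                self.namespace_declared = True
        elif tag == "meta":
            prop = attributes.get("property") or ""
            if prop.startswith("og:"):
                self.properties[prop] = attributes.get("content")


def extract_opengraph(html, assume_namespace=True):
    """Return (properties keyed by full URI, list of issue messages)."""
    scanner = OpenGraphScanner()
    scanner.feed(html)
    scanner.close()
    issues = []
    if scanner.properties and not scanner.namespace_declared:
        issues.append("Missing OpenGraph namespace declaration.")
        if not assume_namespace:
            # Strict behaviour: report the issue and extract nothing.
            return {}, issues
    expanded = {
        OG_NAMESPACE + name[len("og:"):]: value
        for name, value in scanner.properties.items()
    }
    return expanded, issues


# Markup modelled on the last.fm snippet quoted in this thread (no namespace declared).
SAMPLE = """<html><head>
<meta property="fb:app_id" content="192959324047861" />
<meta property="og:title" content="Led Zeppelin" />
<meta property="og:url" content="http://www.last.fm/music/Led+Zeppelin" />
</head></html>"""

properties, issues = extract_opengraph(SAMPLE)
for uri, value in properties.items():
    print(uri, "=", value)
print(issues)
```

With assume_namespace=False the sketch reports the issue and returns no properties, which corresponds to the behaviour I am seeing; with the default it still reports the issue but keeps the og: values.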


On Sun, Jul 27, 2014 at 10:20 AM, Hadar Rottenberg <hadarr@gmail.com> wrote:

> I think there's already a rule and a fix for adding opengraph namespace...
>
>
> On Fri, Jul 25, 2014 at 4:19 AM, Lewis John Mcgibbney <
> lewis.mcgibbney@gmail.com> wrote:
>
>> Hi Hadar,
>>
>> On Thu, Jul 24, 2014 at 3:27 AM, <user-digest-help@any23.apache.org>
>> wrote:
>>
>>>
>>> I'm trying to use Any23 1.0 to extract OpenGraph data.
>>> I'm simply creating the Any23 class and running extract.
>>> It works fine on schema.org, but it doesn't extract og tags.
>>> Does anything special need to be done?
>>>
>>>
>> OK, I found the issue here. Basically, Any23 does recognize the og: markup
>> within the <meta> tags, as follows:
>>
>> <meta property="fb:app_id" content="192959324047861" />
>> <meta property="og:title" content="Led Zeppelin" />
>> <meta property="og:url" content="http://www.last.fm/music/Led+Zeppelin" />
>> <meta property="og:image" content="http://userserve-ak.last.fm/serve/126/378064.jpg" />
>>
>> However, there is an issue with the way that last.fm actually publishes
>> its data on the web.
>> For example, when I run my Any23 master branch code over the webpage, the
>> validation report gives me the following:
>>
>> <validationReport><errors></errors>
>> <ruleActivations><ruleActivation><ruleStr>missing-opengraph-namespace-rule</ruleStr></ruleActivation></ruleActivations>
>> <issues><issue><origin>[HTML: null]</origin><message>Missing OpenGraph namespace declaration.</message></issue></issues>
>> </validationReport>
>>
>> Basically, there is no namespace declared to accompany the og:
>> markup...
>>
>> The question for Any23 is whether or not we should acknowledge the
>> absence of the namespace declaration and provide one anyway in an effort to
>> continue with extraction.
>>
>> Do you think this would be valuable? If so, I can write the
>> implementation and post a patch for you to try out.
>> Thanks
>> Lewis
>>
>
>
>
> --
> Hadar Rottenberg
> 050-7319093
>



-- 
Hadar Rottenberg
050-7319093
