any23-dev mailing list archives

From "Kunal P (JIRA)" <j...@apache.org>
Subject [jira] Kunal P shared "ANY23-154: Not able to extract microdata in few test cases" with you
Date Mon, 25 Mar 2013 14:29:15 GMT
    Kunal P just shared ANY23-154 with you
    -------------------------------

    Can anyone please help with this issue?

> Not able to extract microdata in few test cases
> -----------------------------------------------
>
>                 Key: ANY23-154
>                 URL: https://issues.apache.org/jira/browse/ANY23-154
>             Project: Apache Any23
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.7.0
>         Environment: Windows 7 32-bit
> JDK 1.6.0_38
> Intel Core 2 Duo, 4 GB RAM
>            Reporter: Kunal P
>             Fix For: 0.9.0
>
>
> We are using the Apache Any23 API to extract microdata from web pages as part of an internal project.
> In some test cases the API is not able to parse the microdata, for example www.neeraj.nowfloats.com (the page does not strictly follow the schema.org standards).
> Here is a snippet of the HTML code:
> <div id="someid" itemprop="offer" itemscope itemtype="http://schema.org/Offer">
>   <div ... ></div>
> </div>
> Because the tag carries both itemscope and itemprop, the item is marked up as a property of some parent item. However, the <div id="someid"> tag has no parent item, i.e. no enclosing element with itemscope.
> We extract the item scopes as follows:
> import org.apache.any23.extractor.microdata.ItemScope;
> import org.apache.any23.extractor.microdata.MicrodataParser;
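> import org.apache.any23.extractor.microdata.MicrodataParserReport;
> import org.w3c.dom.Document;
> // getDomDocument is our own helper that parses the page's HTML string into a W3C DOM
> Document dom = getDomDocument(html);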
> MicrodataParserReport report = MicrodataParser.getMicrodata(dom);
> ItemScope[] items = report.getDetectedItemScopes();
> Here, items does not contain any ItemScope for the markup above.
> In such a scenario, how can we extract the microdata from the page using the Any23 API?
> Is there any way to relax the rule about itemprop and itemscope appearing on the same tag, so that we still get the data from the web page?
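One possible workaround is sketched below. It is not taken from the Any23 documentation. The WHATWG microdata specification treats an item as top-level only if its element has no itemprop attribute. An element like <div id="someid">, which has itemprop but no enclosing itemscope, therefore belongs to no item at all. That would explain why getDetectedItemScopes() leaves it out, assuming Any23 follows the spec on this point. The class below is hypothetical and not part of Any23. It uses XPath to find such orphaned items and removes their itemprop attribute, after which the same DOM could be passed to MicrodataParser.getMicrodata(dom) as before. The sketch has three limitations. It ignores itemref. The promoted item loses its property name ("offer"). The parseXhtml helper exists only to keep the demo self-contained: it needs well-formed XML, whereas a real page needs an HTML parser such as the one behind getDomDocument.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical pre-processing step run before Any23's MicrodataParser; not part of Any23. */
public class OrphanItemPromoter {

    // Elements with both itemscope and itemprop and no enclosing itemscope (itemref is not considered).
    static final String ORPHAN_ITEMS =
            "//*[@itemscope and @itemprop and not(ancestor::*[@itemscope])]";

    /** Removes itemprop from orphaned items so they look like top-level items; returns those elements. */
    public static List<Element> promoteOrphanItems(Document doc) {
        NodeList hits;
        try {
            hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(ORPHAN_ITEMS, doc, XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
        // Copy the matches first so the DOM is not modified while iterating the XPath result.
        List<Element> promoted = new ArrayList<Element>();
        for (int i = 0; i < hits.getLength(); i++) {
            promoted.add((Element) hits.item(i));
        }
        for (Element el : promoted) {
            el.removeAttribute("itemprop");
        }
        return promoted;
    }

    /** Demo-only parser: accepts well-formed XHTML, not arbitrary HTML. */
    public static Document parseXhtml(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // XHTML stand-in for the snippet from the issue (itemscope given an empty value).
        Document doc = parseXhtml("<html><body>"
                + "<div id=\"someid\" itemprop=\"offer\" itemscope=\"\""
                + " itemtype=\"http://schema.org/Offer\"><div/></div>"
                + "</body></html>");
        for (Element el : promoteOrphanItems(doc)) {
            System.out.println("promoted #" + el.getAttribute("id"));
        }
        // With the real page, doc would now be handed to MicrodataParser.getMicrodata(doc).
    }
}
```

Running main prints "promoted #someid". Items nested inside another itemscope keep their itemprop, so regular nested properties are still parsed as before.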

