any23-dev mailing list archives

From "Lewis John McGibbney (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ANY23-131) Nested Microdata are not extracted
Date Fri, 12 Feb 2016 02:43:18 GMT

    [ https://issues.apache.org/jira/browse/ANY23-131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15143922#comment-15143922 ]

Lewis John McGibbney commented on ANY23-131:
--------------------------------------------

bq. Any news on this?

Well, I logged ANY23-273 because the service at any23.org fails to extract the content of the
bogus comment element... Once I fixed that one (locally), I realized I had well and truly
opened a can of worms! I've just finished manually stepping through the webpage source and
dealing with the exceptions thrown by Any23. The markup on this webpage is nothing short of hellish!
Anyway, I've attached a pretty-printed JSON dump of the extracted structure once everything had been
cleaned up. You're right: Any23 is not extracting relationships from nested <span> elements.
We need to reopen this issue and address it for this new use case.
Sorry it took me so bloody long to get around to this.

> Nested Microdata are not extracted
> ----------------------------------
>
>                 Key: ANY23-131
>                 URL: https://issues.apache.org/jira/browse/ANY23-131
>             Project: Apache Any23
>          Issue Type: Bug
>          Components: microdata
>    Affects Versions: 0.7.0
>            Reporter: Sebastien Richard
>            Assignee: Lewis John McGibbney
>             Fix For: 1.2
>
>
> Proposed patch:
> core/src/main/java/org/apache/any23/extractor/microdata/MicrodataParser.java:
> remove the incorrect optimization at line 166:
> - return getUnnestedNodes( topLevelItemScopes );
> + return topLevelItemScopes;
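To illustrate why dropping that call matters, here is a minimal sketch. It assumes that getUnnestedNodes discards every itemscope node that sits inside another node of the same list; that behaviour is inferred from the method's name and has not been checked against the Any23 source. The class NestedMicrodataDemo, its methods, and the sample markup are all hypothetical. The markup is well-formed XHTML only because the JDK's XML parser cannot read arbitrary HTML. Under the HTML microdata model, any element with itemscope and no itemprop is a top-level item wherever it appears, so filtering out nested ones silently loses the inner Offer.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical illustration, not Any23 code. */
final class NestedMicrodataDemo {

    /** Top-level items: every element with an itemscope attribute and no itemprop attribute. */
    static List<Element> topLevelItemScopes(Document doc) {
        List<Element> result = new ArrayList<>();
        NodeList all = doc.getElementsByTagName("*"); // all elements, document order
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (e.hasAttribute("itemscope") && !e.hasAttribute("itemprop")) {
                result.add(e);
            }
        }
        return result;
    }

    /** Assumed behaviour of the removed optimization: drop nodes with an ancestor in the same list. */
    static List<Element> unnested(List<Element> nodes) {
        List<Element> result = new ArrayList<>();
        for (Element candidate : nodes) {
            boolean nested = false;
            for (Node p = candidate.getParentNode(); p != null; p = p.getParentNode()) {
                if (nodes.contains(p)) { // DOM nodes compare by identity here
                    nested = true;
                    break;
                }
            }
            if (!nested) {
                result.add(candidate);
            }
        }
        return result;
    }

    /** Parses a well-formed XHTML string; checked parser exceptions are wrapped. */
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        // A Product item containing a separate Offer item (the Offer has no itemprop).
        String page =
            "<div itemscope=\"itemscope\" itemtype=\"http://schema.org/Product\">"
          + "<span itemprop=\"name\">Widget</span>"
          + "<span itemscope=\"itemscope\" itemtype=\"http://schema.org/Offer\">"
          + "<span itemprop=\"price\">9.99</span></span></div>";
        List<Element> items = topLevelItemScopes(parse(page));
        System.out.println("top-level items: " + items.size());
        System.out.println("after unnesting: " + unnested(items).size());
    }
}
```

Running main prints "top-level items: 2" followed by "after unnesting: 1": the nested Offer is a valid top-level item but the unnesting step discards it, which matches the symptom reported in this issue.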



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
