incubator-any23-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ANY23-75) Improve runtime of the Microdata extractor on documents with many relations.
Date Sat, 02 Jun 2012 07:09:22 GMT

    [ https://issues.apache.org/jira/browse/ANY23-75?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287867#comment-13287867 ]

Hudson commented on ANY23-75:
-----------------------------

Integrated in Any23-trunk #220 (See [https://builds.apache.org/job/Any23-trunk/220/])
    Improved MicrodataParser performances. Related to issue #ANY23-75. (Revision 1345154)

     Result = SUCCESS
mostarda : 
Files : 
* /incubator/any23/trunk/core/src/main/java/org/apache/any23/extractor/microdata/MicrodataParser.java

                
> Improve runtime of the Microdata extractor on documents with many relations.
> ----------------------------------------------------------------------------
>
>                 Key: ANY23-75
>                 URL: https://issues.apache.org/jira/browse/ANY23-75
>             Project: Apache Any23
>          Issue Type: Improvement
>    Affects Versions: 0.7.0
>            Reporter: Timothy Potter
>            Assignee: Michele Mostarda
>             Fix For: 0.7.0
>
>         Attachments: MicrodataParser.diff
>
>
> I've been running Any23 on a big web crawler dump. I found that for certain documents with
> a lot of Microdata relations, the method MicrodataParser.getItemProps() becomes very slow.
> As a result, processing one document can take several minutes. An example of a problematic
> page can be seen here: http://dreamtime.fftunes.com/
> I'll attach a patch that greatly improves the performance of this method.
> I was wondering if someone could have a look at it and include it in the next release if
> possible.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
