incubator-any23-dev mailing list archives

From "Timothy Potter (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ANY23-75) Improve runtime of the Microdata extractor on documents with many relations.
Date Thu, 12 Apr 2012 16:35:18 GMT
Improve runtime of the Microdata extractor on documents with many relations.
----------------------------------------------------------------------------

                 Key: ANY23-75
                 URL: https://issues.apache.org/jira/browse/ANY23-75
             Project: Apache Any23
          Issue Type: Improvement
            Reporter: Timothy Potter
            Priority: Minor


I've been running Any23 on a large web crawler dump. I found that, for certain documents with
many Microdata relations, the method MicrodataParser.getItemProps() becomes very slow. As a result,
processing a single document can take several minutes. An example of a problematic page can be
seen here: http://dreamtime.fftunes.com/

I'll attach a patch that greatly improves the performance of this method.
Could someone have a look at it and include it in the next release if possible?


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
