From: "Michele Mostarda (JIRA)"
To: any23-dev@incubator.apache.org
Reply-To: any23-dev@incubator.apache.org
Date: Fri, 1 Jun 2012 12:26:22 +0000 (UTC)
Subject: [jira] [Assigned] (ANY23-75) Improve runtime of the Microdata extractor on documents with many relations.


     [ https://issues.apache.org/jira/browse/ANY23-75?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michele Mostarda reassigned ANY23-75:
-------------------------------------

    Assignee: Michele Mostarda

> Improve runtime of the Microdata extractor on documents with many relations.
> ----------------------------------------------------------------------------
>
>                 Key: ANY23-75
>                 URL: https://issues.apache.org/jira/browse/ANY23-75
>             Project: Apache Any23
>          Issue Type: Improvement
>    Affects Versions: 0.7.0
>            Reporter: Timothy Potter
>            Assignee: Michele Mostarda
>             Fix For: 0.7.0
>
>         Attachments: MicrodataParser.diff
>
>
> I've been running Any23 on a big web crawler dump. I found that for certain documents with many Microdata relations, the method MicrodataParser.getItemProps() becomes very slow. As a result, processing one document can take several minutes. An example of a problematic page can be seen here: http://dreamtime.fftunes.com/
> I'll attach a patch that greatly improves the performance of this method. I was wondering if someone could have a look at it and include it in the next release if possible.