Subject: Re: Very long Ruta stream initialization
From: Mario Gazzo
Date: Tue, 22 Dec 2015 17:26:08 +0100
To: user@uima.apache.org

I got around it by specifying an empty seeders list, which removes the default seeders, since we don’t need the MARKUP annotations anymore. I still don’t know why it created so much overhead, but it sometimes seemed to rival the POS tagger in processing time.

Anyway, this leads me to the next question. Can I disable the creation of Ruta basic annotations entirely to save processing overhead and only apply Ruta rules to annotation types created by other AEs, such as our own?

Cheers
Mario

> On 21 Dec 2015, at 16:09, Mario Juric wrote:
>
> Hi Peter,
>
> I noticed that the initialisation in RutaEngine::initializeStream can occasionally take a very long time. I can’t really explain it, and it seems independent of document length, since I have seen this even with very small XML documents.
>
> The method seems to spend much of its time in the DefaultSeeder when creating MARKUP annotations during subiterator.moveToNext calls (line 89), and inside Subiterator it seems to be the while loop inside adjustForStrictForward (line 232), which is in the UIMA core classes. I haven’t done any deeper analysis yet, but I would first like to hear whether you have an idea what the main cause(s) could be.
>
> We use Ruta 2.3.1 with UIMA 2.8.1.
>
> Cheers
> Mario
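
For reference, the empty seeders list mentioned at the top of this message could be expressed in the Ruta engine's XML descriptor roughly as follows. This is only a sketch under assumptions: it assumes the parameter is named `seeders` (as in the BasicEngine descriptor shipped with Ruta) and that an empty `<array/>` is read as an empty string array. Neither detail is stated in this thread, so check both against the descriptor of the version in use.

```xml
<!-- Fragment of an analysis engine descriptor, not a complete file. -->
<configurationParameterSettings>
  <nameValuePair>
    <!-- Assumed parameter name; the thread only says "seeders list". -->
    <name>seeders</name>
    <value>
      <!-- Empty list: no seeder runs, so no MARKUP annotations are seeded. -->
      <array/>
    </value>
  </nameValuePair>
</configurationParameterSettings>
```

With no seeder configured, Ruta rules can only match on annotations that other analysis engines have already added to the CAS, which is the setup described above.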