Message-ID: <4DF6C844.4020807@gmail.com>
Date: Mon, 13 Jun 2011 22:32:36 -0400
From: James Kosin
To: opennlp-dev@incubator.apache.org
Subject: Re: Custom feature generators

On 6/13/2011 10:23 PM, william.colen@gmail.com wrote:
> Hi,
>
> Currently, custom feature generators that can be passed from the command
> line are implemented only for the NameFinder, but it would be very nice to
> have them for all tools.
> The Thai sentence detector customization is nice and simple, but to do
> something similar for other languages the user would need to branch the
> code. We should allow users to pass a factory class name from the command
> line. Maybe we could do it for every tool that doesn't use a sequence
> feature generator. It would also be nice to save the factory class name in
> the model, to make sure we use the same feature generator at runtime and
> during evaluation.
>
> What do you think? Maybe you have thought of a better solution for that.
>
> Thanks
> William
>

William,

We discussed various options; unfortunately, most of them involved some
security risk for the Java engine, including saving the actual feature
generator constructor itself to the model.

The XML option may be a better route in the long run. We could even save a
copy of the XML document in the model itself. But again, that opens us up to
problems if someone deliberately writes bad XML to cause harm.

Alternatively, we could make the feature generator a generic class that
requires a constructor. Each language implementation could then supply its
own constructor that correctly builds the feature generator. Unfortunately,
such a change would break existing models.

We may need to reopen the issue when Jorn comes back, or at least get another
discussion going, so we can try to weed out the problems with the available
options.

James
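A minimal Java sketch of the "save the factory class name in the model" idea, combined with the security concern raised in the reply and the per-language subclass alternative. Every name here is hypothetical and assumed for illustration only (FeatureGeneratorFactory, DefaultFactory, ThaiFactory, the "factory" manifest key, storeFactory, loadFactory); none of them is an existing OpenNLP API. The trainer records the factory's class name in the model's manifest properties, and at runtime or evaluation the class is instantiated reflectively only if its name is on an explicit allow-list and it implements the expected interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

public class FeatureFactorySketch {

    // Hypothetical manifest key under which the factory class name is stored.
    public static final String FACTORY_KEY = "factory";

    // Hypothetical interface: produces the features for one candidate
    // position (e.g. a possible sentence end) in the input text.
    public interface FeatureGeneratorFactory {
        List<String> features(CharSequence text, int position);
    }

    // Hypothetical default factory: a single feature, the character itself.
    // The public no-arg constructor lets it be created reflectively by name.
    public static class DefaultFactory implements FeatureGeneratorFactory {
        public DefaultFactory() {}

        @Override
        public List<String> features(CharSequence text, int position) {
            return Collections.singletonList("c=" + text.charAt(position));
        }
    }

    // Hypothetical language-specific factory (e.g. for Thai) that extends the
    // default features instead of requiring a branch of the tool's code.
    public static class ThaiFactory extends DefaultFactory {
        public ThaiFactory() {}

        @Override
        public List<String> features(CharSequence text, int position) {
            List<String> feats = new ArrayList<>(super.features(text, position));
            boolean spaceAfter = position + 1 < text.length()
                    && text.charAt(position + 1) == ' ';
            feats.add("spaceAfter=" + spaceAfter);
            return feats;
        }
    }

    // Training side: record which factory produced the model's features.
    public static void storeFactory(Properties manifest, FeatureGeneratorFactory factory) {
        manifest.setProperty(FACTORY_KEY, factory.getClass().getName());
    }

    // Runtime/evaluation side: recreate the same factory from the manifest.
    public static FeatureGeneratorFactory loadFactory(Properties manifest, Set<String> allowed)
            throws ReflectiveOperationException {
        String name = manifest.getProperty(FACTORY_KEY, DefaultFactory.class.getName());
        // Check the allow-list before loading: Class.forName runs the class's
        // static initializers, so unknown names are rejected up front.
        if (!allowed.contains(name)) {
            throw new SecurityException("factory class not allowed: " + name);
        }
        Class<?> cls = Class.forName(name);
        // Reject classes that do not implement the expected interface.
        if (!FeatureGeneratorFactory.class.isAssignableFrom(cls)) {
            throw new IllegalArgumentException(name + " is not a FeatureGeneratorFactory");
        }
        return cls.asSubclass(FeatureGeneratorFactory.class)
                .getDeclaredConstructor()
                .newInstance();
    }

    public static void main(String[] args) throws Exception {
        Set<String> allowed = new HashSet<>(Arrays.asList(
                DefaultFactory.class.getName(), ThaiFactory.class.getName()));

        // Training: remember the factory in the (hypothetical) manifest.
        Properties manifest = new Properties();
        storeFactory(manifest, new ThaiFactory());

        // Runtime/evaluation: rebuild the same factory from the manifest.
        FeatureGeneratorFactory factory = loadFactory(manifest, allowed);
        System.out.println(factory.features("abc. def", 3)); // prints [c=., spaceAfter=true]
    }
}
```

The allow-list does not remove every risk James mentions (an allowed class could still be buggy), but it narrows what a tampered model can make the JVM load, while still letting each language ship its own subclass without branching the tool's code.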