Date: Thu, 8 May 2014 10:39:20 +0200
From: Rodrigo Agerri
To: dev@opennlp.apache.org
Subject: Re: TokenNameFinder and Span probs

+1 to the second solution too, and to using it everywhere a Span object is
returned.

Rodrigo

On 2014/05/07 at 09:22, Joern Kottmann wrote:
> Hello Mark,
>
> +1 for your second solution. I believe that is much more intuitive than
> calling a method afterwards to retrieve the prob for a Span.
> It is easier to use because the prob is delivered as part of the result and
> no user action is required to obtain it.
>
> We could use this solution everywhere a span gets returned.
>
> Jörn
>
> On Wed, May 7, 2014 at 2:18 AM, Mark G wrote:
>
> > I am currently working on a project in which we are using NER to pass
> > toponyms into the GeoEntityLinker addon for geotagging, and I am passing
> > the locations, entities, and other info into SOLR for indexing. Over the
> > years I have noticed that the TokenNameFinder interface does not include
> > all the probs() methods that NameFinderME has, and furthermore the Span
> > object does not have a double field for storing its own prob. Also, the
> > SentenceDetectorME has a method called getSentenceProbabilities rather
> > than probs().
> >
> > When I pass the Spans into the GeoEntityLinker/EntityLinker, I can't get
> > the probs anymore because they are not in the Span objects. I can always
> > extend Span and add the field, or keep a 2D array of the probs for each
> > sentence, but I wanted to see what everyone thinks about:
> > 1. adding the probs() methods to the TokenNameFinder interface;
> > 2. adding a prob field (a double) to Span;
> > 3. having the NameFinder return the prob with each Span, so it doesn't
> >    have to be set after the call to find() using the double[] of probs;
> > 4. having SentenceDetectorME return its spans with a prob, adding a
> >    probs() method to the SentenceDetector interface, and deprecating
> >    getSentenceProbabilities.
> >
> > Thoughts?
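A minimal Java sketch of options 2 and 3, assuming a hypothetical `ScoredSpan` class rather than OpenNLP's own `Span`: the finder attaches each probability to its span before returning, so a downstream consumer such as an entity linker can filter on the span alone instead of carrying a parallel `double[]`. `ScoredSpan`, `attachProbs`, and `filterByProb` are illustrative names, not OpenNLP API, and the example values are made up.

```java
import java.util.ArrayList;
import java.util.List;

public class SpanProbSketch {

    /** Hypothetical span type that carries its own probability (not OpenNLP's Span). */
    public static final class ScoredSpan {
        private final int start;   // inclusive token index
        private final int end;     // exclusive token index
        private final String type; // e.g. "location"
        private final double prob; // confidence reported by the finder

        public ScoredSpan(int start, int end, String type, double prob) {
            if (start < 0 || end < start) {
                throw new IllegalArgumentException("invalid span bounds: " + start + ", " + end);
            }
            if (prob < 0.0 || prob > 1.0) {
                throw new IllegalArgumentException("prob must be in [0, 1]: " + prob);
            }
            this.start = start;
            this.end = end;
            this.type = type;
            this.prob = prob;
        }

        public int getStart() { return start; }
        public int getEnd() { return end; }
        public String getType() { return type; }
        public double getProb() { return prob; }

        @Override
        public String toString() {
            return "[" + start + ".." + end + ") " + type + " p=" + prob;
        }
    }

    /**
     * Hypothetical helper: what a finder could do internally before returning,
     * zipping span bounds with the probability at the same index so callers
     * never have to keep a parallel double[] next to the spans.
     */
    public static List<ScoredSpan> attachProbs(int[] starts, int[] ends, String[] types, double[] probs) {
        int n = starts.length;
        if (ends.length != n || types.length != n || probs.length != n) {
            throw new IllegalArgumentException("parallel arrays must have equal length");
        }
        List<ScoredSpan> result = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            result.add(new ScoredSpan(starts[i], ends[i], types[i], probs[i]));
        }
        return result;
    }

    /** Hypothetical downstream consumer (e.g. an entity linker) filtering on the span alone. */
    public static List<ScoredSpan> filterByProb(List<ScoredSpan> spans, double minProb) {
        List<ScoredSpan> kept = new ArrayList<>();
        for (ScoredSpan s : spans) {
            if (s.getProb() >= minProb) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Tokens: "Mark visited Boston and Paris" (made-up example values)
        List<ScoredSpan> spans = attachProbs(
                new int[] {0, 2, 4},
                new int[] {1, 3, 5},
                new String[] {"person", "location", "location"},
                new double[] {0.93, 0.88, 0.41});
        for (ScoredSpan s : filterByProb(spans, 0.5)) {
            System.out.println(s);
        }
    }
}
```

Keeping the span immutable and validating its bounds and probability in the constructor means a span can never be handed downstream without its score, which is the practical advantage of option 3 over calling probs() after find().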