Message-ID: <5409E4B5.7070407@perfectsearchcorp.com>
Date: Fri, 05 Sep 2014 10:28:37 -0600
From: Kim Ebert
To: dev@ctakes.apache.org, Sean.Finan@childrens.harvard.edu
Subject: Re: Permutations
References: <5408E608.7040602@perfectsearchcorp.com>

Hi Pei and Sean,

Sean, any thoughts about this would be helpful.

We also had issues in cTAKES 2.5. Here is the patch for 2.5. Before I
got the patch to 3.0, Sean made his changes.

=== modified file 'src/edu/mayo/bmi/lookup/algorithms/FirstTokenPermutationImpl.java'
--- src/edu/mayo/bmi/lookup/algorithms/FirstTokenPermutationImpl.java	2012-11-28 01:56:50 +0000
+++ src/edu/mayo/bmi/lookup/algorithms/FirstTokenPermutationImpl.java	2013-02-06 16:39:37 +0000
@@ -294,14 +294,16 @@
             Iterator mdhIterator = mdhSet.iterator();
             while (mdhIterator.hasNext())
             {
-                MetaDataHit mdh = (MetaDataHit) mdhIterator.next();
+                MetaDataHit mdh = (MetaDataHit) mdhIterator.next();
+
+                List permutationSorted = (List) ((ArrayList)permutation).clone();
                 // figure out start and end offsets
-                Collections.sort(permutation);
+                Collections.sort(permutationSorted);
                 int startOffset;
-                if (permutation.size() > 0)
+                if (permutationSorted.size() > 0)
                 {
-                    int firstIdx = ((Integer) permutation.get(0)).intValue();
+                    int firstIdx = ((Integer) permutationSorted.get(0)).intValue();
                     if (firstIdx <= firstTokenIndex.intValue())
                     {
                         firstIdx--;
@@ -322,9 +324,9 @@
                 }
                 int endOffset;
-                if (permutation.size() > 0)
+                if (permutationSorted.size() > 0)
                 {
-                    int lastIdx = ((Integer) permutation.get(permutation.size() - 1)).intValue();
+                    int lastIdx = ((Integer) permutationSorted.get(permutationSorted.size() - 1)).intValue();
                     if (lastIdx <= firstTokenIndex.intValue())
                     {
                         lastIdx--;

Kim Ebert
1.801.669.7342
Perfect Search Corp
http://www.perfectsearchcorp.com/

On 09/05/2014 10:17 AM, Pei Chen wrote:
> Hi Kim,
> Thanks for pointing that out.
> https://issues.apache.org/jira/browse/CTAKES-310 has been opened for
> this.
> If you commit the changes, we can see if we can include in the 3.2.1
> patch release.
> I was looking at the changelist for this file, and it may look like
> some of these optimizations may have been intentional by Sean so he
> may have some more insight in this bit of the logic.
>
> On Thu, Sep 4, 2014 at 6:22 PM, Kim Ebert wrote:
>> Hi All,
>>
>> I was reviewing the use of permutations, and I noticed that we sorted
>> the permutation list before creating the string to do the concept lookup
>> with. It also appears that we were sorting the object that was stored in
>> the parent list.
>>
>> I've made a few changes, and now it appears I can discover some
>> additional concepts based upon the permutations.
>>
>> Let me know what you think of the following changes.
>>
>> Thanks,
>>
>> Kim
>>
>> === modified file 'ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/algorithms/FirstTokenPermutationImpl.java'
>> --- ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/algorithms/FirstTokenPermutationImpl.java	2014-07-31 22:00:48 +0000
>> +++ ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/algorithms/FirstTokenPermutationImpl.java	2014-09-04 18:39:59 +0000
>> @@ -210,11 +210,12 @@
>>              final List<List<Integer>> permutationList = iv_permCacheMap.get( permutationIndex );
>>              for ( List<Integer> permutations : permutationList ) {
>>                 // Moved sort and offset calculation from inner (per MetaDataHit) iteration 2-21-2013 spf
>> -               Collections.sort( permutations );
>> +               List<Integer> permutationsSorted = (List) ((ArrayList)permutations).clone();
>> +               Collections.sort( permutationsSorted );
>>                 int startOffset = firstWordStartOffset;
>>                 int endOffset = firstWordEndOffset;
>> -               if ( !permutations.isEmpty() ) {
>> -                  int firstIdx = permutations.get( 0 );
>> +               if ( !permutationsSorted.isEmpty() ) {
>> +                  int firstIdx = permutationsSorted.get( 0 );
>>                    if ( firstIdx <= firstTokenIndex ) {
>>                       firstIdx--;
>>                    }
>> @@ -222,7 +223,7 @@
>>                    if ( firstToken.getStartOffset() < firstWordStartOffset ) {
>>                       startOffset = firstToken.getStartOffset();
>>                    }
>> -                  int lastIdx = permutations.get( permutations.size() - 1 );
>> +                  int lastIdx = permutationsSorted.get( permutationsSorted.size() - 1 );
>>                    if ( lastIdx <= firstTokenIndex ) {
>>                       lastIdx--;
>>                    }
>>
>>
>> --
>> Kim Ebert
>> 1.801.669.7342
>> Perfect Search Corp
>> http://www.perfectsearchcorp.com/
>>
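
For readers following the thread, the effect described above can be reproduced with a small standalone sketch. This is not cTAKES code: the class PermutationSortDemo, its methods, and the three-token example are hypothetical names made up for illustration. It assumes only what the thread states, namely that the lookup string is built from the order of the permutation and that the permutation lists are shared objects held in a parent (cache) list. Sorting the shared list in place collapses every permutation to the same ordering; sorting a copy keeps the original order for the lookup while still giving the smallest and largest index for the offset calculation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PermutationSortDemo {

    // Hypothetical stand-in for a cached list of index permutations.
    static List<List<Integer>> sampleCache() {
        List<List<Integer>> cache = new ArrayList<>();
        cache.add(new ArrayList<>(Arrays.asList(0, 1, 2)));
        cache.add(new ArrayList<>(Arrays.asList(2, 0, 1)));
        cache.add(new ArrayList<>(Arrays.asList(1, 2, 0)));
        return cache;
    }

    // Hypothetical lookup-string builder: joins tokens in permutation order.
    static String lookupKey(List<String> tokens, List<Integer> permutation) {
        StringBuilder sb = new StringBuilder();
        for (int idx : permutation) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(tokens.get(idx));
        }
        return sb.toString();
    }

    // Sorts each cached permutation in place, then builds the key.
    // The cached lists are mutated, so the permuted order is lost.
    static List<String> keysSortingInPlace(List<String> tokens, List<List<Integer>> cache) {
        List<String> keys = new ArrayList<>();
        for (List<Integer> permutation : cache) {
            Collections.sort(permutation);
            keys.add(lookupKey(tokens, permutation));
        }
        return keys;
    }

    // Returns {smallest index, largest index} using a sorted copy,
    // leaving the caller's list untouched. Assumes a non-empty list.
    static int[] indexSpan(List<Integer> permutation) {
        List<Integer> sorted = new ArrayList<>(permutation);
        Collections.sort(sorted);
        return new int[] { sorted.get(0), sorted.get(sorted.size() - 1) };
    }

    // Builds the key from the unsorted permutation; only the copy is sorted.
    static List<String> keysSortingCopy(List<String> tokens, List<List<Integer>> cache) {
        List<String> keys = new ArrayList<>();
        for (List<Integer> permutation : cache) {
            indexSpan(permutation); // offsets would be derived from this span
            keys.add(lookupKey(tokens, permutation));
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("left", "chest", "pain");
        System.out.println(keysSortingInPlace(tokens, sampleCache()));
        System.out.println(keysSortingCopy(tokens, sampleCache()));
    }
}
```

With the in-place sort, all three permutations produce the same key, so word orders other than the sorted one are never looked up. The sketch copies with new ArrayList<>(permutation) rather than the cast-and-clone() form used in the patches; both make a shallow copy, and the constructor form avoids the unchecked casts.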