Subject: Re: ConceptMApper
From: Michael Tanenblatt
Date: Wed, 20 Mar 2013 07:26:04 -0400
To: user@uima.apache.org

I have never seen this issue--under no circumstances should anything less than the full dictionary entry be matched. The only things I can think of are either errors in the dictionary, though that's unlikely, or issues with the tokenizer. Or a bug... My guess is that the dictionary entry, "FC Barcelona", is being tokenized such that only "FC" is annotated, so that is the only part that needs to match. You can check whether it is a tokenization issue by switching to the sample whitespace tokenizer that comes with ConceptMapper and seeing what results you get.

On Mar 20, 2013, at 7:09 AM, Andreas Niekler wrote:

> Hello,
>
> I am trying to use the ConceptMapper to annotate multi-word units in German. I
> face the problem that the individual tokens within a dictionary entry are
> matched on their own, for example:
>
> In the dict -> FC Barcelona
>
> Annotated in a text "The FC scored today": FC is annotated as DictEntry
>
> Why does ConceptMapper annotate this?
> Here are my parameters:
>
> AnalysisEngineDescription mapper =
>     AnalysisEngineFactory.createPrimitiveDescription(
>         ConceptMapper.class,
>         ts,
>         ConceptMapper.PARAM_ANNOTATION_NAME, "org.apache.uima.conceptMapper.DictTerm",
>         ConceptMapper.PARAM_ENCLOSINGSPAN, "enclosingSpan",
>         ConceptMapper.PARAM_TOKENANNOTATION, "opennlp.uima.Token",
>         ConceptMapper.PARAM_ATTRIBUTE_LIST, new String[] {"canonical"},
>         ConceptMapper.PARAM_FEATURE_LIST, new String[] {"DictCanon"},
>         ConceptMapper.PARAM_MATCHEDFEATURE, "matchedText",
>         ConceptMapper.PARAM_TOKENIZERDESCRIPTOR, "TokenizerDE.xml",
>         //ConceptMapper.PARAM_DATA_BLOCK_FS, "uima.tcas.DocumentAnnotation",
>         ConceptMapper.PARAM_DATA_BLOCK_FS, "opennlp.uima.Sentence",
>         ConceptMapper.PARAM_SEARCHSTRATEGY, "ContiguousMatch",
>         ConceptMapper.PARAM_MATCHEDTOKENSFEATURENAME, "matchedTokens",
>         TokenNormalizer.PARAM_CASE_MATCH, "ignoreall");
>
> Thank you
>
> Andreas
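
The tokenization guess in the reply can be illustrated with a small, self-contained Java sketch. It does not use ConceptMapper or UIMA at all. The class ContiguousMatchSketch and its methods whitespaceTokens, firstTokenOnly, and matches are hypothetical names made up for this illustration, and the "first token only" tokenizer is just an assumed failure mode, not a confirmed diagnosis of TokenizerDE.xml. The sketch shows only that a contiguous match against a fully tokenized entry rejects "The FC scored today", while the same match against an entry that lost its second token accepts it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class ContiguousMatchSketch {

    /** Splits on whitespace and lower-cases, roughly like case-insensitive matching. */
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.toLowerCase(Locale.ROOT).split("\\s+"));
    }

    /** Hypothetical faulty tokenizer (an assumption): only the first token survives. */
    static List<String> firstTokenOnly(String text) {
        List<String> all = whitespaceTokens(text);
        return all.isEmpty() ? all : all.subList(0, 1);
    }

    /**
     * Returns true if every token of the dictionary entry, as produced by
     * entryTokenizer, occurs in the text as one contiguous run.
     */
    static boolean matches(String entry, String text,
                           Function<String, List<String>> entryTokenizer) {
        List<String> entryTokens = entryTokenizer.apply(entry);
        if (entryTokens.isEmpty()) {
            return false;
        }
        List<String> textTokens = whitespaceTokens(text);
        return Collections.indexOfSubList(textTokens, entryTokens) >= 0;
    }

    public static void main(String[] args) {
        String entry = "FC Barcelona";
        String text = "The FC scored today";

        // Entry tokenized as [fc, barcelona]: no contiguous run in the text.
        System.out.println(matches(entry, text, ContiguousMatchSketch::whitespaceTokens)); // false

        // Entry reduced to [fc]: the lone token "FC" now matches.
        System.out.println(matches(entry, text, ContiguousMatchSketch::firstTokenOnly)); // true

        // The full entry still matches where both tokens are adjacent.
        System.out.println(matches(entry, "FC Barcelona won",
                ContiguousMatchSketch::whitespaceTokens)); // true
    }
}
```

If swapping in the sample whitespace tokenizer makes the partial match disappear in the real pipeline, that would point at the tokenizer descriptor rather than at the dictionary or the search strategy.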