From: Baker James D
To: user@uima.apache.org
Date: Tue, 15 Dec 2015 11:40:33 +0000
Subject: [UK OFFICIAL] Baleen 2.1 Released

Classification: UK OFFICIAL

Morning all,

A new version of Baleen, the UIMA-based entity extraction and text analytics framework developed by Dstl (part of the UK Ministry of Defence), has been released. This version includes the following improvements:

* New Annotator: MongoStemming uses a gazetteer and stemming to perform a pseudo-fuzzy match and find gazetteer terms in different tenses and plural forms
* New Cleaner: MergeAdjacent will merge adjacent entities of the same type
* New Content Extractor: CsvContentExtractor splits CSV fields into content and metadata
* New Collection Reader: LineReader will read a single file into multiple documents by line
* New REST API to get configuration parameters for components (e.g. annotators)
* Significant changes to the way gazetteer annotators work, including changing from RadixTrees to MultiMaps and implementing the Aho-Corasick algorithm, resulting in performance improvements for large gazetteers in the order of 100s
* Lots of bug fixes and minor improvements

The latest release is available on GitHub: https://github.com/dstl/baleen

Any feedback, suggestions, comments, issues and code contributions are welcome! We're keen for people to help us improve it so that it's a useful tool for a wide range of people.

James
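
For readers unfamiliar with the Aho-Corasick algorithm mentioned in the gazetteer item above, here is a minimal Python sketch of the general technique: all terms are compiled into one automaton, so the text is scanned once regardless of how many terms the gazetteer holds. This is not Baleen's code. The function names build_automaton and find_terms and the sample terms are hypothetical, and the sketch matches raw characters without the word-boundary or case handling a real gazetteer annotator would likely need.

```python
from collections import deque


def build_automaton(terms):
    """Build a trie with failure links (Aho-Corasick) for the given terms."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link for each state
    out = [[]]    # terms recognised on reaching each state

    # Phase 1: insert every term into the trie.
    for term in terms:
        state = 0
        for ch in term:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(term)

    # Phase 2: breadth-first pass to compute failure links.
    # Depth-1 states keep the default failure link to the root.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit terms that end at the failure target (proper suffixes).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_terms(text, automaton):
    """Return (start_index, term) for every occurrence of any term in text."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        # Follow failure links until a transition on ch exists or we hit the root.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for term in out[state]:
            matches.append((i - len(term) + 1, term))
    return matches


if __name__ == "__main__":
    # Hypothetical gazetteer entries, for illustration only.
    gazetteer = build_automaton(["London", "New York", "York"])
    for start, term in find_terms("Flights from London to New York", gazetteer):
        print(start, term)
```

The automaton is built once and reused for every document, which is where a large gazetteer would be expected to benefit compared with testing each term separately.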