uima-user mailing list archives

From buddha <buddha_...@yahoo.com.INVALID>
Subject Re: [UK OFFICIAL] Baleen 2.1 Released
Date Tue, 15 Dec 2015 14:20:25 GMT
Hello Mr. Baker,

Do you have any more supporting information on Baleen? Perhaps a website? I don’t see one
referenced on GitHub.


May All Your Sequences Converge

> On Dec 15, 2015, at 3:40 AM, Baker James D <JDBAKER@mail.dstl.gov.uk> wrote:
> Classification: UK OFFICIAL
> Morning all,
> A new version of Baleen, the UIMA-based entity extraction and text analytics framework
> developed by Dstl (part of the UK Ministry of Defence), has been released. This version
> includes the following improvements:
> * New Annotator: MongoStemming uses a gazetteer and stemming to perform a pseudo-fuzzy
>   match and find gazetteer terms in different tenses and plurals
> * New Cleaner: MergeAdjacent will merge adjacent entities of the same type
> * New Content Extractor: CsvContentExtractor splits CSV fields into content and
> * New Collection Reader: LineReader will read a single file into multiple documents
>   by line
> * New REST API to get configuration parameters for components (e.g. annotators)
> * Significant changes to the way gazetteer annotators work, including changing from
>   RadixTrees to MultiMaps and implementing the Aho-Corasick algorithm, resulting in
>   performance improvements for large gazetteers in the order of 100s
> * Lots of bug fixes and minor improvements
> The latest release is available on GitHub: https://github.com/dstl/baleen
> Any feedback, suggestions, comments, issues and code contributions are welcome! We're
> keen for people to help us improve it so that it's a useful tool for a wide range of people.
> James
> "This e-mail and any attachment(s) is intended for the recipient only.   Its unauthorised
> disclosure, storage or copying is not permitted.  Communications with Dstl are monitored
> recorded for system efficiency and other lawful purposes, including business intelligence,
> metrics and training.  Any views or opinions expressed in this e-mail do not necessarily
> reflect Dstl policy."
> "If you are not the intended recipient, please remove it from your system and notify
> the author of the email and centralenq@dstl.gov.uk"
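
For readers unfamiliar with the Aho-Corasick algorithm mentioned in the gazetteer item of the announcement, here is a minimal sketch of the technique in Python. It is not Baleen's code (Baleen is a Java project), and the class name AhoCorasick and its find method are hypothetical names chosen for illustration. The idea is to build one trie over all gazetteer terms, add failure links, and then scan the text once, reporting every term that occurs, instead of searching for each term separately.

```python
from collections import deque


class AhoCorasick:
    """Illustrative multi-term matcher; an assumption, not Baleen's implementation."""

    def __init__(self, terms):
        # Parallel lists indexed by node id: transitions, failure link,
        # and the terms that end at (or are suffixes of) this node.
        self.goto = [{}]
        self.fail = [0]
        self.out = [[]]
        # dict.fromkeys drops duplicate terms while keeping their order.
        for term in dict.fromkeys(t.lower() for t in terms):
            self._add(term)
        self._build_failure_links()

    def _add(self, term):
        node = 0
        for ch in term:
            nxt = self.goto[node].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[node][ch] = nxt
            node = nxt
        self.out[node].append(term)

    def _build_failure_links(self):
        # Breadth-first, so a node's failure target is always finished first.
        queue = deque(self.goto[0].values())  # depth-1 nodes fail to the root
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                queue.append(child)
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                # Inherit terms that end at the failure target (proper suffixes).
                self.out[child] = self.out[child] + self.out[self.fail[child]]

    def find(self, text):
        """Return (start, end, term) for every case-insensitive occurrence."""
        matches = []
        node = 0
        for i, ch in enumerate(text.lower()):
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            for term in self.out[node]:
                matches.append((i - len(term) + 1, i + 1, term))
        return matches
```

Calling AhoCorasick(["London", "New York", "York"]).find("Flew from New York to London") reports all three terms, including the overlapping "New York" and "York", in a single pass over the text. Scan time grows with the length of the text plus the number of matches rather than with the number of gazetteer terms, which is the usual reason to prefer this approach for large gazetteers.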
