From: Jessica Glover
Date: Tue, 4 Oct 2016 14:28:16 -0400
Subject: Re: Data size of UMLS 2011ab and 2016aa
To: user@ctakes.apache.org

Hi S.H.,

cTAKES does not query the UMLS online version. The cTAKES dictionary is built from a subset of the UMLS Metathesaurus. There are two tools in the cTAKES sandbox, dictionary-gui and dictionarytool, that you can use to build a dictionary from the subset of the UMLS Metathesaurus that you installed. (They use the META files, though, not the MySQL database.)

The tools allow you to choose which TUIs, vocabularies, etc. to include in your dictionary. These choices affect the dictionary's size.

I hope that helps,
Jessica

On Tue, Oct 4, 2016 at 12:36 PM, SH.Chou wrote:

> Hi All,
> I just started to use cTAKES, and I have a question regarding the data
> size of UMLS 2011ab (the default dataset in cTAKES) and the new 2016aa.
> I installed 2016aa in a MySQL database, and the data size is about 14 GB,
> but the 2011ab in cTAKES is just about 2 GB. I wondered whether cTAKES uses
> the UMLS API and submits words to query the UMLS online version?
> Or does cTAKES compress 2011ab (using HSQL?)
>
> Thanks,
> S.H.
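
To illustrate why the TUI and vocabulary choices change the dictionary's size so much, here is a rough Python sketch of that kind of subsetting. It is not how dictionary-gui or dictionarytool are implemented. It only assumes the standard pipe-delimited RRF layout of the META files MRSTY.RRF (CUI in column 0, TUI in column 1) and MRCONSO.RRF (CUI in column 0, language in column 1, source vocabulary in column 11, term string in column 14). The function names are made up for this example.

```python
from typing import Iterable, Iterator, Set, Tuple

# Column positions in the pipe-delimited RRF files (assumed standard layout).
MRSTY_CUI, MRSTY_TUI = 0, 1
CONSO_CUI, CONSO_LAT, CONSO_SAB, CONSO_STR = 0, 1, 11, 14


def cuis_with_tuis(mrsty_lines: Iterable[str], wanted_tuis: Set[str]) -> Set[str]:
    """Collect the CUIs that carry at least one of the wanted semantic types."""
    cuis: Set[str] = set()
    for line in mrsty_lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) > MRSTY_TUI and fields[MRSTY_TUI] in wanted_tuis:
            cuis.add(fields[MRSTY_CUI])
    return cuis


def filter_terms(
    mrconso_lines: Iterable[str],
    keep_cuis: Set[str],
    wanted_sabs: Set[str],
    language: str = "ENG",
) -> Iterator[Tuple[str, str]]:
    """Yield (CUI, term) pairs that pass the language, vocabulary, and CUI filters."""
    for line in mrconso_lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= CONSO_STR:
            continue  # skip malformed or truncated rows
        if (
            fields[CONSO_LAT] == language
            and fields[CONSO_SAB] in wanted_sabs
            and fields[CONSO_CUI] in keep_cuis
        ):
            yield fields[CONSO_CUI], fields[CONSO_STR]
```

With real files, you would open MRSTY.RRF and MRCONSO.RRF as UTF-8 text and pass the file objects in as the line iterables. Every row dropped by a filter is a term that never reaches the dictionary, which is why a dictionary restricted to a few TUIs and vocabularies can be far smaller than a full Metathesaurus load.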