From: Anuj Saini
Date: Thu, 27 May 2010 09:27:28 +0530
To: user@uima.apache.org
Subject: Re: What about a search engine like

You are trying to generate clusters of similar artifacts.
Although this can be done at processing time, a better approach is to keep the annotated results in a database. My suggestion is to use an index for fast retrieval. UIMA doesn't provide anything special for this, but you can use Lucene/Solr to achieve it. Lucene/Solr has a "MoreLikeThis" feature, which is very handy for finding related articles.

Cheers,
Anuj

Radwen Aniba wrote:
> Hello everyone,
>
> I have a question regarding UIMA usage.
> Until now I have used UIMA to annotate documents, and that's cool, everything is
> great.
> But now I will probably need to parse lots and lots of scientific
> articles and abstracts to extract knowledge.
> Example: let's say I have a document containing the word "Cancer". I would
> like to parse available scientific papers related to cancer and attach
> this information to the word "cancer", something like "related articles",
> with a relevance score depending on the word's occurrence in the
> document, for example.
> It is like a search engine, but for text.
> I know that's feasible, but I don't know where and how to start.
> Should I keep all the articles and scientific papers in a kind of database or
> something like that? Or is there any special format that UIMA could use as a
> "literature" database?
>
> Can someone help me figure this out?
>
> Thanks a lot
>
> Rad
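
To make the "related articles with a relevance score" idea from this thread concrete, here is a minimal sketch in plain Python. It is not Lucene's or Solr's MoreLikeThis API. It only approximates the underlying idea with TF-IDF weights and cosine similarity over a tiny in-memory index. All names (`build_index`, `related_articles`, `ARTICLES`) and the sample texts are hypothetical, and a real setup would index the UIMA annotation results in Lucene/Solr as suggested above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Build TF-IDF vectors for a dict of {doc_id: text}.

    Returns (vectors, idf), where vectors maps doc_id -> {term: weight}.
    The smoothed IDF formula is an assumption chosen for this sketch.
    """
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    n_docs = len(docs)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {
        doc_id: {t: c * idf[t] for t, c in counts.items()}
        for doc_id, counts in term_counts.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_articles(query_text, vectors, idf, top_n=3):
    """Return up to top_n (doc_id, score) pairs, highest score first.

    Terms unknown to the index are ignored; documents scoring 0 are dropped.
    """
    counts = Counter(tokenize(query_text))
    query_vec = {t: c * idf[t] for t, c in counts.items() if t in idf}
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in vectors.items()]
    scored = [(doc_id, score) for doc_id, score in scored if score > 0.0]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical sample corpus standing in for indexed scientific abstracts.
ARTICLES = {
    "a1": "Cancer cells and tumor growth in cancer patients",
    "a2": "Protein folding and molecular dynamics simulation",
    "a3": "Tumor suppressor genes in breast cancer",
}

vectors, idf = build_index(ARTICLES)
results = related_articles("cancer tumor", vectors, idf)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

With this sample corpus, the query ranks `a1` ahead of `a3` and leaves out `a2`, which shares no terms with the query. A full-text index such as Lucene/Solr does the same kind of term-based scoring at scale, so the in-memory dictionaries here are only a stand-in for that index.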