Mailing-List: contact user-help@uima.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@uima.apache.org
Received-SPF: neutral (athena.apache.org: local policy)
Message-ID: <4BFDEDA8.1080301@orkash.com>
Date: Thu, 27 May 2010 09:27:28 +0530
From: Anuj Saini <anuj.saini@orkash.com>
User-Agent: Thunderbird 2.0.0.18 (X11/20081120)
MIME-Version: 1.0
To: user@uima.apache.org
Subject: Re: What about a search engine like
References: <AANLkTil5v262J8824UBSj7nEPSrBcc5exWlWMAM-y_na@mail.gmail.com>
In-Reply-To: <AANLkTil5v262J8824UBSj7nEPSrBcc5exWlWMAM-y_na@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

You are trying to generate clusters of similar artifacts. Although this 
can be done at processing time, a better approach is to keep the 
annotated results in a database. My suggestion is to use an index for 
fast retrieval.
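
To make the index idea concrete, here is a minimal sketch in plain Python. It assumes the annotated terms have already been extracted from each article (for example by a UIMA pipeline) and are available as simple lists; the names build_index and related_articles and the paper ids are made up for illustration and are not part of UIMA or Lucene.

```python
from collections import Counter, defaultdict


def build_index(annotated_docs):
    """Build an inverted index: term -> Counter({doc_id: occurrences}).

    annotated_docs is assumed to map a document id to the list of terms
    that an annotator found in that document (hypothetical input shape).
    """
    index = defaultdict(Counter)
    for doc_id, terms in annotated_docs.items():
        for term in terms:
            index[term.lower()][doc_id] += 1
    return index


def related_articles(index, term):
    """Return (doc_id, occurrences) pairs, highest occurrence count first."""
    postings = index.get(term.lower(), Counter())
    return postings.most_common()


# Toy data standing in for stored annotation results.
annotated_docs = {
    "paper-1": ["cancer", "tumor", "cancer", "therapy"],
    "paper-2": ["cancer", "gene"],
    "paper-3": ["protein", "gene"],
}
index = build_index(annotated_docs)
print(related_articles(index, "Cancer"))
```

Looking up a term is then a dictionary access instead of a scan over all articles, and the occurrence count gives a crude relevance score. A real index such as Lucene does the same kind of lookup with far better scoring and on-disk storage.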

UIMA doesn't provide anything special for this, but you can use 
Lucene/Solr to achieve it. Lucene/Solr has a feature called 
"MoreLikeThis", which is very handy for finding related articles.
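
To give a feel for what a "more like this" lookup does, here is a rough sketch in plain Python. It is not Lucene's or Solr's API or implementation, only an approximation of the general idea: pick the most distinctive terms of one document by a TF-IDF style weight and score the other documents by those terms. The function names, the max_terms parameter and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Very naive tokenizer: lowercase, split on whitespace, keep alphabetic words."""
    return [w for w in text.lower().split() if w.isalpha()]


def more_like_this(docs, source_id, max_terms=5):
    """Rank the other documents by similarity to docs[source_id].

    docs maps a document id to its raw text (hypothetical input shape).
    """
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for term_counts in counts.values():
        df.update(term_counts.keys())

    def idf(term):
        return math.log(1 + n_docs / df[term])

    # Keep the highest-weighted terms of the source document.
    source = counts[source_id]
    top_terms = sorted(source, key=lambda t: source[t] * idf(t), reverse=True)
    top_terms = top_terms[:max_terms]

    # Score every other document by how strongly it contains those terms.
    scores = {}
    for doc_id, term_counts in counts.items():
        if doc_id == source_id:
            continue
        score = sum(term_counts[t] * idf(t) for t in top_terms)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = {
    "a": "cancer cells resist chemotherapy",
    "b": "chemotherapy targets cancer cells in patients",
    "c": "parsing text with annotators",
}
similar = more_like_this(docs, "a")
print([doc_id for doc_id, _ in similar])
```

In practice you would let Lucene/Solr do this against the real index rather than re-implementing it; the sketch only shows why the feature fits the "related articles" use case.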

cheers
Anuj
Radwen Aniba wrote:
> Hello everyone,
>
> Well I have a question regarding uima usage.
> Until now I have used UIMA to annotate documents, and that's cool, everything is
> great.
> But now I will probably need to parse lots and lots of scientific
> articles and abstracts to extract knowledge.
> Example : let's say I have a document containing the word "Cancer" I would
> like to parse available scientific papers related to Cancer and to attach
> this information to the word cancer something like "related articles", and
> doing that with a relevance score depending on the word occurrence in the
> document for example.
> It is like a search engine but for text.
> Well, I know that's feasible, but I don't know where and how to start.
> Should I have all the articles and scientific papers in a kind of database or
> something like that? Or is there any special format that UIMA could use as a
> "literature" database?
>
> Can someone help me figure this out?
>
> Thanks a lot
>
> Rad
>
>