lucene-java-user mailing list archives

From Sebastian Ho <sebasti...@bii.a-star.edu.sg>
Subject Re: suitability of lucene for project
Date Thu, 15 Apr 2004 05:12:41 GMT
I will be searching web pages (URLs given by the user) for keywords (in
clinical records). Would that count as structured or unstructured? The
records might be in a table, or in a list of URLs pointing to individual
record web pages.
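
To make the structured/unstructured distinction concrete, here is a
minimal sketch in plain Java (JDK only; it uses neither Lucene nor
HttpUnit). The class name, method names, and the sample HTML are all
hypothetical, and the regex-based tag handling is a crude stand-in for a
real HTML parser. The first method treats the page as unstructured text
and just asks whether a keyword occurs in it, which is the kind of
question a full-text index answers. The second treats the page as
structured and pulls out individual table cells, which is the extraction
side of the problem.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageKeywordSketch {

    // Crude tag stripper, for illustration only: replaces every tag with a
    // space and collapses runs of whitespace. A real HTML parser is more robust.
    static String stripTags(String html) {
        return html.replaceAll("(?s)<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    // Unstructured view: does the keyword occur anywhere in the page text?
    // The comparison is case-insensitive.
    static boolean containsKeyword(String html, String keyword) {
        String text = stripTags(html).toLowerCase(Locale.ROOT);
        return text.contains(keyword.toLowerCase(Locale.ROOT));
    }

    // Structured view: return the text of each <td> cell, in document order.
    static List<String> tableCells(String html) {
        List<String> cells = new ArrayList<>();
        Matcher m = Pattern.compile("(?is)<td[^>]*>(.*?)</td>").matcher(html);
        while (m.find()) {
            cells.add(stripTags(m.group(1)));
        }
        return cells;
    }

    public static void main(String[] args) {
        // Hypothetical sample page; not taken from any real site.
        String html = "<html><body><table>"
                + "<tr><td>Record 1</td><td>asthma</td></tr>"
                + "</table></body></html>";
        System.out.println(containsKeyword(html, "Asthma"));
        System.out.println(tableCells(html));
    }
}
```

If only the first kind of question matters, the pages can be treated as
unstructured text. If the individual fields of each record matter, the
pages need to be treated as structured.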

Thanks

sebastian


On Tue, 2004-04-13 at 11:15, Stephane James Vaucher wrote:
> It could be part of your solution, but I don't think so. Let me explain:
> 
> I've done something similar to what you describe a few times. I often
> use HttpUnit to get the information. How you process it is up to you.
> If you want it to be indexed (searchable), you can use Lucene. If you
> want to extract structured (or semi-structured) information, use
> wrapper induction techniques (not Lucene).
> 
> cheers,
> sv
> 
> On 13 Apr 2004, Sebastian Ho wrote:
> 
> > hi all
> > 
> > I am investigating technologies for a project that retrieves HTML
> > pages on a regular basis (or whenever they change), parses the HTML
> > to extract specific information, and presents the results as links
> > on a web page. Note that this is not a general search-engine kind of
> > project; we are extracting clinical information from various
> > websites and consolidating it.
> > 
> > Please advise whether Lucene can do the above. For the areas where
> > it cannot, suggestions for alternatives would be appreciated.
> > 
> > Thanks
> > 
> > Sebastian Ho
> > Bioinformatics Institute
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> > 

