lucene-java-user mailing list archives

From Stephane James Vaucher <>
Subject Re: suitability of lucene for project
Date Tue, 13 Apr 2004 03:15:48 GMT
Lucene could be part of your solution, but I don't think it is the main part. Let me explain:

I've done something similar to what you describe a few times. I often
use HttpUnit to retrieve the pages. How you process them after that is up
to you. If you want the content to be indexed (searchable), you can use
Lucene. If you want to extract structured (or semi-structured) information,
use wrapper induction techniques (not Lucene).


On 13 Apr 2004, Sebastian Ho wrote:

> Hi all,
> I am investigating technologies to use for a project that basically
> retrieves HTML pages on a regular basis (or whenever they change),
> parses the HTML to extract specific information, and presents it
> as links in a webpage. Note that this is not a general search
> engine kind of project: we are extracting clinical information from
> various websites and consolidating it.
> Please advise whether Lucene can do the above; where it cannot,
> suggestions for other solutions would be appreciated.
> Thanks
> Sebastian Ho
> Bioinformatics Institute
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
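The requirement in the quoted message to re-process pages only when they change can be handled by remembering a fingerprint of each page. A minimal sketch, assuming the page text has already been fetched (with HttpUnit or any HTTP client) and that an in-memory map is enough: the hypothetical `ChangeDetector` class stores a SHA-256 hash per URL and reports whether newly fetched content differs from the previous fetch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: remembers a content hash per URL so unchanged pages can be skipped. */
public class ChangeDetector {
    private final Map<String, String> lastHash = new HashMap<>();

    /** Hex-encoded SHA-256 of the UTF-8 bytes of the text. */
    static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    /** True the first time a URL is seen and whenever its content differs from the last call. */
    public boolean hasChanged(String url, String html) {
        String h = sha256Hex(html);
        String previous = lastHash.put(url, h); // null on first sight of this URL
        return !h.equals(previous);
    }

    public static void main(String[] args) {
        ChangeDetector d = new ChangeDetector();
        String url = "http://example.org/trials"; // placeholder URL
        System.out.println(d.hasChanged(url, "<p>v1</p>")); // true (first fetch)
        System.out.println(d.hasChanged(url, "<p>v1</p>")); // false (unchanged)
        System.out.println(d.hasChanged(url, "<p>v2</p>")); // true (content changed)
    }
}
```

Hashing the whole page also flags changes in timestamps, ads or session tokens; hashing only the extracted fields avoids those false alarms. Where a server supports HTTP conditional requests (`ETag` with `If-None-Match`, or `Last-Modified` with `If-Modified-Since`), an unchanged page need not be downloaded at all.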
