lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: existing feature for Lucene ?
Date Sat, 06 May 2006 02:55:23 GMT
Hi Romain,

The short answer is: no.
The long answer is: Lucene is a library/toolkit for text indexing and searching.  The functionality
that you described is what your application would have to do, and Lucene would provide only
indexing and searching.

You may also want to look at Nutch, a web-crawling and searching application that is part
of the Lucene project.

Otis

----- Original Message ----
From: Romain Péchard <romain.pechard@gmail.com>
To: java-user@lucene.apache.org
Sent: Friday, May 5, 2006 1:51:40 PM
Subject: existing feature for Lucene ?

Hi,

Can the Lucene software do the following thing with some features : search
selected keywords in selected websites (forums, blogs, and others), store
the URLs where the keywords are found sorted by date and then show them in
tabular form ?

Full description of the feature I'm looking for :

The software has to run at least once a day but the better would be a real
time scanning.

Once the fist scan of a given website done, the next run has to check if the
pages have been updated and then to check if there's any keyword in that new
part. It also needs to return if there's a simple update or if there's a
keyword in it (with tag or colour).

The user front-end would need 2 parts. The 1st to tag the given websites
with the chosen keywords (there would be multiple tags for a given website
and a keywords can be associated with many websites). The 2nd to show the
results in 3 tabular forms:

   1. Website result page: number of time the keywords of a given website
   appears (and the list of the URLs) in table and in pie chart form.
   2. Keywords result page: number of time the keyword appears in all the
   websites associated (and the list of the URLs) in table and in histogram.
   3. Global result page: ratio of the different keywords for all the
   websites (% and number of time) in pie chart and histogram forms.

Best,

--
Romain Péchard




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message