lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Steve FromMoreover" <steve_from_moreo...@hotmail.com>
Subject RE: Can lucene do this?
Date Mon, 15 May 2006 15:35:57 GMT
Hi Scott,

I saw your email and thought of some work I have been doing recently for 
matching text on the fly.  If you are not going to be keeping the emails for 
later searching then this may provide a faster and easier way of checking 
your email.  It involves using monq which is a java api - 
http://www.ebi.ac.uk/Rebholz-srv/whatizit/software

http://www.ebi.ac.uk/~kirsch/JfaWiki/index.php/Main_Page

As your emails come in you can pass them through this software and any 
matched regular expressions will trigger java callbacks.  The best bit is 
that no matter how many regular expressions you have (I tried 30,000) you 
can process 1Mb per second of data, this is because the program just runs 
through the Finite State Automaton that gets created one character at a 
time... (this is demoed graphically here - 
http://www-sr.informatik.uni-tuebingen.de/~buehler/AC/AC.html if you choose 
the Aho/Corasick algorithm)

Anyways, off topic, so if you want more advice I can email you direct and 
I'm sure someone else on the list can help you out with the Lucene-related 
answer...

Ta

steve

-----Original Message-----
From: Scott Smith [mailto:ssmith@mainstreamdata.com]
Sent: 12 May 2006 02:29
To: lucene-user@jakarta.apache.org
Subject: Can lucene do this?

I'm building an application which has to provide "real-time" searching of 
emails as they come in.  I have a number of search strings that I need to 
apply against each email as it comes in and then do something with the email 
based on which search string(s) get a hit.



My initial thought was to create a lucene index of the emails received in 
the last N seconds (where N is around 5 since I don't have to be quite 
real-time) in a memory directory, do my searches and then delete the index 
and create a new index for emails received in the next 5
seconds.   I'm a little concerned because the number of search strings
will probably grow over time and so there is a bit of a scalability 
issue-though I'm not sure there's anyway around that other than doing 
parallel processing on different machines.



I'm wondering if anyone has any experience doing this kind of thing and has 
additional or alternate suggestions??



Scott

_________________________________________________________________
Are you using the latest version of MSN Messenger? Download MSN Messenger 
7.5 today! http://join.msn.com/messenger/overview


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message