lucene-dev mailing list archives

From "Robert Engels" <reng...@ix.netcom.com>
Subject RE: [Performance] Streaming main memory indexing of single strings
Date Fri, 15 Apr 2005 23:26:04 GMT
I think one of the advantages may be the analyzers and processors that are
already available for several document types.

Using regex with these is nearly impossible.

-----Original Message-----
From: Doug Cutting [mailto:cutting@apache.org]
Sent: Friday, April 15, 2005 6:16 PM
To: java-dev@lucene.apache.org
Subject: Re: [Performance] Streaming main memory indexing of single
strings


Wolfgang Hoschek wrote:
> The classic fuzzy fulltext search and similarity matching that Lucene is
> good for :-)

So you need a score that can be compared to other matches?  This will be
based on nothing but term frequency, which a regex can compute.  With a
single document there'll be no IDFs, so you could simply sum sqrt() of
term regex match counts, and divide by the sqrt of the length of the string.
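
The scoring described above can be sketched in a few lines of Java. This is
only an illustration under stated assumptions, not code from the thread: the
class name RegexScore and the helpers countMatches and score are hypothetical,
each query term is matched as a case-insensitive whole word, and "length of
the string" is read literally as the character count (a token count would be
an equally plausible reading).

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Lucene.
class RegexScore {

    // Counts non-overlapping, case-insensitive whole-word matches of term in text.
    // Whole-word matching is an assumption; the message only says "regex match counts".
    static int countMatches(String term, String text) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                                    Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Sums sqrt(match count) over all terms, then divides by sqrt(text length).
    // Assumption: length is measured in characters.
    static double score(List<String> terms, String text) {
        if (text.isEmpty()) {
            return 0.0; // avoid dividing by zero for an empty string
        }
        double sum = 0.0;
        for (String term : terms) {
            sum += Math.sqrt(countMatches(term, text));
        }
        return sum / Math.sqrt(text.length());
    }

    public static void main(String[] args) {
        String text = "Lucene is good at fuzzy fulltext search; fulltext search is fun";
        System.out.println(score(Arrays.asList("fulltext", "search"), text));
    }
}
```

With no second document there is nothing to compute an IDF from, so the
sketch has no IDF factor at all. It also shows where the comparison ends:
each term is quoted literally, so nothing here does the tokenizing or
normalizing that an analyzer would.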

Yes, I'm playing devil's advocate...

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org



