lucene-dev mailing list archives

From Wolfgang Hoschek <whosc...@lbl.gov>
Subject Re: [Performance] Streaming main memory indexing of single strings
Date Fri, 15 Apr 2005 23:32:56 GMT
On Apr 15, 2005, at 4:15 PM, Doug Cutting wrote:

> Wolfgang Hoschek wrote:
>> The classic fuzzy fulltext search and similarity matching that Lucene 
>> is good for :-)
>
> So you need a score that can be compared to other matches?  This will 
> be based on nothing but term frequency, which a regex can compute.  
> With a single document there'll be no IDFs, so you could simply sum 
> sqrt() of term regex match counts, and divide by the sqrt of the 
> length of the string.

Is there a function f that can translate any Lucene query (with all its 
syntax and fuzzy features) into a regex? For example, how would one 
translate StandardAnalyzer or stemming into a regex? If such a function 
exists, then yes, that would work, but it seems unlikely, no?
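
For concreteness, the single-document score Doug describes above (sum the 
sqrt() of each term's regex match count, then divide by the sqrt of the 
string length) could be sketched roughly as follows. This is only a 
minimal sketch and not Lucene code: the names RegexScore, countMatches 
and score are hypothetical, and measuring "length" as the number of 
whitespace-separated tokens rather than characters is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: scores one string against a
// set of per-term regexes using term frequency only (no IDF).
class RegexScore {

    // Count the non-overlapping matches of one term pattern in the text.
    static int countMatches(Pattern term, String text) {
        Matcher m = term.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Sum sqrt(match count) over all terms, then normalize by sqrt(length).
    // Assumption: "length" means the number of whitespace-separated tokens.
    static double score(String text, List<Pattern> terms) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0.0; // nothing to match against
        }
        int length = trimmed.split("\\s+").length;
        double sum = 0.0;
        for (Pattern term : terms) {
            sum += Math.sqrt(countMatches(term, text));
        }
        return sum / Math.sqrt(length);
    }

    public static void main(String[] args) {
        List<Pattern> terms = Arrays.asList(
                Pattern.compile("\\bthe\\b"),
                Pattern.compile("\\bfox\\b"));
        String text = "the quick brown fox jumps over the lazy dog";
        System.out.println(score(text, terms));
    }
}
```

Note that each regex here has to be written by hand for each term; the 
sketch does nothing to derive the patterns from a query or an analyzer, 
which is exactly the open question.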

My particular interest is to use XQuery for *precisely* locating 
information subsets in networked XML messages, and then to use Lucene's 
fulltext functionality for *fuzzy* searches within such a precise 
subset. Messages are classified and routed/forwarded accordingly. See 
http://dsd.lbl.gov/nux/ for background. [BTW, XQuery already has 
regexes built-in].

>
> Yes, I'm playing devil's advocate...

Always a good thing to check assumptions :-)

