lucene-java-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: SpanRegex speed
Date Fri, 01 Sep 2006 14:28:52 GMT
Erick Erickson wrote:
> OK, a not very helpful answer, but "of course they're slower, they do
> more work" (the span versions). But that's fairly useless, since the
> question is really "is it enough slower in my situation that I need to
> find an alternative?". And the only way I know of to answer that
> question is to make some tests with the data representing my
> particular problem......
>
> Sorry I can't be more help....
> Erick
>
> On 9/1/06, Mark Miller <markrmiller@gmail.com> wrote:
>>
>> Erick Erickson wrote:
>> > Let me chime in here on a different note.... before you get happy with
>> > wildcard queries, take a look at the thread "I just don't get
>> > wildcards at all". There is lots of good info that Erik, Chris and
>> > Otis provided me.
>> >
>> > The danger with PrefixQuery and WildcardQuery is that they will throw
>> > TooManyClauses exceptions when you start matching too many terms
>> > (the default limit is 1024, although you can make this much bigger if
>> > memory allows). If you're aware of this and it is and will be OK in
>> > your app, ignore this. But if your index is going to grow
>> > significantly, this is a real problem. I went with implementing
>> > filters with WildcardTermEnum (you could also use RegexTermEnum) for
>> > the wildcard portions of my query. This has interesting implications
>> > for spans: we elected to say spans didn't work with wildcards.
>> >
>> > Anyway, as I said, if you're aware of the TooManyClauses issue and are
>> > sure it doesn't matter, ignore me. After all, everybody else does
>> > <G>.....
>> >
>> >
>> > Best
>> > Erick
>> >
>> >
>> >
>> > On 8/30/06, Mark Miller <markrmiller@gmail.com> wrote:
>> >>
>> >> Ignore that last question. I see that you said prefix wildcard query
>> >> and not wildcard query. A quick look at the code seems to show it
>> >> grabbing a prefix as well.
>> >>
>> >> Do you think one would be any faster than the other? Should I use
>> >> wildcard queries outside of span queries and the regex query inside
>> >> span queries, or use regex in both places?
>> >>
>> >> - Mark
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >>
>> >>
>> >
>> Thanks a lot for the info Erick. Good stuff to know for sure.
>> I guess the real question I have been trying to spit out is this:
>> Are the span versions of these searches (fuzzy, wildcard, etc.)
>> inherently slower than their non-span brothers? If they have the
>> same limitations and speeds, then that is all I am looking for.
>>
>> P.S.
>> I realize I have been screwing up the threading by replying when
>> starting a new topic. I have been alerted and will stop this pernicious
>> activity.
>>
>> - Mark
>>
>>
>>
>
Thanks Erick. You're always more than helpful. The reason I only care that
they are as good as they can be is that I am looking for a general
solution and not one tailored to a particular dataset. This is for a
general query parser. I want to be able to search for wildcards, fuzzies,
etc. in a proximity search, e.g. mark*off NEAR Bork?on. This may just be a
slow query in general, but other search engines appear to offer this, and
they must face similar limitations. So if a fuzzy search is slow in a
proximity search just because it is slow...I don't mind. If it is slow
because Lucene implements spans in a way that makes wildcards and fuzzies
particularly slow in them...that's what I would like to know. And if that
is the case...someone should make a fuzzy and wildcard that is fast in a
span :)
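
To make the question concrete, here is a toy model in plain Java of the two
steps a query like mark*off NEAR bork?on seems to need: expand each pattern
against the term dictionary (subject to a clause limit like the 1024 default
mentioned above), then check positions. This is only a sketch and not Lucene
code. The class name, the method names, the in-memory "dictionary", the
lowercased terms, and the simple "within maxDistance positions" rule are all
assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Toy model (hypothetical, not Lucene): wildcard expansion followed by a proximity check. */
class WildcardProximitySketch {

    /** Mirrors the default clause limit of 1024 mentioned in the thread. */
    static final int MAX_CLAUSES = 1024;

    /** Translate a wildcard pattern ('*' = any run of chars, '?' = exactly one char) to a regex. */
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    /** Step 1: collect every dictionary term matching the pattern; fail once the limit is exceeded. */
    static List<String> expand(String wildcard, List<String> dictionary, int maxClauses) {
        Pattern pattern = toRegex(wildcard);
        List<String> hits = new ArrayList<>();
        for (String term : dictionary) {
            if (pattern.matcher(term).matches()) {
                if (hits.size() == maxClauses) {
                    throw new IllegalStateException("too many clauses: limit is " + maxClauses);
                }
                hits.add(term);
            }
        }
        return hits;
    }

    /** Step 2: true if a token from 'left' occurs within maxDistance positions of a token from 'right'. */
    static boolean near(String[] tokens, List<String> left, List<String> right, int maxDistance) {
        for (int i = 0; i < tokens.length; i++) {
            if (!left.contains(tokens[i])) {
                continue;
            }
            for (int j = 0; j < tokens.length; j++) {
                if (j != i && right.contains(tokens[j]) && Math.abs(i - j) <= maxDistance) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("markoff", "markakoff", "market", "borkson", "borkton");
        List<String> left = expand("mark*off", dictionary, MAX_CLAUSES);
        List<String> right = expand("bork?on", dictionary, MAX_CLAUSES);

        String[] doc = {"markoff", "met", "borkson", "yesterday"};
        System.out.println(left + " NEAR " + right + " -> " + near(doc, left, right, 2));
    }
}
```

In this model the expansion step is the same work whether or not the terms end
up in a proximity check, and the position check is the extra work on top. In
Lucene itself the clause limit is raised with BooleanQuery.setMaxClauseCount;
whether the real span classes add any cost beyond these two steps is exactly
what I am asking.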

- Mark


