lucene-dev mailing list archives

From "Karsten Konrad" <Karsten.Kon...@xtramind.com>
Subject AW: AW: AW: N-gram layer and language guessing
Date Tue, 03 Feb 2004 13:35:52 GMT

>>

I didn't get your point here. Are you pro or anti-ngrams?
>>

I am very much pro, but we currently do not use ngrams for
indexing larger texts, because we expect problems from having
an order of magnitude more tokens to index. We have not
tested this, though, due to time constraints in our
development.

So if you use ngrams for searching and can take some
measurements of how the greater number of ngrams affects
search speed, that would be nice.

>>
If I stem the query and then stem information in the index 
in realtime, stemming won't take up any extra space? Or?
>>

Stemming vs. ngrams is a topic of its own. Stemmers
are usually fast and need no extra space on disk. But
then, for some languages it is hard to write good
stemmers, and stemmers can't handle spelling errors.
Ngrams work for all languages and can handle spelling
variations, umlauts, errors, mixed-language documents,
etc.
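
As a small illustration of the spelling-error point, the sketch below compares two words by the character trigrams they share, using the Dice coefficient. The helper names, the "_" boundary padding, and the choice of trigrams are assumptions for this example only; it is not how any particular Lucene analyzer scores matches.

```python
def ngram_set(word, n=3):
    """Set of character ngrams of a word, padded with '_' at both ends."""
    padded = "_" + word.lower() + "_"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice(a, b, n=3):
    """Dice coefficient over the two words' ngram sets (0.0 to 1.0)."""
    grams_a, grams_b = ngram_set(a, n), ngram_set(b, n)
    if not grams_a or not grams_b:
        return 0.0  # nothing to compare for empty input
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


# A transposition typo still shares half of its trigrams with the
# correct spelling, while an exact term match would find nothing.
print(dice("language", "langauge"))
print(dice("language", "stemming"))
```

A stemmer would typically pass "langauge" through unchanged as an unknown word, so the misspelled term would never meet the correct one in the index; with ngrams the two still overlap.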

Have fun with ngrams,

Karsten


-----Original Message-----
From: karl wettin [mailto:kalle@snigel.dnsalias.net]
Sent: Tuesday, 3 February 2004 14:01
To: Lucene Developers List
Subject: Re: AW: AW: N-gram layer and language guessing


On Tue, 3 Feb 2004 13:36:35 +0100
"Karsten Konrad" <Karsten.Konrad@xtramind.com> wrote:

> 
> If you use ngrams consistently, you can leave out stemming and spend 
> your time on something different (like buying a bigger hard disk for 
> your indexes; you will probably need it then :)

I didn't get your point here. Are you pro or anti-ngrams?

If I stem the query and then stem information in the index in realtime, stemming won't take
up any extra space? Or?

I'm quite green when it comes to indexes. It's all trie-patterns to me.



kalle

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

