lucene-solr-user mailing list archives

From Mark Fenbers <mark.fenb...@noaa.gov>
Subject File-based Spelling
Date Mon, 12 Oct 2015 19:37:56 GMT
Greetings!

I'm attempting to use a file-based spell checker.  My sourceLocation is 
/usr/share/dict/linux.words, and my spellcheckIndexDir is set to 
./data/spFile.  BuildOnStartup is set to true, and nothing in solr.log 
suggests any sort of problem or error.  However, my ./data/spFile/ 
directory contains only two files: segments_2, which is just 71 bytes, 
and a zero-byte write.lock file.  For a source dictionary of 480,000 
words, I expected a lot more data in ./data/spFile.  Something doesn't 
seem right.
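One way to rule out the source file itself is to confirm that it holds roughly the expected number of one-word lines and decodes cleanly. A minimal sketch follows; the helper name `check_word_list` is hypothetical, and it only inspects the word list without involving Solr at all:

```python
from typing import Iterable, List, Tuple


def check_word_list(lines: Iterable[str]) -> Tuple[int, List[Tuple[int, str]]]:
    """Count entries in a one-word-per-line dictionary and flag odd lines.

    Hypothetical helper. Returns (word_count, problems), where problems
    lists (line_number, text) for lines holding more than one
    whitespace-separated token.
    """
    count = 0
    problems: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        word = line.strip()
        if not word:
            continue  # blank lines are skipped and not counted
        if len(word.split()) != 1:
            problems.append((lineno, word))  # e.g. "cherry pie"
        count += 1
    return count, problems
```

To run it on the real file, pass an open file object, e.g. `with open("/usr/share/dict/linux.words", encoding="utf-8") as f: print(check_word_list(f))`. Opening with an explicit encoding makes Python raise UnicodeDecodeError if the file is not valid UTF-8. If the count comes out near 480,000 with few or no problem lines, the source file is probably not the issue.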

Moreover, I ran a query on the word Fenbers, which isn't in the 
linux.words file, although several similar words are.  The results I 
got back were odd, and the suggestions included the following:
fenber
f en be r
f e nb er
f en b er
f e n be r
f en b e r
f e nb e r
f e n b er
f e n b e r
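The eight split suggestions are exactly the ways of breaking fenber (which appears to be Fenbers lowercased with the trailing s stripped) into two or more shorter pieces. A small sketch reproduces that list, assuming a hypothetical vocabulary made up only of the fragments seen in those suggestions. It illustrates a word-break style enumeration in general and is not Solr's code:

```python
from typing import List, Set


def segmentations(term: str, vocab: Set[str]) -> List[List[str]]:
    """Return every way to split `term` into two or more pieces found in `vocab`."""
    results: List[List[str]] = []

    def walk(start: int, pieces: List[str]) -> None:
        if start == len(term):
            if len(pieces) > 1:  # a single piece is the unsplit term, not a break
                results.append(list(pieces))
            return
        for end in range(start + 1, len(term) + 1):
            piece = term[start:end]
            if piece in vocab:
                pieces.append(piece)
                walk(end, pieces)
                pieces.pop()  # backtrack and try a longer piece

    walk(0, [])
    return results


# Hypothetical vocabulary: only the fragments that appear in the suggestions above.
vocab = {"f", "e", "n", "b", "r", "en", "be", "nb", "er"}
for split in segmentations("fenber", vocab):
    print(" ".join(split))
```

Because every single letter and every adjacent pair in "enber" is in this vocabulary, the pieces after the leading "f" can be chosen in eight ways, matching the eight split suggestions reported.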

But I expected suggestions such as fenders, embers, and fenberry.  I 
also ran a query on Mark (which IS listed in linux.words) and got back 
two suggestions in a similar format.  I experimented with configuration 
settings, such as changing the fieldType from text_en to string and the 
characterEncoding from UTF-8 to ASCII, but none of these changes 
produced different results.
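For comparison, the expected suggestions are the kind a plain edit-distance search over a word list would return: fenders is one substitution away from fenbers, while embers and fenberry are each two edits away. A minimal sketch follows, assuming a tiny hypothetical word list in place of linux.words. Solr's spellcheckers use their own distance measures and scoring, so this only illustrates the idea; `suggest` and `max_distance` are illustrative names:

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def suggest(query: str, words: List[str], max_distance: int = 2) -> List[str]:
    """Return dictionary words within max_distance edits, closest first.

    Exact matches (distance 0) are excluded, since a correctly spelled
    word needs no suggestion.
    """
    q = query.lower()
    scored = [(levenshtein(q, w.lower()), w) for w in words]
    return [w for d, w in sorted(scored) if 0 < d <= max_distance]


words = ["fenders", "embers", "fenberry", "mark", "apple"]  # hypothetical list
print(suggest("Fenbers", words))
```

Ties at the same distance are ordered alphabetically here only because the (distance, word) tuples are sorted together.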

Can anyone offer suggestions as to what I'm doing wrong?  I've been 
struggling with this for more than 40 hours now!  I'm surprised my 
persistence has lasted this long!

Thanks,
Mark
