lucene-java-user mailing list archives

From Grant Ingersoll
Subject Re: Auto Slop
Date Fri, 06 Jul 2007 14:16:11 GMT
FYI: Solr has a nice Analysis debugging tool that lets you see the  
results of running an analysis as it passes through each phase of the  
Analyzer.  Some enterprising soul might want to make a contribution  
along these lines that could be added to the contrib.  :-)


On Jul 3, 2007, at 5:02 PM, Walt Stoneburner wrote:

> I've solved the problem, thanks to tips from Mark Miller and Ard
> Schrijvers, and am simply recording it so that someone else walking
> through the archives might get some benefit.
> A while ago I had been working on a case-sensitive version of Lucene,
> where with a prefix symbol, it was possible to indicate whether you
> were interested in the token as-is, or the case insensitive version.
> (For instance 'LET' is a real acronym, whereas 'let' is a stop word.)
> I did this by writing a filter that injected two tokens for every one.
> The problem was that token.setPositionIncrement(0) wasn't being
> called on the duplicate token.  As a result, each duplicate took up
> a position of its own, so it appeared that there were intermediate
> tokens between the real ones, when both tokens should have occupied
> the same logical position.
> Adding that simple call made phrases work perfectly, even in light
> of all the modified functionality I've added.  Gotta love Lucene!
> A helpful debugging tool would have been a way for Luke to dump the
> token positions.  Clearly it can reconstruct a document from its
> tokens, but it would have been extra nice to see the numerical
> positions they resided at, not just the sequence of tokens.
> Hope this helps someone else in the future,
> -Walt Stoneburner,
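
For anyone finding this thread later, here is a minimal stand-alone sketch of why the missing call breaks phrase matching. It does not use the Lucene API: the `Tok` class, `expand`, and `dump` are hypothetical names made up for illustration, and the lowercased duplicate is only a stand-in for whatever second token the real filter injects. It assumes a token's absolute position is the running sum of position increments, with the first token landing at position 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Stand-alone model of token positions. NOT the Lucene API; all names are hypothetical. */
class PositionDemo {

    /** Stand-in for a token: a term plus its position increment. */
    static final class Tok {
        final String term;
        final int positionIncrement;

        Tok(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Emits two tokens per word: the word as-is, then a lowercased duplicate. */
    static List<Tok> expand(String[] words, int duplicateIncrement) {
        List<Tok> out = new ArrayList<>();
        for (String w : words) {
            out.add(new Tok(w, 1));  // the real token advances one position
            out.add(new Tok(w.toLowerCase(Locale.ROOT), duplicateIncrement));
        }
        return out;
    }

    /** Renders each token as term@position, summing increments from -1. */
    static List<String> dump(List<Tok> toks) {
        List<String> out = new ArrayList<>();
        int pos = -1;
        for (Tok t : toks) {
            pos += t.positionIncrement;
            out.add(t.term + "@" + pos);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] words = {"LET", "Me"};
        // Duplicate left at increment 1: the duplicates sit between the real tokens.
        System.out.println(dump(expand(words, 1)));  // [LET@0, let@1, Me@2, me@3]
        // Duplicate at increment 0: each pair shares one logical position.
        System.out.println(dump(expand(words, 0)));  // [LET@0, let@0, Me@1, me@1]
    }
}
```

In the first dump "LET" and "Me" are two positions apart, so an exact phrase query for the pair would not match without slop. In the second they are adjacent again. A dump like this is also roughly the kind of numerical position listing described above as a debugging aid.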