lucene-dev mailing list archives

From "Mattmann, Chris A (388J)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: Analyzers and sorting with a custom analysis chain
Date Sat, 03 Sep 2011 03:53:58 GMT
Hi Yonik,

On Sep 2, 2011, at 7:47 PM, Yonik Seeley wrote:

> On Fri, Sep 2, 2011 at 10:26 PM, Mattmann, Chris A (388J)
> <chris.a.mattmann@jpl.nasa.gov> wrote:
>> I'm left with childrenshospitallosangeles as a single token resultant from the chain.
>> So, when I go to sort the titles in Solr, I use sort=title_sort asc, and I am getting
>> all kinds of weird results when doing a query.
> 
> Hmmm, a random guess would be that perhaps your analysis chain is
> actually producing more than one token per document.   The lucene
> FieldCache takes the highest for each document (just a non-intended
> side-effect of how the FieldCache entry is populated by enumerating
> terms).
> 
> Try adding fsv=true to your request.  It's an undocumented feature
> used in distributed search (it stands for field sort values) used to
> collate results from different shards.  It should add "sort_values" to
> your response to tell you the sort values for each document.

First off, thanks for the reply. I appreciate it.

I tried the fsv=true parameter and it's great; it revealed what's really
going on here:

 "sort_values":[
  "title_sort",[null,
	null,
	null,
	null,
....

I've got one of those null values for each returned document. Now I guess
I have to find out what's wrong with my CombiningFilter.

Basically, all it does is use a static method that calls incrementToken() and
then TermAttribute.term() for each of the tokens in the stream. It appends these
to a StringBuffer (concatenating them) and then returns a new KeywordTokenizer
wrapping a StringReader initialized with the merged StringBuffer contents. Yes,
I know this probably isn't the most efficient way and I'm open to suggestions.
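
To make the idea concrete, here is a minimal sketch of the concatenation step
in plain Java. It does not use the Lucene API: TokenSource, LazyCombiner, and
fromList are hypothetical stand-ins I made up for illustration, and the merge
happens on the first call to next() rather than at construction time. It also
uses StringBuilder, which is usually sufficient when only one thread touches
the buffer.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class CombineSketch {
    // Hypothetical stand-in for a token stream; NOT Lucene's TokenStream API.
    interface TokenSource {
        String next(); // next token, or null when the source is exhausted
    }

    // Concatenates every token of the wrapped source into a single token.
    static final class LazyCombiner implements TokenSource {
        private final TokenSource input;
        private boolean done = false;

        LazyCombiner(TokenSource input) {
            this.input = input; // nothing is consumed in the constructor
        }

        @Override
        public String next() {
            if (done) {
                return null; // the single merged token was already emitted
            }
            done = true;
            StringBuilder merged = new StringBuilder();
            for (String t = input.next(); t != null; t = input.next()) {
                merged.append(t);
            }
            return merged.length() == 0 ? null : merged.toString();
        }
    }

    // Helper (hypothetical): exposes a list of strings as a TokenSource.
    static TokenSource fromList(List<String> tokens) {
        Iterator<String> it = tokens.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) {
        TokenSource combined = new LazyCombiner(
                fromList(Arrays.asList("childrens", "hospital", "los", "angeles")));
        System.out.println(combined.next()); // childrenshospitallosangeles
        System.out.println(combined.next()); // null
    }
}
```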

I think in spelling this out, though, I might have uncovered my problem. Since
my CombiningFilter constructor calls super(mergeStreamTokens(in)), where
mergeStreamTokens is a static method, I think I might have already consumed the
input TokenStream by the time it gets called for the sort. It works on
analysis.jsp, probably because the stream isn't re-consumed there? Not sure,
something wiggy is going on.
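
Here is a toy model of that suspicion in plain Java. Again, this is not Lucene
code, and I have not confirmed that it is the actual cause: ReusableSource,
EagerCombiner, and LazyCombiner are hypothetical names, and setInput() is only
an assumed analogue of pointing a reused chain at a new document. The eager
variant merges once in its constructor, so it never sees input supplied later;
the lazy variant merges when asked.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

class ReuseSketch {
    // Hypothetical source whose input can be swapped after construction.
    static final class ReusableSource {
        private Iterator<String> it = Collections.emptyIterator();

        void setInput(List<String> tokens) {
            it = tokens.iterator();
        }

        String next() {
            return it.hasNext() ? it.next() : null;
        }
    }

    // Drains the source and concatenates whatever tokens it currently holds.
    static String drain(ReusableSource in) {
        StringBuilder sb = new StringBuilder();
        for (String t = in.next(); t != null; t = in.next()) {
            sb.append(t);
        }
        return sb.toString();
    }

    // Eager: consumes the source exactly once, at construction time.
    static final class EagerCombiner {
        private final String merged;

        EagerCombiner(ReusableSource in) {
            merged = drain(in);
        }

        String value() {
            return merged;
        }
    }

    // Lazy: consumes the source only when a value is requested.
    static final class LazyCombiner {
        private final ReusableSource in;

        LazyCombiner(ReusableSource in) {
            this.in = in;
        }

        String value() {
            return drain(in);
        }
    }

    public static void main(String[] args) {
        ReusableSource source = new ReusableSource();
        EagerCombiner eager = new EagerCombiner(source); // source is still empty here
        LazyCombiner lazy = new LazyCombiner(source);

        source.setInput(Arrays.asList("childrens", "hospital"));
        System.out.println("eager: '" + eager.value() + "'"); // eager: ''
        System.out.println("lazy:  '" + lazy.value() + "'");  // lazy:  'childrenshospital'
    }
}
```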

I'll keep poking, thanks again.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++



