From: "Mattmann, Chris A (388J)"
To: "dev@lucene.apache.org", "yonik@lucidimagination.com"
CC: "solr-user@lucene.apache.org"
Date: Fri, 2 Sep 2011 20:53:58 -0700
Subject: Re: Analyzers and sorting with a custom analysis chain
Message-ID:
<52B08674-8CF0-4161-ACA4-A8C4596134BC@jpl.nasa.gov>

Hi Yonik,

On Sep 2, 2011, at 7:47 PM, Yonik Seeley wrote:

> On Fri, Sep 2, 2011 at 10:26 PM, Mattmann, Chris A (388J) wrote:
>> I'm left with childrenshospitallosangeles as a single token resultant from the chain.
>> So, when I go to sort the titles in Solr, I use sort=title_sort asc, and I am getting all kinds of weird results when doing
>> a query.
>
> Hmmm, a random guess would be that perhaps your analysis chain is
> actually producing more than one token per document. The lucene
> FieldCache takes the highest for each document (just a non-intended
> side-effect of how the FieldCache entry is populated by enumerating
> terms).
>
> Try adding fsv=true to your request. It's an undocumented feature
> used in distributed search (it stands for field sort values) used to
> collate results from different shards. It should add "sort_values" to
> your response to tell you the sort values for each document.

First off, thanks for the reply. I appreciate it.

I tried the fsv=true parameter and it's great, it revealed what's really
going on here:

"sort_values":[
  "title_sort",[null, null, null, null, ....

I've got one of those null values for each returned document.

Now I guess I have to find out what's wrong with my CombiningFilter.
All it does basically is have a static method to call incrementToken() and then
call TermAttribute.term() for each of the tokens in the stream.
It takes these,
appends them to a StringBuffer (concats them), and then returns a new
KeywordTokenizer providing a StringReader initialized with the merged
StringBuffer. Yes, I know this probably isn't the most efficient way and I'm
open to suggestions.

I think in spelling this out though, I might have pinpointed my problem. Since
the method I call in the constructor for my CombiningFilter is

super(mergeStreamTokens(in))

where mergeStreamTokens is a static method, I think I might have consumed the input
TokenStream by the time it gets called for the sort. It works on analysis.jsp probably
because the stream isn't re-consumed? Not sure, something wiggy is going on.

I'll keep poking, thanks again.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
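
For anyone following the thread, here is a minimal plain-Java sketch of the suspicion described above: a combiner that merges its input once, inside the constructor, has nothing left to emit when the same chain is reused for a second document. It is only a model under stated assumptions. WordStream, EagerCombiner and LazyCombiner are hypothetical stand-ins made up for illustration, not Lucene or Solr classes, and the sketch does not prove that this is what produces the null sort values. If the theory holds, the analogous change in a real filter would presumably be to do the concatenation inside incrementToken() and clear per-document state in reset(), rather than in the constructor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class CombiningSketch {

    /** Hypothetical stand-in for a reusable token stream; NOT the Lucene API. */
    static class WordStream {
        private Iterator<String> words = new ArrayList<String>().iterator();

        /** Points the stream at a new document, as a reused analysis chain would. */
        void reset(String text) {
            words = Arrays.asList(text.toLowerCase().split("\\s+")).iterator();
        }

        /** Returns the next token, or null once the stream is exhausted. */
        String next() {
            return words.hasNext() ? words.next() : null;
        }
    }

    /** Drains every remaining token and concatenates them; null if there were none. */
    static String merge(WordStream in) {
        StringBuilder sb = new StringBuilder();
        for (String t = in.next(); t != null; t = in.next()) {
            sb.append(t);
        }
        return sb.length() == 0 ? null : sb.toString();
    }

    /** Eager variant: merges exactly once, in the constructor. */
    static class EagerCombiner {
        private String merged;

        EagerCombiner(WordStream in) {
            merged = merge(in); // the input is consumed here and never read again
        }

        String next() {
            String out = merged;
            merged = null;
            return out;
        }
    }

    /** Lazy variant: merges on the first next() after each reset(). */
    static class LazyCombiner {
        private final WordStream in;
        private boolean done;

        LazyCombiner(WordStream in) {
            this.in = in;
        }

        void reset() {
            done = false;
        }

        String next() {
            if (done) {
                return null;
            }
            done = true;
            return merge(in);
        }
    }

    /** Feeds two documents through one eager combiner; returns one value per document. */
    static List<String> runEager() {
        WordStream source = new WordStream();
        source.reset("Childrens Hospital Los Angeles");
        EagerCombiner eager = new EagerCombiner(source);
        List<String> out = new ArrayList<String>();
        out.add(eager.next());
        source.reset("Jet Propulsion Laboratory"); // chain reused for the next document
        out.add(eager.next());
        return out;
    }

    /** Feeds the same two documents through one lazy combiner. */
    static List<String> runLazy() {
        WordStream source = new WordStream();
        LazyCombiner lazy = new LazyCombiner(source);
        List<String> out = new ArrayList<String>();
        source.reset("Childrens Hospital Los Angeles");
        lazy.reset();
        out.add(lazy.next());
        source.reset("Jet Propulsion Laboratory");
        lazy.reset();
        out.add(lazy.next());
        return out;
    }

    public static void main(String[] args) {
        System.out.println("eager: " + runEager());
        System.out.println("lazy:  " + runLazy());
    }
}
```

In this model the eager combiner yields a merged token for the first document and null for the second, while the lazy one yields a merged token for both, which is at least consistent with seeing null sort values alongside a correct-looking result on a one-off analysis page.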