lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: content disappears in the index
Date Mon, 12 Nov 2012 13:25:35 GMT
Hi,

could it be that the issue is tokenization? In your explanation you write that the field is
tokenized, but fields used for sorting should not be tokenized; they should be indexed as-is
(e.g. as a Lucene 4.0 StringField). If a document has more than one token in the field, the
sort order is not defined.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Bernd Fehling [mailto:bernd.fehling@uni-bielefeld.de]
> Sent: Monday, November 12, 2012 2:19 PM
> To: java-user@lucene.apache.org
> Subject: content disappears in the index
> 
> Hi list,
> a user reported wrong sorting in our search service running on Solr.
> While chasing this issue I traced it back through Lucene into the index.
> I have a text field for sorting
> (stored,indexed,tokenized,omitNorms,sortMissingLast)
> and three docs with author names.
> 
> If I trace at org.apache.lucene.document.Document.add(IndexableField)
> while indexing, I can see each of the three author names added as a field
> to its document.
> 
> After searching with *:* for the three docs and doing a sort the sorting is
> wrong because one of the author names is reduced to the first char, all other
> chars are lost.
> 
> So with the author names (Alexander, Arslanagic, Brennmoen) indexed,
> the result of sorting ascending is (Arslanagic, Alexander, Brennmoen), which
> is wrong.
> But this happens because the author "Arslanagic" is reduced to "a" during
> indexing (???) and if sorted "a" is before "alexander".
> 
> Currently I use 4.0 but have the same issue with 3.6.1.
> 
> Without tracing through tons of code:
> - which is the last breakpoint for debugging to see the docs right before they
> go into the index?
> - which is the first breakpoint for debugging to see the docs coming right out
> of the index?
> 
> Regards
> Bernd
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
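
The ordering described in the quoted report can be reproduced outside Lucene with a small
plain-Java sketch. It is only an illustration: it does not use or model Lucene's sorting
code, and the class and method names (SortKeyDemo, sortByKey, intactKeys, truncatedKeys)
are made up for this example. It assumes, as the report says, that the sort key for
"Arslanagic" ended up as "a" and that the other keys are the lower-cased full names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; not part of Lucene or Solr.
class SortKeyDemo {

    // Sort keys as they would look if each name were kept whole (lower-cased).
    public static Map<String, String> intactKeys() {
        Map<String, String> keys = new LinkedHashMap<>();
        keys.put("Alexander", "alexander");
        keys.put("Arslanagic", "arslanagic");
        keys.put("Brennmoen", "brennmoen");
        return keys;
    }

    // Sort keys as described in the report: "Arslanagic" reduced to "a".
    public static Map<String, String> truncatedKeys() {
        Map<String, String> keys = intactKeys();
        keys.put("Arslanagic", "a");
        return keys;
    }

    // Returns the document names ordered ascending by their sort key.
    public static List<String> sortByKey(Map<String, String> keyByDoc) {
        List<String> docs = new ArrayList<>(keyByDoc.keySet());
        Comparator<String> byKey = Comparator.comparing(keyByDoc::get);
        docs.sort(byKey);
        return docs;
    }

    public static void main(String[] args) {
        System.out.println(sortByKey(intactKeys()));    // [Alexander, Arslanagic, Brennmoen]
        System.out.println(sortByKey(truncatedKeys())); // [Arslanagic, Alexander, Brennmoen]
    }
}
```

The second line of output matches the order from the report, which is consistent with the
sort running on a key other than the complete author name. Why the key is shortened is not
shown here; that depends on how the field is analyzed at index time.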

