lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Special characters prevent entity being indexed
Date Tue, 18 Nov 2008 16:27:02 GMT
What analyzer are you using at index and search time? Typical problems
include:
- using an analyzer that doesn't understand accented chars (StandardAnalyzer
  for instance)
- using a different analyzer during search and index.

Search the user list for "accent" and you'll find this kind of problem
discussed, and if that doesn't help we need to know what analyzers you are
using and what behavior you really want. Typically, for instance,
*requiring* a user to type the upside-down exclamation point to get a match
on this field would be considered incorrect.
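
To make the point about accents and punctuation concrete, here is a minimal
sketch in plain Java. It does not use any Lucene classes: the class name
FoldingSketch and the method analyze are made up for illustration, and the
folding rules (strip accents, lowercase, split on anything that is not a
letter or digit) are an assumption about the behavior you might want, not a
description of what any particular Lucene analyzer does. The idea is only
that the same normalization runs on the indexed text and on the query.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class FoldingSketch {

    // Hypothetical stand-in for an analyzer: the same method must be used
    // for the text being indexed and for the text being searched.
    static List<String> analyze(String text) {
        // Split accented letters into base letter + combining mark (NFD).
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // Drop the combining marks so the accented a becomes a plain "a".
        String folded = decomposed.replaceAll("\\p{M}+", "");
        String lower = folded.toLowerCase(Locale.ROOT);

        // Split on runs of characters that are neither letters nor digits;
        // the inverted exclamation mark, "!" and "-" all act as separators.
        List<String> tokens = new ArrayList<>();
        for (String token : lower.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source independent of file encoding:
        // 00a1 is the inverted exclamation mark, 00e1 is the accented a.
        String indexed = "\u00a1Fant\u00e1stico!- blaaba";
        String query = "fantastico";

        System.out.println(analyze(indexed)); // [fantastico, blaaba]
        System.out.println(analyze(query));   // [fantastico]
    }
}
```

With this kind of folding applied on both sides, a user who types plain
"fantastico" produces a token that also appears in the indexed value. If
only one side is folded, the tokens differ and the search finds nothing.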

Also, you'd be helped a lot by getting a copy of Luke and examining your
index to see exactly what's been indexed; it'll reveal a lot.

Best
Erick

On Tue, Nov 18, 2008 at 10:05 AM, Pekka Nykyri <pnykyri@cs.joensuu.fi> wrote:

> Hi!
>
> I'm having problems with entities including special characters (Spanish
> language) not getting indexed.
>
> I haven't been able to find the reason why some entities get indexed
> while some don't.
>
> I have 3 fields that (currently) hold the same value. The value for the
> fields is, for example, "¡Fantástico!- blaaba". Then when I change ONE of
> the three values to "¡Fantástico! - blaaba", the entity gets indexed. So
> changing only one field makes it index.
>
> But the bigger problem is that I have an almost identical entity (its
> other fields are nearly the same, and I don't think they cause the
> problem) with exactly the same three "¡Fantástico!- blaaba" fields, and it
> gets indexed normally, even though the "critical" fields are exactly the
> same.
>
> Also, none of the entities whose three fields start with the "upside
> down ?" mark get indexed.
>
> I'm really confused by the problem because I can't find any logic in why
> some entities are not indexed even though they are similar to others that
> are. And changing only one value of the three makes it index.
>
> Sorry for a really messy message but I just can't explain it more clearly
> now.
>
> Thanks in advance,
> pn
