lucene-java-user mailing list archives

From "Erick Erickson" <>
Subject Re: Indexing accented characters, then searching by any form
Date Mon, 11 Feb 2008 18:51:57 GMT
See below...

On Feb 11, 2008 12:17 PM, Cesar Ronchese <> wrote:

> Hey, Erick. You inferred right.
> I analyzed your code and it looks like a common Indexing and Searching
> code.
> Are you sure you pasted the correct code? :P

Did you try to run it? It's just a self-contained example showing that
searching and displaying are distinct.

The indexer part indexes a mixed-case string. The search is then
performed on a lower-case string, and the println shows that a
document was found. The next println echoes back the stored text
showing that the original was stored. Just substitute your preferred
filter to see how this would work for you.
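To make the idea concrete without depending on any particular Lucene version, here is a toy sketch in plain Java. It is NOT Lucene code and not the example from my earlier mail: the class FoldedIndexSketch and its fold(), add() and search() methods are names made up for illustration, and fold() merely stands in for whatever filter chain you would put in your analyzer. It assumes that stripping Unicode combining marks and lower-casing is the normalization you want.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy illustration only: a hand-rolled inverted index, not the Lucene API.
class FoldedIndexSketch {
    // "Stored" side: the original text, kept untouched for display.
    private final List<String> stored = new ArrayList<>();
    // "Indexed" side: normalized term -> ids of the documents containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Hypothetical stand-in for an analyzer's filters:
    // decompose, drop combining marks (accents), then lower-case.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    // Store the original text and index its folded, whitespace-split terms.
    int add(String text) {
        int id = stored.size();
        stored.add(text);
        for (String term : fold(text).split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Fold the query the same way, look it up, and return the ORIGINAL text.
    List<String> search(String queryTerm) {
        List<String> hits = new ArrayList<>();
        for (int id : postings.getOrDefault(fold(queryTerm), new TreeSet<>())) {
            hits.add(stored.get(id));
        }
        return hits;
    }

    public static void main(String[] args) {
        FoldedIndexSketch index = new FoldedIndexSketch();
        // Unicode escapes keep this source file ASCII-only;
        // the text is "Indexacao de Caracteres Acentuados" with c-cedilla and a-tilde.
        index.add("Indexa\u00e7\u00e3o de Caracteres Acentuados");
        String[] queries = {"indexacao", "INDEXA\u00c7\u00c3O", "acentuados"};
        for (String q : queries) {
            System.out.println(q + " -> " + index.search(q));
        }
    }
}
```

Every query form should find the document, and what comes back is the stored original with its accents and capitals intact. The point is only that the terms you search on and the text you show live in two different places, so they can be normalized differently.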

> Anyways, is the concept about doubling storing data, one content with
> accents and other without? If yes, I did it earlier, but once I search in
> the non-accent content and show accent content, the HitHighlighter will
> now
> work properly.
> --

Is this a typo or is your problem solved? I confess that I haven't needed
to use the highlighter package yet, so I may be missing something.

But you're not really "double storing". You'll find that indexed data takes
MUCH less space than you would think, nowhere near the amount
required to store the data too. So there's good reason to separate the two.

You have no choice except to store the data if you want the user to see
something pretty.

