lucene-java-user mailing list archives

From Cesar Ronchese <ronch...@hotmail.com>
Subject Re: Indexing accented characters, then searching by any form
Date Tue, 12 Feb 2008 00:13:46 GMT

Well, it is done now.

As the final result, I surrendered to "double-storing". I have indexed the
original text with the COMPRESSED option to save some space.

To highlight the results correctly, I matched the unaccented words against
the original words with regular expressions, and the result is
satisfactory.
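
A minimal sketch of this kind of matching, using only the JDK and no Lucene
classes. It is an illustration under assumptions, not the code actually used
here: the class and method names (AccentHighlight, fold, accentTolerant,
highlight) and the <b> markers are hypothetical. The idea is to decompose
the original text (NFD), build a regular expression from the unaccented term
that allows combining marks after each letter, and wrap whatever matches.

```java
import java.text.Normalizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AccentHighlight {

    // Remove accents: decompose to NFD, then drop the combining marks.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Turn an unaccented term into a pattern that also matches accented
    // forms in NFD text: each character may be followed by combining marks.
    static Pattern accentTolerant(String unaccentedTerm) {
        // Lookarounds stand in for word boundaries so that a term does not
        // match in the middle of a longer word.
        StringBuilder sb = new StringBuilder("(?<![\\p{L}\\p{M}\\p{N}])");
        unaccentedTerm.codePoints().forEach(cp -> {
            sb.append(Pattern.quote(new String(Character.toChars(cp))));
            sb.append("\\p{M}*");
        });
        sb.append("(?![\\p{L}\\p{M}\\p{N}])");
        return Pattern.compile(sb.toString(),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    // Wrap every occurrence of the term in the original (accented) text.
    // The result is returned in NFC, so text that was stored decomposed
    // comes back composed.
    static String highlight(String original, String unaccentedTerm) {
        String nfd = Normalizer.normalize(original, Normalizer.Form.NFD);
        Matcher m = accentTolerant(unaccentedTerm).matcher(nfd);
        String marked = m.replaceAll("<b>$0</b>");
        return Normalizer.normalize(marked, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String stored = "A\u00e7\u00e3o e rea\u00e7\u00e3o";
        // Fold the user's input first, then highlight in the stored text.
        String term = fold("a\u00e7\u00e3o");
        System.out.println(highlight(stored, term));
    }
}
```

The same fold step would be applied to the text before it goes into the
unaccented index field, so the indexed terms and the query terms agree.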

Thanks all for the brainstorming ^^
Cesar





Erick Erickson wrote:
> 
> See below...
> 
> 
> On Feb 11, 2008 12:17 PM, Cesar Ronchese <ronchese@hotmail.com> wrote:
> 
>>
>> Hey, Erick. You inferred right.
>>
>> I analyzed your code and it looks like common indexing and searching
>> code.
>> Are you sure you pasted the correct code? :P
>>
> 
> Did you try to run it? It's just a self-contained example showing that
> searching and displaying are distinct.
> 
> The indexer part indexes a mixed-case string. The search is then
> performed on a lower-case string, and the println shows that a
> document was found. The next println echoes back the stored text
> showing that the original was stored. Just substitute your preferred
> filter to see how this would work for you.
> 
> 
> 
>>
>> Anyway, is the concept about double-storing the data, one copy with
>> accents and the other without? If yes, I did that earlier, but once I
>> search in the non-accent content and show the accented content, the
>> HitHighlighter will now work properly.
>> --
>>
> 
> Is this a typo or is your problem solved? I confess that I haven't had
> the need to use the highlighter package yet, so I may be missing
> something...
> 
> But you're not really "double storing". You'll find that indexed code
> takes MUCH less space than you would think, nowhere near the amount
> required to store the data too. So there's good reason to separate the
> two.
> 
> You have no choice except to store the data if you want the user to see
> something pretty.....
> 
> Erick
> 
> 
>>
>> View this message in context:
>> http://www.nabble.com/Indexing-accented-characters%2C-then-searching-by-any-form-tp15412778p15415770.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> 
> 

-- 
View this message in context: http://www.nabble.com/Indexing-accented-characters%2C-then-searching-by-any-form-tp15412778p15423851.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

