lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From uday kumar maddigatla <u...@mach.com>
Subject Re: How to search special characters in LUcene
Date Thu, 23 Apr 2009 05:43:30 GMT

HI 

Here are the details about my goals.
1. I want to use this lucene for mixed languages.
2. I want to make indexes of the documents which are either english or
danish etc.
I'm attaching my IndexFiles.java file.

When i'm searching i'm giving the index path location  as well as doucmets
folder.

If i use StandardAnalyzer as an argument to IndexWriter's method it is able
to search the english characters.

How can i use DutchAnalyzer in order to make this IndexFiles.java to index
the danish elements.

In my Code which i attached, you can see 'C:\test3'. This is my location
where i want to store my indexes.

I'm giving documents folder location as comand line argument.

In my document the content will be like this

<com:Note><![CDATA[Kreditnota til udligning af faktura nr. 13927 pga skal
opsplittes
hhv. byggeplads og skat
Vedr.           :   Amtsgården Århus, Lyseng Allé 1, 8270 Højbjerg
Bygning B
SES Journal nr. :   42895-0001
SES Navision nr.:   Navision 9800124
SES Ansvarlig   :   Martin Krøldrup Nielsen
SES rådgiver    :   Friis & Moltke A/S
Hermed fremsendes faktura på ekstra tømrerarbejde.
Byggeplads Amtsgården B-4
jvf. vedlagte specifikation - aftaleseddel nr. 12.]]></com:Note>

i"m searching the word like rådgiver . When i see the result it is clearly
searching for r dgiver. It is omitting the danish element.

Please help me in this.



Erick Erickson wrote:
> 
> Are you *also* using the DutchAnalyzer for your *query*?
> 
> Please show us the index and search code (simplified as much
> as possible), then we'll be able to provide better suggestions.
> 
> Also, tell us a bit more about your goals here. Is this an
> index entirely of Dutch documents? Or is it a mixed-language
> index?
> 
> Think about getting a copy of Luke and
> 1> examining your index to see what's *really* there
> 2> examining the effects of using different parsers on
>      your *query*.
> 
> Best
> Erick
> 
> On Wed, Apr 22, 2009 at 2:57 AM, uday kumar maddigatla
> <ukma@mach.com>wrote:
> 
>>
>> Hi
>>
>> Thanks for your reply.
>>
>> I'm able to see the DutchAnalyzer.
>>
>> When i'm indexing my documents i given instace of DutchAnalyzer as an
>> argument to IndexWriter Class.
>>
>> After this when i search for the
>> http://www.nabble.com/file/p23170710/IndexFiles.java IndexFiles.java
>> contains the danish elements .. Still it is not able to identify.
>>
>> Please tell me how to use DutchAnalzer in my application. Sample example
>> or
>> series of steps helps me.
>>
>> I also attached my index file(.java file).
>>
>> Please help me in this. please..
>>
>> Erick Erickson wrote:
>> >
>> > Take a look at DutchAnalyzer. The problem you'll have is if you're
>> > indexing
>> > this document along with a bunch of documents from other languages.
>> > You could search the mail archive for extensive discussions of 
>> indexing/
>> > searching documents from several languages.
>> >
>> > Best
>> > Erick
>> >
>> > On Tue, Apr 21, 2009 at 2:40 AM, Uday Kumar Maddigatla
>> > <ukma@mach.com>wrote:
>> >
>> >> HI,
>> >>
>> >>
>> >>
>> >> I'm new to the lucene. I downloaded lucene 2.4.1.
>> >>
>> >>
>> >>
>> >> I have one xml file which contains few special characters like 'å',
>> 'ø,'
>> >> °'
>> >> etc.(these are Danish language elements).
>> >>
>> >>
>> >>
>> >> How can I search these things.
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> Uday Kumar  Reddy Maddigatla
>> >>
>> >> Software Engineer(Progrator|gatetrade)
>> >>
>> >> MACH India(Operations)
>> >>
>> >> Mobile: + 91-9963000377
>> >>
>> >> Uday.Maddigatla@ness.com <mailto:Uday.Maddigatla@ness.com>
>> >>
>> >> ukma@mach.com <mailto:ukma@mach.com>
>> >>
>> >> www.ness.com
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/How-to-search-special-characters-in-LUcene-tp23150039p23170710.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> 
> 
http://www.nabble.com/file/p23190583/IndexFiles.java IndexFiles.java 
http://www.nabble.com/file/p23190583/SearchFiles.java SearchFiles.java 
http://www.nabble.com/file/p23190583/IndexFiles.java IndexFiles.java 
http://www.nabble.com/file/p23190583/IndexFiles.java IndexFiles.java 
-- 
View this message in context: http://www.nabble.com/How-to-search-special-characters-in-LUcene-tp23150039p23190583.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message