lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From uday kumar maddigatla <u...@mach.com>
Subject Re: How to search special characters in LUcene
Date Fri, 24 Apr 2009 07:53:06 GMT

Hi Thanks for your reply.

After gone threw with the site which you given... i understood that
StandardAnalyzer is enough to handle these special characters.

i'm attaching one class called AnalysisDemo.java. By executing that class
i'm able to say the above sentance(i.e StandardAnalyzer is enough).

Here is the out put when i ran the above java file.
Analzying "Vedr.           :   Amtsgården Århus, Lyseng Allé 1, 8270
Højbjerg"
	org.apache.lucene.analysis.WhitespaceAnalyzer:
		[Vedr.] [:] [Amtsgården] [Århus,] [Lyseng] [Allé] [1,] [8270] [Højbjerg] 

	org.apache.lucene.analysis.SimpleAnalyzer:
		[vedr] [amtsgården] [århus] [lyseng] [allé] [højbjerg] 

	org.apache.lucene.analysis.StopAnalyzer:
		[vedr] [amtsgården] [århus] [lyseng] [allé] [højbjerg] 

	org.apache.lucene.analysis.standard.StandardAnalyzer:
		[vedr] [amtsgården] [århus] [lyseng] [allé] [1] [8270] [højbjerg] 

	org.apache.lucene.analysis.snowball.SnowballAnalyzer:
		[vedr] [amtsgården] [århus] [lyseng] [allé] [1] [8270] [højbjerg] 

By the above out put we can say that StandardAnalyzer is enough to get rid
of danish elements.

But only problem is when i'm searching the any term which includes the
danish elements(like højbjerg...)

it is unable to find out. 

Even i checked with LUKE. In that  i given my sample text which contains the
danish elements and selected the StandardAnalyzer as analyser. when i click
analyze in that it cleary making index of danish words.

and also i givne one try on luke by loading my index directory in to luke.
after loading my index i searched for a word which contains the danish
element, But this time it was failed. It was shown nothing(i.e o resluts).

As in my sense the problem might be making the indexes or in searching the
item.


I gone threw the site which you given. From that i'm able to do this kind of
reaserch work.

Please help me in this.


Erick Erickson wrote:
> 
> OK, this is a much different problem than you were originally
> asking about, effectively "how to index/search mixed language
> documents".
> 
> This topic has been discussed multiple times on the user list, I
> think your first step should be to search the archive. I *was*
> going to find the old searchable mail archive, but those clever folks
> at Lucid Imagination have something new, see:
> 
> http://www.lucidimagination.com/search/p:lucene?q=multiple+languages
> 
> Once you've had a chance to look that over I think you'll be off and
> running.
> 
> Best
> Erick
> 
> On Thu, Apr 23, 2009 at 1:43 AM, uday kumar maddigatla
> <ukma@mach.com>wrote:
> 
>>
>> HI
>>
>> Here are the details about my goals.
>> 1. I want to use this lucene for mixed languages.
>> 2. I want to make indexes of the documents which are either english or
>> danish etc.
>> I'm attaching my IndexFiles.java file.
>>
>> When i'm searching i'm giving the index path location  as well as
>> doucmets
>> folder.
>>
>> If i use StandardAnalyzer as an argument to IndexWriter's method it is
>> able
>> to search the english characters.
>>
>> How can i use DutchAnalyzer in order to make this IndexFiles.java to
>> index
>> the danish elements.
>>
>> In my Code which i attached, you can see 'C:\test3'. This is my location
>> where i want to store my indexes.
>>
>> I'm giving documents folder location as comand line argument.
>>
>> In my document the content will be like this
>>
>> <com:Note><![CDATA[Kreditnota til udligning af faktura nr. 13927 pga skal
>> opsplittes
>> hhv. byggeplads og skat
>> Vedr.           :   Amtsgården Århus, Lyseng Allé 1, 8270 Højbjerg
>> Bygning B
>> SES Journal nr. :   42895-0001
>> SES Navision nr.:   Navision 9800124
>> SES Ansvarlig   :   Martin Krøldrup Nielsen
>> SES rådgiver    :   Friis & Moltke A/S
>> Hermed fremsendes faktura på ekstra tømrerarbejde.
>> Byggeplads Amtsgården B-4
>> jvf. vedlagte specifikation - aftaleseddel nr. 12.]]></com:Note>
>>
>> i"m searching the word like rådgiver . When i see the result it is
>> clearly
>> searching for r dgiver. It is omitting the danish element.
>>
>> Please help me in this.
>>
>>
>>
>> Erick Erickson wrote:
>> >
>> > Are you *also* using the DutchAnalyzer for your *query*?
>> >
>> > Please show us the index and search code (simplified as much
>> > as possible), then we'll be able to provide better suggestions.
>> >
>> > Also, tell us a bit more about your goals here. Is this an
>> > index entirely of Dutch documents? Or is it a mixed-language
>> > index?
>> >
>> > Think about getting a copy of Luke and
>> > 1> examining your index to see what's *really* there
>> > 2> examining the effects of using different parsers on
>> >      your *query*.
>> >
>> > Best
>> > Erick
>> >
>> > On Wed, Apr 22, 2009 at 2:57 AM, uday kumar maddigatla
>> > <ukma@mach.com>wrote:
>> >
>> >>
>> >> Hi
>> >>
>> >> Thanks for your reply.
>> >>
>> >> I'm able to see the DutchAnalyzer.
>> >>
>> >> When i'm indexing my documents i given instace of DutchAnalyzer as an
>> >> argument to IndexWriter Class.
>> >>
>> >> After this when i search for the
>> >> http://www.nabble.com/file/p23170710/IndexFiles.java IndexFiles.java
>> >> contains the danish elements .. Still it is not able to identify.
>> >>
>> >> Please tell me how to use DutchAnalzer in my application. Sample
>> example
>> >> or
>> >> series of steps helps me.
>> >>
>> >> I also attached my index file(.java file).
>> >>
>> >> Please help me in this. please..
>> >>
>> >> Erick Erickson wrote:
>> >> >
>> >> > Take a look at DutchAnalyzer. The problem you'll have is if you're
>> >> > indexing
>> >> > this document along with a bunch of documents from other languages.
>> >> > You could search the mail archive for extensive discussions of
>> >> indexing/
>> >> > searching documents from several languages.
>> >> >
>> >> > Best
>> >> > Erick
>> >> >
>> >> > On Tue, Apr 21, 2009 at 2:40 AM, Uday Kumar Maddigatla
>> >> > <ukma@mach.com>wrote:
>> >> >
>> >> >> HI,
>> >> >>
>> >> >>
>> >> >>
>> >> >> I'm new to the lucene. I downloaded lucene 2.4.1.
>> >> >>
>> >> >>
>> >> >>
>> >> >> I have one xml file which contains few special characters like
'å',
>> >> 'ø,'
>> >> >> °'
>> >> >> etc.(these are Danish language elements).
>> >> >>
>> >> >>
>> >> >>
>> >> >> How can I search these things.
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> Uday Kumar  Reddy Maddigatla
>> >> >>
>> >> >> Software Engineer(Progrator|gatetrade)
>> >> >>
>> >> >> MACH India(Operations)
>> >> >>
>> >> >> Mobile: + 91-9963000377
>> >> >>
>> >> >> Uday.Maddigatla@ness.com <mailto:Uday.Maddigatla@ness.com>
>> >> >>
>> >> >> ukma@mach.com <mailto:ukma@mach.com>
>> >> >>
>> >> >> www.ness.com
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >>
>> >> --
>> >> View this message in context:
>> >>
>> http://www.nabble.com/How-to-search-special-characters-in-LUcene-tp23150039p23170710.html
>> >> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>> >>
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >>
>> >>
>> >
>> >
>> http://www.nabble.com/file/p23190583/IndexFiles.java IndexFiles.java
>> http://www.nabble.com/file/p23190583/SearchFiles.java SearchFiles.java
>> http://www.nabble.com/file/p23190583/IndexFiles.java IndexFiles.java
>> http://www.nabble.com/file/p23190583/IndexFiles.java IndexFiles.java
>> --
>> View this message in context:
>> http://www.nabble.com/How-to-search-special-characters-in-LUcene-tp23150039p23190583.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> 
> 
http://www.nabble.com/file/p23211629/AnalysisDemo.java AnalysisDemo.java 
-- 
View this message in context: http://www.nabble.com/How-to-search-special-characters-in-LUcene-tp23150039p23211629.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message