lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <erickerick...@gmail.com>
Subject Re: How to search special characters in LUcene
Date Thu, 23 Apr 2009 13:34:55 GMT
OK, this is a much different problem than you were originally
asking about, effectively "how to index/search mixed language
documents".

This topic has been discussed multiple times on the user list, I
think your first step should be to search the archive. I *was*
going to find the old searchable mail archive, but those clever folks
at Lucid Imagination have something new, see:

http://www.lucidimagination.com/search/p:lucene?q=multiple+languages

Once you've had a chance to look that over I think you'll be off and
running.

Best
Erick

On Thu, Apr 23, 2009 at 1:43 AM, uday kumar maddigatla <ukma@mach.com>wrote:

>
> HI
>
> Here are the details about my goals.
> 1. I want to use this lucene for mixed languages.
> 2. I want to make indexes of the documents which are either english or
> danish etc.
> I'm attaching my IndexFiles.java file.
>
> When i'm searching i'm giving the index path location  as well as doucmets
> folder.
>
> If i use StandardAnalyzer as an argument to IndexWriter's method it is able
> to search the english characters.
>
> How can i use DutchAnalyzer in order to make this IndexFiles.java to index
> the danish elements.
>
> In my Code which i attached, you can see 'C:\test3'. This is my location
> where i want to store my indexes.
>
> I'm giving documents folder location as comand line argument.
>
> In my document the content will be like this
>
> <com:Note><![CDATA[Kreditnota til udligning af faktura nr. 13927 pga skal
> opsplittes
> hhv. byggeplads og skat
> Vedr.           :   Amtsgården Århus, Lyseng Allé 1, 8270 Højbjerg
> Bygning B
> SES Journal nr. :   42895-0001
> SES Navision nr.:   Navision 9800124
> SES Ansvarlig   :   Martin Krøldrup Nielsen
> SES rådgiver    :   Friis & Moltke A/S
> Hermed fremsendes faktura på ekstra tømrerarbejde.
> Byggeplads Amtsgården B-4
> jvf. vedlagte specifikation - aftaleseddel nr. 12.]]></com:Note>
>
> i"m searching the word like rådgiver . When i see the result it is clearly
> searching for r dgiver. It is omitting the danish element.
>
> Please help me in this.
>
>
>
> Erick Erickson wrote:
> >
> > Are you *also* using the DutchAnalyzer for your *query*?
> >
> > Please show us the index and search code (simplified as much
> > as possible), then we'll be able to provide better suggestions.
> >
> > Also, tell us a bit more about your goals here. Is this an
> > index entirely of Dutch documents? Or is it a mixed-language
> > index?
> >
> > Think about getting a copy of Luke and
> > 1> examining your index to see what's *really* there
> > 2> examining the effects of using different parsers on
> >      your *query*.
> >
> > Best
> > Erick
> >
> > On Wed, Apr 22, 2009 at 2:57 AM, uday kumar maddigatla
> > <ukma@mach.com>wrote:
> >
> >>
> >> Hi
> >>
> >> Thanks for your reply.
> >>
> >> I'm able to see the DutchAnalyzer.
> >>
> >> When i'm indexing my documents i given instace of DutchAnalyzer as an
> >> argument to IndexWriter Class.
> >>
> >> After this when i search for the
> >> http://www.nabble.com/file/p23170710/IndexFiles.java IndexFiles.java
> >> contains the danish elements .. Still it is not able to identify.
> >>
> >> Please tell me how to use DutchAnalzer in my application. Sample example
> >> or
> >> series of steps helps me.
> >>
> >> I also attached my index file(.java file).
> >>
> >> Please help me in this. please..
> >>
> >> Erick Erickson wrote:
> >> >
> >> > Take a look at DutchAnalyzer. The problem you'll have is if you're
> >> > indexing
> >> > this document along with a bunch of documents from other languages.
> >> > You could search the mail archive for extensive discussions of
> >> indexing/
> >> > searching documents from several languages.
> >> >
> >> > Best
> >> > Erick
> >> >
> >> > On Tue, Apr 21, 2009 at 2:40 AM, Uday Kumar Maddigatla
> >> > <ukma@mach.com>wrote:
> >> >
> >> >> HI,
> >> >>
> >> >>
> >> >>
> >> >> I'm new to the lucene. I downloaded lucene 2.4.1.
> >> >>
> >> >>
> >> >>
> >> >> I have one xml file which contains few special characters like 'å',
> >> 'ø,'
> >> >> °'
> >> >> etc.(these are Danish language elements).
> >> >>
> >> >>
> >> >>
> >> >> How can I search these things.
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> Uday Kumar  Reddy Maddigatla
> >> >>
> >> >> Software Engineer(Progrator|gatetrade)
> >> >>
> >> >> MACH India(Operations)
> >> >>
> >> >> Mobile: + 91-9963000377
> >> >>
> >> >> Uday.Maddigatla@ness.com <mailto:Uday.Maddigatla@ness.com>
> >> >>
> >> >> ukma@mach.com <mailto:ukma@mach.com>
> >> >>
> >> >> www.ness.com
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >
> >> >
> >>
> >> --
> >> View this message in context:
> >>
> http://www.nabble.com/How-to-search-special-characters-in-LUcene-tp23150039p23170710.html
> >> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
> >
> http://www.nabble.com/file/p23190583/IndexFiles.java IndexFiles.java
> http://www.nabble.com/file/p23190583/SearchFiles.java SearchFiles.java
> http://www.nabble.com/file/p23190583/IndexFiles.java IndexFiles.java
> http://www.nabble.com/file/p23190583/IndexFiles.java IndexFiles.java
> --
> View this message in context:
> http://www.nabble.com/How-to-search-special-characters-in-LUcene-tp23150039p23190583.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message