lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "叶双明" <yeshuangm...@gmail.com>
Subject Re: Lucene search fails for japanese characters in URL
Date Thu, 18 Sep 2008 07:36:45 GMT
And, you can use Tool luke to see what is in the index indeed.
what is in the Query which put into IndexSearcher.search(), what is the
defaultOperatoer of QueryParser.

Can you get hits by setup a simple IndexSearcher, no through  tomcat?

2008/9/18 anandsarwade <anand.sarwade@corp.aol.com>

>
> Hi,
>
> I do get the same string from Mysql and also in servlet request. I could
> observe the actaul string in eclipse while debugging. it is stored as UTF-8
> format so retrievel is coming as stored.
>
> plz let me know if iam not clear
>
>
> 叶双明 wrote:
> >
> > You must trace the string  in each step!
> > Important step is get string from MYSQL and get parameter in servlet,
> > please
> > check it, do you get the right string?
> > Chinese has the same problem too.
> >
> > 2008/9/17 anandsarwade <anand.sarwade@corp.aol.com>
> >
> >>
> >> Hello Jimi,
> >>
> >> Thanks a lot for your valuable suggestion.
> >>
> >> I am using tomcat 5 . As per your suggestions ,checked the server.xml
> but
> >> found that no URIEncoding was set.
> >> I have set now and to my great relief :-) i could see the Lucene results
> >> on
> >> my browser for japanese string with request objects in UTF-8 now.
> >>
> >> Thanks again for your help.
> >>
> >> Regards,
> >> Anand.
> >>
> >>
> >> JimiH wrote:
> >> >
> >> > What webserver are you using? For example, with Tomcat, it could be
> >> > because of the setting URIEncoding in server.xml.
> >> >
> >> > http://tomcat.apache.org/tomcat-5.5-doc/config/http.html
> >> >
> >> > /Jimi
> >> >
> >> > mogul | jimi hullegård | system developer | hudiksvallsgatan 4, 113 30
> >> > stockholm sweden | +46 8 506 66 172 | +46 765 27 19 55 |
> >> > jimi.hullegard@mogul.com | www.mogul.com
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: anandsarwade [mailto:anand.sarwade@corp.aol.com]
> >> >> Sent: den 17 september 2008 16:42
> >> >> To: java-user@lucene.apache.org
> >> >> Subject: Lucene search fails for japanese characters in URL
> >> >>
> >> >>
> >> >> Hi ,
> >> >>
> >> >> I am facing below problem. Please help me in this.
> >> >>
> >> >> I have integrated CJK Analyzer for Japanese characters. I am
> >> >> able to save
> >> >> japanese double byte characters in mysql database in UTF-8
> >> >> format without
> >> >> issues. I could that data is getted indexed. Now when i
> >> >> search the Japanese
> >> >> characters which were indexed using the URL below , returns
> >> >> empty results.
> >> >>
> >> >> http://xml.demo.myaol.jp:8082/portal/gallery-search?first=1&ma
> >> >> x=100&cap=言語
> >> >>
> >> >> Noticed that the above url gets converted to the following
> >> >> URL having some
> >> >> HTML encoded strings in search.
> >> >>
> >> >> http://xml.demo.myaol.jp:8082/portal/gallery-search?first=1&ma
> >> >> x=100&cap=%E8%A8%80%E8%AA%9E
> >> >>
> >> >> This does not match with the existing lucene indexes
> >> >> henceforth returns
> >> >> empty results.  How do i solve this lucene search issue
> >> >> having japanese
> >> >> words in URLs.? Is there any way to convert such characters
> >> >> back to Japanese
> >> >> words???
> >> >>
> >> >> Any help/suggestions in this regards is highly appreciated.
> >> >>
> >> >> Thanks in Advance.
> >> >>
> >> >> Regards,
> >> >> Anand
> >> >>
> >> >> --
> >> >> View this message in context:
> >> >> http://www.nabble.com/Lucene-search-fails-for-japanese-charact
> >> >> ers-in-URL-tp19533647p19533647.html
> >> >> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >> >>
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >> >>
> >> >>
> >> >
> >> >
> >>
> >> --
> >> View this message in context:
> >>
> http://www.nabble.com/Lucene-search-fails-for-japanese-characters-in-URL-tp19533647p19534342.html
> >> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
> >
> > --
> > Sorry for my english!! 明
> > Please help me to correct my english expression and error in syntax
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/Lucene-search-fails-for-japanese-characters-in-URL-tp19533647p19547081.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


-- 
Sorry for my english!! 明
Please help me to correct my english expression and error in syntax
Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message