Message-ID: <20059666.post@talk.nabble.com>
Date: Sun, 19 Oct 2008 12:59:01 -0700 (PDT)
From: sunnyfr
To: solr-user@lucene.apache.org
Subject: Re: Multi-language solr1.3 what would you reckon?
In-Reply-To: <48F35976.4020107@corp.aol.com>
References: <19954805.post@talk.nabble.com> <48F34CE2.7000906@corp.aol.com> <19955092.post@talk.nabble.com> <48F35008.5010905@corp.aol.com> <19955411.post@talk.nabble.com> <48F35976.4020107@corp.aol.com>

Hi,

Just a question: I thought that if I define an analyzer without specifying a
type such as index or query, it would be applied to both. Is that right?

Thanks,

John E. McBride wrote:
>
> In your schema you define each field as follows:
>
>
> etc
>
> However, you have not defined the query filters. If you do not do this,
> then you will not get any matches for searches in different languages.
>
> For example, in English, if you index the sentence "the joyful boy played
> tennis", this would typically get stored as "joy boy play tennis" due to
> the analysis filters. If you then made a query for "joyful" without
> applying the same filters on the query side, you would get no matches.
>
> You will also want to get some multilingual stop word lists from the
> Snowball website, e.g.
> http://snowball.tartarus.org/algorithms/german/stop.txt.
>
> sunnyfr wrote:
>> What is the problem with the way I have done it?
>> Does it mean that there are some languages that we won't be able to
>> handle in search?
>> There are too many languages. The application will be for videos;
>> we will manage around 10 languages, but in our database we have around 25.
>> Should I create a core text and others like text_en, text_fr, text_es,
>> so that all the videos which are not in one of the languages managed by
>> the search engine are stored in text?
>>
>> Because even if users are on the English website, they should be able to
>> enter a French word, "chien" for "dog", and find French videos.
>> I don't know if I'm being clear.
>>
>> And even so, should text handle all the other languages which are not
>> managed in the other cores?
>>
>> Thanks
>>
>>
>> John E. McBride wrote:
>>
>>> Well, it's this section shown below, which would change from geography
>>> to geography.
>>> Parameterise the EnglishPorterFilterFactory and protwords.
>>>
>>> You could introduce logic in the front end which asks if the number of
>>> results is zero and then makes a call to the English-language core, but
>>> does that make logical sense? Why would a search in the Italian language
>>> bring up anything in the English index?
>>>
>>> I think you need to explain your application in a little more detail.
>>>
>>>
>>> positionIncrementGap="100">
>>>
>>> words="stopwords.txt" enablePositionIncrements="true"/>
>>> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>>> catenateAll="0" splitOnCaseChange="1"/>
>>> protected="protwords.txt"/>
>>>
>>> ignoreCase="true" expand="true"/>
>>> words="stopwords.txt"/>
>>> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>>> catenateAll="0" splitOnCaseChange="1"/>
>>> protected="protwords.txt"/>
>>>
>>>
>>> sunnyfr wrote:
>>>
>>>> Hi,
>>>>
>>>> Thanks guys for your answers, but I don't think I can use multi-core
>>>> for each language, because, for example, if somebody connects from
>>>> Italy and there are not that many Italian books, then by default I will
>>>> show the few Italian books but all the English ones as well.
>>>>
>>>> Do you have an example?
>>>> I'm quite lost about it.
>>>>
>>>>
>>>> John E. McBride wrote:
>>>>
>>>>> Fairly nebulous requirements, but I was recently involved in a
>>>>> multilingual search platform.
>>>>>
>>>>> The approach, translated to Solr 1.3, would be to use multicore: one
>>>>> core per geography. Then a schema.xml per core, each with a different
>>>>> language in the Porter algorithm, stopwords etc., taken from Snowball.
>>>>>
>>>>> Then on the German front end you make requests to the de core, and on
>>>>> the English front end you make requests to the English core.
>>>>>
>>>>> This is much simpler than storing every language in the one index; for
>>>>> example, German queries will need to be run through the German query
>>>>> filters etc. If you have all languages in one schema, then you will
>>>>> have to do some front-end logic to map the query to the correct field.
>>>>>
>>>>> You have failed to consider internationalisation of the query side of
>>>>> the process: your field types merely have analysis filters.
>>>>>
>>>>> Additionally, if the data source for each geography is different, it
>>>>> makes sense to separate the indexes and subsequently the ingestion
>>>>> mechanisms and schedules.
>>>>>
>>>>> Just a few thoughts.
>>>>>
>>>>> John
>>>>>
>>>>> sunnyfr wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I would like to set up a multi-language search engine properly,
>>>>>> and I would like your advice on what I have done so far.
>>>>>>
>>>>>> Solr1.3
>>>>>> tomcat55
>>>>>>
>>>>>> http://www.nabble.com/file/p19954805/schema.xml schema.xml
>>>>>>
>>>>>> Thanks a lot,
>>>>>>

--
View this message in context: http://www.nabble.com/Multi-language-solr1.3-what-would-you-reckon--tp19954805p20059666.html
Sent from the Solr - User mailing list archive at Nabble.com.
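
To make the index-side versus query-side point from this thread concrete, here is a minimal sketch of a per-language field type with both analyzers declared explicitly. It is an illustration under stated assumptions, not the schema attached to the original post: the name text_fr follows the naming proposed earlier in the thread, while the file name stopwords_fr.txt and the choice of tokenizer and filters are hypothetical. As far as I know, an analyzer element with no type attribute is applied at both index and query time, so the explicit split is only needed when the two chains differ.

```xml
<!-- Illustrative sketch only: text_fr and stopwords_fr.txt are assumed names. -->
<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <!-- Applied to documents when they are indexed. -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Assumes a plain one-word-per-line stop word file. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_fr.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French"/>
  </analyzer>
  <!-- Applied to the user's query. Using the same stemmer here keeps
       query terms in the same form as the indexed terms. -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_fr.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French"/>
  </analyzer>
</fieldType>
```

Because both chains stem with the same algorithm, a query word and its indexed form reduce to the same token, which is the matching behaviour described in the "joyful" example above. With one core per language, each core's schema.xml would carry its own variant of this type; with a single index, each language would need its own field using the matching type.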