Date: Thu, 1 Apr 2010 04:19:07 -0800 (PST)
From: henrib <henrib@apache.org>
To: java-user@lucene.apache.org
Message-ID: <1270124347984-690625.post@n3.nabble.com>
In-Reply-To: <817123.44413.qm@web24101.mail.ird.yahoo.com>
References: <817123.44413.qm@web24101.mail.ird.yahoo.com>
Subject: Re: Designing a multilingual index


Hi,
I worked on a similar system some time ago (using Solr) and took the
multiple-indices route (Solr's multicore feature). In our case, the "same"
document could exist in several languages, as different localized versions
of the same information (same Solr unique id for each l10n version).

This let us keep the same index structure across locales while using
different settings for each (synonyms, stemmers, etc.). Maintenance was
easier this way: refining or updating the settings (say, adding synonyms or
changing a stemmer) may require a reindex, and smaller indices allow faster
deployments. It's also "dead-easy" to add a new language (esp. compared to
the one-index solution), and replication and partitioning become easier.
Overall, IMO, this is a more scalable architecture than the single-index
one.
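
To make the per-locale routing concrete, here is a minimal Java sketch (not
the code from that project). It assumes one core per language named
"<prefix>_<language>" and a base URL such as http://localhost:8983/solr;
the class name, the naming scheme and the "text" field are all hypothetical.
Each localized version keeps the same id and goes to its own core.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Routes each localized version of a document to its own per-language core. */
final class LocaleCoreRouter {
    private final String baseUrl;     // assumed, e.g. "http://localhost:8983/solr"
    private final String corePrefix;  // hypothetical naming scheme: <prefix>_<language>

    LocaleCoreRouter(String baseUrl, String corePrefix) {
        // Drop a trailing slash so URLs are built consistently.
        this.baseUrl = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        this.corePrefix = corePrefix;
    }

    /** Core name for a locale; only the language part is used in this sketch. */
    String coreName(Locale locale) {
        return corePrefix + "_" + locale.getLanguage();
    }

    /** Update endpoint of the core that holds this locale's documents. */
    String updateUrl(Locale locale) {
        return baseUrl + "/" + coreName(locale) + "/update";
    }

    /** Same unique id in every core: returns core name -> fields of that l10n version. */
    Map<String, Map<String, String>> route(String id, Map<Locale, String> localizedText) {
        Map<String, Map<String, String>> perCore = new LinkedHashMap<>();
        for (Map.Entry<Locale, String> version : localizedText.entrySet()) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("id", id);                    // shared across all localized versions
            doc.put("text", version.getValue());  // analyzed with that core's own settings
            perCore.put(coreName(version.getKey()), doc);
        }
        return perCore;
    }
}
```

With this layout, adding a language only means creating one more core with
its own analysis settings; the routing code does not change.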

Users could set the languages in which they were "fluent" (the default
being the browser locale), so queries were only run against those indices
and results were "clustered" per locale (no need to return results that
cannot be understood...). Besides, IMO, scoring / ordering documents in
different languages is a bit like comparing apples and oranges.
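
The selection and clustering logic described above could look roughly like
the Java sketch below. It is only an illustration under assumptions: the
class, the Hit type and the method names are hypothetical, and the actual
search call against each index is left out. The point is that hits are
ranked within each language only and never merged into one scored list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class PerLocaleSearch {

    /** One hit from one per-language index; scores are only comparable within that index. */
    static final class Hit {
        final String id;
        final float score;
        Hit(String id, float score) { this.id = id; this.score = score; }
    }

    /** Languages to query: the user's fluent locales, or the browser locale if none are set. */
    static List<String> languagesToQuery(List<Locale> fluent, Locale browserLocale,
                                         Set<String> indexedLanguages) {
        List<Locale> wanted = fluent.isEmpty() ? List.of(browserLocale) : fluent;
        Set<String> languages = new LinkedHashSet<>();  // keeps preference order, drops duplicates
        for (Locale locale : wanted) {
            String language = locale.getLanguage();
            if (indexedLanguages.contains(language)) {  // skip languages that have no index
                languages.add(language);
            }
        }
        return new ArrayList<>(languages);
    }

    /** Groups results per language, ranking only within each group (no cross-language merge). */
    static Map<String, List<Hit>> clusterPerLanguage(List<String> languages,
                                                     Map<String, List<Hit>> hitsByLanguage) {
        Map<String, List<Hit>> clustered = new LinkedHashMap<>();
        for (String language : languages) {
            List<Hit> hits = new ArrayList<>(hitsByLanguage.getOrDefault(language, List.of()));
            hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            clustered.put(language, hits);
        }
        return clustered;
    }
}
```

Keeping one list per language sidesteps the apples-and-oranges problem:
a score from the French index is never compared with one from the English
index.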

Finally, query expansion can also be used in the multiple indices case and
might even use automated/guided translation.
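
As a rough sketch of translation-based expansion across per-language
indices: the Java below rewrites a query once per target language using a
glossary. The glossary, the class and the method names are hypothetical
stand-ins for whatever automated or guided translation is available; terms
without a translation are passed through as typed.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class TranslatedQueryExpander {
    // Hypothetical glossary: lowercase source term -> (target language -> translation).
    private final Map<String, Map<String, String>> glossary;

    TranslatedQueryExpander(Map<String, Map<String, String>> glossary) {
        this.glossary = glossary;
    }

    /** Builds one query string per target language; untranslated terms are kept as typed. */
    Map<String, String> expand(String query, List<String> targetLanguages) {
        Map<String, String> perLanguage = new LinkedHashMap<>();
        String[] terms = query.trim().split("\\s+");
        for (String language : targetLanguages) {
            StringBuilder translated = new StringBuilder();
            for (String term : terms) {
                String key = term.toLowerCase(Locale.ROOT);
                String replacement = glossary
                        .getOrDefault(key, Map.of())
                        .getOrDefault(language, term);
                if (translated.length() > 0) {
                    translated.append(' ');
                }
                translated.append(replacement);
            }
            // Each entry would then be sent to the index for that language.
            perLanguage.put(language, translated.toString());
        }
        return perLanguage;
    }
}
```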

In my experience, multiple indices had many advantages over the single-index
solution, be they functional or operational. YMMV.
Hope this helps,
Henrib
 