lucene-java-user mailing list archives

From Raf <r.ventag...@gmail.com>
Subject What's the best way to translate a query in multiple languages?
Date Tue, 01 Nov 2011 17:07:45 GMT
Hi,
I have a Lucene index containing documents written in different languages.

Each document is written in only one language, and I have a *language* field
containing the corresponding language identifier (it, en, fr, ...).
The *content* is stored in a different field for each language (e.g.
contents_it, contents_en, ...), and I use a language-specific analyzer for
each of these fields.
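The index layout described above can be sketched as follows, with a plain `Map` standing in for a Lucene `Document`. This is a hypothetical simplification: `LanguageFields`, `contentsField` and `documentFields` are illustrative names, not Lucene API, and the `contents_<lang>` naming is taken from the examples in this message.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the per-document layout; a Map stands in for a Lucene Document.
class LanguageFields {

    // Naming convention taken from the examples: contents_it, contents_en, ...
    static String contentsField(String language) {
        return "contents_" + language;
    }

    // Each document is in exactly one language, so only one contents_<lang> field is filled.
    static Map<String, String> documentFields(String language, String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("language", language);
        fields.put(contentsField(language), text);
        return fields;
    }

    public static void main(String[] args) {
        // prints {language=it, contents_it=il cane abbaia}
        System.out.println(documentFields("it", "il cane abbaia"));
    }
}
```

At query time the same `contentsField(language)` value is what would serve as the *QueryParser* default field for the language the user selected.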

When users enter a query, they also select the language they are writing it in,
so I can create a *QueryParser* with the right *defaultField* and *analyzer*.
This works fine, but with this approach users can only find documents
written in the same language they used to write the query.

Now I would like to *translate* the user's query so that it also finds documents
written in other languages that match the same query.

For example:
* user_query = cane    query_language = it
* At the moment, using the standard *QueryParser*, I get this query --> contents_it:cane
* In the new scenario, I would like to get this query -->
  (contents_it:cane contents_en:dog contents_fr:chien)
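A minimal sketch of the single-term expansion in this example, assuming a hypothetical in-memory dictionary in place of the real translation service, and building the result as a Lucene query-syntax string rather than as `Query` objects. `TermExpansion`, `DICTIONARY`, `translate` and `expand` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: expand one term into an optional group over every language field.
class TermExpansion {

    static final List<String> LANGUAGES = Arrays.asList("it", "en", "fr");

    // Hypothetical dictionary keyed by "from>to:word", standing in for the translation service.
    static final Map<String, String> DICTIONARY = new HashMap<>();
    static {
        DICTIONARY.put("it>en:cane", "dog");
        DICTIONARY.put("it>fr:cane", "chien");
    }

    // Assumption: a word with no translation is kept unchanged.
    static String translate(String word, String from, String to) {
        return DICTIONARY.getOrDefault(from + ">" + to + ":" + word, word);
    }

    static String expand(String word, String queryLanguage) {
        List<String> clauses = new ArrayList<>();
        clauses.add("contents_" + queryLanguage + ":" + word); // original term first
        for (String language : LANGUAGES) {
            if (!language.equals(queryLanguage)) {
                clauses.add("contents_" + language + ":" + translate(word, queryLanguage, language));
            }
        }
        return "(" + String.join(" ", clauses) + ")";
    }

    public static void main(String[] args) {
        // prints (contents_it:cane contents_en:dog contents_fr:chien)
        System.out.println(expand("cane", "it"));
    }
}
```

Keeping an untranslated word unchanged is an assumption, chosen to match how `linux` stays `linux` in the second example of this message.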

But also:

* user_query = +"operating system" -linux    query_language = en
* I would like to get this query -->
  +(contents_en:"operating system" contents_it:"sistema operativo") -(contents_en:linux contents_it:linux)
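Compared with the single-term case, this example adds two things: each top-level clause keeps its `+`/`-` operator, and quoted phrases stay quoted after translation. A sketch over a flat clause list, with a hypothetical `Clause` type and a one-entry en to it dictionary standing in for the translation service:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: expand top-level clauses while keeping each clause's +/- operator.
class ClauseExpansion {

    // Hypothetical clause model: occur is "+", "-" or "" (optional).
    static final class Clause {
        final String occur;
        final String text;
        final boolean phrase;

        Clause(String occur, String text, boolean phrase) {
            this.occur = occur;
            this.text = text;
            this.phrase = phrase;
        }
    }

    // Hypothetical stand-in for the translation service (en -> it only).
    static final Map<String, String> EN_TO_IT = new HashMap<>();
    static {
        EN_TO_IT.put("operating system", "sistema operativo");
    }

    static String translate(String text, String from, String to) {
        // Assumption: text without a translation (e.g. "linux") is kept as-is.
        if (from.equals("en") && to.equals("it")) {
            return EN_TO_IT.getOrDefault(text, text);
        }
        return text;
    }

    // Renders field:text, re-quoting phrases after translation.
    static String leaf(String language, String text, boolean phrase) {
        return "contents_" + language + ":" + (phrase ? "\"" + text + "\"" : text);
    }

    static String expand(List<Clause> clauses, String queryLanguage, List<String> languages) {
        List<String> out = new ArrayList<>();
        for (Clause c : clauses) {
            // Original text first, then one alternative per other language.
            List<String> group = new ArrayList<>();
            group.add(leaf(queryLanguage, c.text, c.phrase));
            for (String language : languages) {
                if (!language.equals(queryLanguage)) {
                    group.add(leaf(language, translate(c.text, queryLanguage, language), c.phrase));
                }
            }
            out.add(c.occur + "(" + String.join(" ", group) + ")");
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        List<Clause> query = Arrays.asList(
                new Clause("+", "operating system", true),
                new Clause("-", "linux", false));
        // prints +(contents_en:"operating system" contents_it:"sistema operativo") -(contents_en:linux contents_it:linux)
        System.out.println(expand(query, "en", Arrays.asList("en", "it")));
    }
}
```

The string form is only for illustration. If an expanded string like this were passed back through a *QueryParser*, Lucene's `PerFieldAnalyzerWrapper` could route each `contents_<lang>` field to its own analyzer, and translated text would need `QueryParser.escape` first in case it contains query-syntax characters.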
Suppose that:
* for each index/application I have a fixed number of available languages,
each with its own *defaultField* and specific *analyzer*;
* I already have a service that can translate words and/or short
phrases between the languages I am interested in.
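The two assumptions above could be modelled roughly like this. `Translator`, `LanguageConfig` and `otherLanguages` are hypothetical names, and `analyzerName` is only a placeholder label where a real setup would hold a Lucene `Analyzer` instance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a fixed language list plus an existing translation service.
class LanguageSetup {

    // Stand-in for the existing translation service (words and short phrases).
    interface Translator {
        String translate(String text, String fromLanguage, String toLanguage);
    }

    static final class LanguageConfig {
        final String code;         // e.g. "it"
        final String defaultField; // e.g. "contents_it"
        final String analyzerName; // placeholder for the language's Analyzer

        LanguageConfig(String code, String analyzerName) {
            this.code = code;
            this.defaultField = "contents_" + code;
            this.analyzerName = analyzerName;
        }
    }

    // Fixed for a given index/application.
    static final List<LanguageConfig> LANGUAGES = Arrays.asList(
            new LanguageConfig("it", "analyzer-it"),
            new LanguageConfig("en", "analyzer-en"),
            new LanguageConfig("fr", "analyzer-fr"));

    // The "for each language except queryLanguage" loop, made concrete.
    static List<LanguageConfig> otherLanguages(String queryLanguage) {
        List<LanguageConfig> others = new ArrayList<>();
        for (LanguageConfig config : LANGUAGES) {
            if (!config.code.equals(queryLanguage)) {
                others.add(config);
            }
        }
        return others;
    }

    public static void main(String[] args) {
        // Identity translator, for demonstration only.
        Translator identity = (text, from, to) -> text;
        for (LanguageConfig target : otherLanguages("en")) {
            System.out.println(target.defaultField + ":" + identity.translate("linux", "en", target.code));
        }
    }
}
```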


I was thinking of extending *QueryParser* and overriding some of its methods to add
my custom behaviour.

This looks fairly easy for TermQuery, for example by doing something like this:

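import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Inside a QueryParser subclass; queryLanguage, languages and translateTerm()
// are assumed members of that subclass (translateTerm wraps the translation service).
@Override
protected Query newTermQuery(Term term) {
    BooleanQuery bq = new BooleanQuery();
    // Original term, e.g. contents_it:cane
    bq.add(new BooleanClause(new TermQuery(term), BooleanClause.Occur.SHOULD));

    // One optional clause per other language, e.g. contents_en:dog
    for (String language : languages) {
        if (language.equals(queryLanguage)) {
            continue;
        }
        TermQuery translatedTQ = translateTerm(term, queryLanguage, language);
        bq.add(new BooleanClause(translatedTQ, BooleanClause.Occur.SHOULD));
    }

    return bq;
}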

But it looks much harder for the other query types, at least without rewriting
*QueryParser* instead of just extending it.
Am I missing something? Is there a better approach to achieve the same goal?
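One possible alternative, sketched here on a simplified hypothetical tree (`Node`, `Leaf` and `Group` are not Lucene classes), is to let the stock parser build the query and then rewrite the finished tree: recurse through boolean groups keeping their operators, and replace every leaf with an optional group made of the original plus its translations. Because the rewrite only looks at the finished tree, it does not depend on which parser method created each leaf. The it to en dictionary is a stand-in for the translation service.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: rewrite an already-parsed (simplified, hypothetical) query tree.
class TreeRewrite {

    interface Node { String render(); }

    // A term or phrase on one field, e.g. contents_it:cane
    static final class Leaf implements Node {
        final String field, text;
        final boolean phrase;
        Leaf(String field, String text, boolean phrase) {
            this.field = field; this.text = text; this.phrase = phrase;
        }
        public String render() {
            return field + ":" + (phrase ? "\"" + text + "\"" : text);
        }
    }

    // A boolean group; occurs.get(i) ("+", "-" or "") applies to children.get(i).
    static final class Group implements Node {
        final List<String> occurs;
        final List<Node> children;
        Group(List<String> occurs, List<Node> children) {
            this.occurs = occurs; this.children = children;
        }
        String renderClauses() {
            List<String> parts = new ArrayList<>();
            for (int i = 0; i < children.size(); i++) {
                parts.add(occurs.get(i) + children.get(i).render());
            }
            return String.join(" ", parts);
        }
        public String render() { return "(" + renderClauses() + ")"; }
    }

    // Hypothetical translation table standing in for the real service.
    static final Map<String, String> IT_TO_EN = new HashMap<>();
    static {
        IT_TO_EN.put("cane", "dog");
        IT_TO_EN.put("gatto", "cat");
    }

    static String translate(String text, String from, String to) {
        if (from.equals("it") && to.equals("en")) {
            return IT_TO_EN.getOrDefault(text, text);
        }
        return text; // unknown language pairs or words are kept unchanged
    }

    // Groups keep their operators; each leaf becomes (original translation1 ...).
    static Node rewrite(Node node, String queryLanguage, List<String> languages) {
        if (node instanceof Group) {
            Group g = (Group) node;
            List<Node> rewritten = new ArrayList<>();
            for (Node child : g.children) {
                rewritten.add(rewrite(child, queryLanguage, languages));
            }
            return new Group(g.occurs, rewritten);
        }
        Leaf leaf = (Leaf) node;
        List<String> occurs = new ArrayList<>(Arrays.asList(""));
        List<Node> alternatives = new ArrayList<>(Arrays.<Node>asList(leaf));
        for (String language : languages) {
            if (language.equals(queryLanguage)) continue;
            occurs.add("");
            alternatives.add(new Leaf("contents_" + language,
                    translate(leaf.text, queryLanguage, language), leaf.phrase));
        }
        return new Group(occurs, alternatives);
    }

    public static void main(String[] args) {
        // +(cane gatto) -linux, as parsed for query_language = it
        Group parsed = new Group(Arrays.asList("+", "-"), Arrays.<Node>asList(
                new Group(Arrays.asList("", ""), Arrays.<Node>asList(
                        new Leaf("contents_it", "cane", false),
                        new Leaf("contents_it", "gatto", false))),
                new Leaf("contents_it", "linux", false)));
        Group result = (Group) rewrite(parsed, "it", Arrays.asList("it", "en"));
        // prints +((contents_it:cane contents_en:dog) (contents_it:gatto contents_en:cat)) -(contents_it:linux contents_en:linux)
        System.out.println(result.renderClauses());
    }
}
```

Mapped onto Lucene 3.0, the `Group` case would roughly correspond to walking `BooleanQuery.getClauses()` and the leaves to `TermQuery` and `PhraseQuery`; wildcard, prefix, fuzzy and range queries would need their own cases or be left untranslated. A caveat that applies to this sketch and to the `newTermQuery` override alike: by the time a leaf exists, QueryParser has already run the query-language analyzer, so a stemming analyzer may hand the translation service a stem rather than the word the user typed. Overriding the protected `getFieldQuery(String field, String queryText)`, which in the 3.x QueryParser receives the raw text of both single terms and quoted phrases, may be a narrower hook; that is worth confirming against the 3.0.3 source.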

I am using *Lucene 3.0.3* and, for now, I cannot upgrade to a more recent
version.

Thanks in advance,
Bye.

*Raf*
