lucene-dev mailing list archives

From "Stanislaw Osinski (Created) (JIRA)" <>
Subject [jira] [Created] (SOLR-2939) Clustering of multilingual search results
Date Fri, 02 Dec 2011 13:25:40 GMT
Clustering of multilingual search results

                 Key: SOLR-2939
             Project: Solr
          Issue Type: Improvement
          Components: contrib - Clustering
            Reporter: Stanislaw Osinski
            Assignee: Stanislaw Osinski
             Fix For: 3.6

Carrot2 internally supports clustering of multilingual search results. The clustering component
should allow passing a language field to Carrot2. This feature would need at least two new
parameters: {{carrot.lang}} for the name of the Solr field that contains the language code (ISO
639), and {{carrot.lcmap}}, similar to the one in the language recognizer, to map arbitrary
strings to ISO 639 codes.
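
A minimal sketch of how a {{carrot.lcmap}} value might be parsed and applied. It assumes the parameter would use space-separated from:to pairs, as the language recognizer's mapping parameter does. The class name LanguageCodeMap and its methods are hypothetical and not part of Solr or Carrot2.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (not Solr or Carrot2 API): maps arbitrary language strings to ISO 639 codes. */
final class LanguageCodeMap {
    private final Map<String, String> mappings = new LinkedHashMap<>();

    /** Parses a spec such as "polish:pl en_US:en" (assumed syntax: space-separated from:to pairs). */
    LanguageCodeMap(String spec) {
        if (spec == null) {
            return;
        }
        for (String pair : spec.trim().split("\\s+")) {
            if (pair.isEmpty()) {
                continue; // an empty spec yields a single empty token
            }
            int colon = pair.indexOf(':');
            if (colon <= 0 || colon == pair.length() - 1) {
                throw new IllegalArgumentException("Expected from:to, got: " + pair);
            }
            // Keys and values are lower-cased so that lookups are case-insensitive.
            mappings.put(pair.substring(0, colon).toLowerCase(Locale.ROOT),
                         pair.substring(colon + 1).toLowerCase(Locale.ROOT));
        }
    }

    /** Returns the mapped code, or the trimmed, lower-cased input when no mapping applies. */
    String resolve(String rawLanguage) {
        if (rawLanguage == null) {
            return null;
        }
        String key = rawLanguage.trim().toLowerCase(Locale.ROOT);
        return mappings.getOrDefault(key, key);
    }
}
```

With the spec "polish:pl en_US:en", a document whose language field holds "Polish" would resolve to "pl", while a value with no mapping, such as "de", would pass through unchanged.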

Another feature of the language recognizer we should mirror is the expansion of the {{{lang}}}
token in field names into the language code of the document (in case of multiple languages
per document, the first Carrot2-supported language code). The feature seems easy to implement
in the non-distributed setting of Solr, but the simple implementation isn't going to work
in the distributed setting because the name of the specific field to be fetched depends on
the content (language) of each matching document. Looking at the {{SearchClusteringEngine.getFieldsToLoad(SolrQueryRequest)}}
method, a quick but costly solution would be to load the contents of all stored fields. I'm
not very familiar with distributed-mode Solr, but maybe this could be optimized so that only the
required fields get fetched?
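
The {lang} expansion described above could be sketched as follows. All names here (LangFieldExpander, firstSupported, expand, candidateFields) are hypothetical and not part of Solr or Carrot2, and the set of supported language codes is assumed to be known up front. The candidateFields method illustrates one conceivable middle ground for the distributed case, which this issue does not itself propose: request every field the template could expand to, rather than all stored fields.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper illustrating {lang} expansion; not Solr or Carrot2 API. */
final class LangFieldExpander {
    private static final String TOKEN = "{lang}";

    /** Returns the first of the document's language codes that is supported, or the fallback. */
    static String firstSupported(List<String> docLanguages, Set<String> supported, String fallback) {
        for (String code : docLanguages) {
            if (supported.contains(code)) {
                return code;
            }
        }
        return fallback;
    }

    /** Replaces every {lang} token in the field name template with the given language code. */
    static String expand(String fieldTemplate, String languageCode) {
        return fieldTemplate.replace(TOKEN, languageCode);
    }

    /** Lists every concrete field name the template could resolve to for the supported languages. */
    static Set<String> candidateFields(String fieldTemplate, Set<String> supported) {
        Set<String> fields = new LinkedHashSet<>();
        for (String code : supported) {
            fields.add(expand(fieldTemplate, code));
        }
        return fields;
    }
}
```

For a template such as title_{lang} and a document tagged with an unsupported code followed by "pl", firstSupported would pick "pl" and expand would yield title_pl. Whether fetching the candidate set is cheaper than loading all stored fields would depend on how many languages are configured.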

This message is automatically generated by JIRA.