lucene-solr-user mailing list archives

From Jan Høydahl <jan....@cominvent.com>
Subject Re: Language Identification
Date Mon, 23 Apr 2012 21:39:03 GMT
I don't think anything has "moved". We simply give Solr users the option of doing language
detection inside Solr, using either of these two libraries. If you prefer to do language
detection on the client side instead, using either of them, what is stopping you?
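
For reference, the in-Solr option is just an update processor chain in solrconfig.xml.
A minimal sketch, assuming input fields named title and body and an output field named
language_s (those field names and the chain name are illustrative, not taken from this
thread):

```xml
<!-- Sketch only: chain name and field names are assumptions for illustration. -->
<updateRequestProcessorChain name="langid">
  <!-- Swap in TikaLanguageIdentifierUpdateProcessorFactory to use Tika's detector instead. -->
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <!-- Fields whose text is fed to the detector (assumed names). -->
    <str name="langid.fl">title,body</str>
    <!-- Field that receives the detected language code (assumed name). -->
    <str name="langid.langField">language_s</str>
    <!-- Language code to store when detection is not confident. -->
    <str name="langid.fallback">en</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

If you detect the language in Nutch, leave this chain out and send the language as an
ordinary field on the document.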

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 23. apr. 2012, at 19:27, Bai Shen wrote:

> I was under the impression that Solr supports both Tika and the language
> identifier that Shuyo wrote.  The page at
> http://wiki.apache.org/solr/LanguageDetection lists them both.
> 
> <processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
> <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
> 
> Again, I'm just trying to understand why it was moved to solr.
> 
> 
> On Fri, Apr 20, 2012 at 6:02 PM, Jan Høydahl <jan.asf@cominvent.com> wrote:
> 
>> Hi,
>> 
>> Solr just reuses Tika's language identifier. But you are of course free to
>> do your language detection on the Nutch side if you choose and not invoke
>> the one in Solr.
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> Solr Training - www.solrtraining.com
>> 
>> On 20. apr. 2012, at 21:49, Bai Shen wrote:
>> 
>>> I'm working on using Shuyo's work to improve the language identification of
>>> our search.  Apparently, it's been moved from Nutch to Solr.  Is there a
>>> reason for this?
>>> 
>>> http://code.google.com/p/language-detection/issues/detail?id=34
>>> 
>>> I would prefer to have the processing done in Nutch as that has the benefit
>>> of more hardware and not interfering with Solr latency.
>>> 
>>> Thanks.
>> 
>> 

