lucene-solr-user mailing list archives

From Bai Shen <baishen.li...@gmail.com>
Subject Re: Language Identification
Date Mon, 23 Apr 2012 17:27:04 GMT
I was under the impression that Solr supports both Tika's language identifier
and the one that Shuyo wrote.  The page at
http://wiki.apache.org/solr/LanguageDetection lists them both.

<processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
<processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
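
For context, here is a minimal sketch of how one of these factories might be
wired into an update chain in solrconfig.xml.  The chain name and the field
names (title, content, language_s) are assumptions for illustration only; the
langid.* parameters are the ones documented on the wiki page as I understand
them.

```xml
<!-- Hypothetical chain name; pick whatever fits your setup. -->
<updateRequestProcessorChain name="langid">
  <!-- Swap in TikaLanguageIdentifierUpdateProcessorFactory to use Tika's detector instead. -->
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <!-- Assumed input fields to run detection on. -->
    <str name="langid.fl">title,content</str>
    <!-- Assumed output field that receives the detected language code. -->
    <str name="langid.langField">language_s</str>
    <!-- Language code to use when detection does not reach a decision. -->
    <str name="langid.fallback">en</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

Either way the detection runs inside Solr at index time, which is the part I
would rather keep on the Nutch side.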

Again, I'm just trying to understand why it was moved to Solr.


On Fri, Apr 20, 2012 at 6:02 PM, Jan Høydahl <jan.asf@cominvent.com> wrote:

> Hi,
>
> Solr just reuses Tika's language identifier. But you are of course free to
> do your language detection on the Nutch side if you choose and not invoke
> the one in Solr.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 20. apr. 2012, at 21:49, Bai Shen wrote:
>
> > I'm working on using Shuyo's work to improve the language identification
> > of our search.  Apparently, it's been moved from Nutch to Solr.  Is there a
> > reason for this?
> >
> > http://code.google.com/p/language-detection/issues/detail?id=34
> >
> > I would prefer to have the processing done in Nutch as that has the
> > benefit of more hardware and not interfering with Solr latency.
> >
> > Thanks.
>
>
