lucene-solr-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com>
Subject Re: Solr does not recognize language
Date Mon, 05 May 2014 11:08:00 GMT
Hi Victor,

How do you index your documents? Your last config looks correct. However, if you use the
Data Import Handler, for example, you need to add update.chain there too. The same goes for
the extraction request handler if you are using Solr Cell.

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">/home/username/data-config.xml</str>
      <str name="update.chain">langid</str>
    </lst>
  </requestHandler>

By the way, the URL http://localhost:8080/solr/update?commit=true&update.chain=langid was
just an example, meant for feeding XML update messages via the POST method, not for use in a browser.
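
To illustrate what feeding an XML update message by POST could look like, here is a minimal
Python sketch. It only builds the message and the request; it does not send anything. The
field name "text" is taken from langid.fl in the chain config, while the "id" field, the
sample tweet, and the helper names build_add_message and build_update_request are
assumptions made up for this example, so adjust them to your schema.

```python
from urllib.parse import urlencode
from urllib.request import Request
from xml.etree import ElementTree as ET


def build_add_message(docs):
    """Serialize a list of {field: value} dicts as a Solr XML <add> message."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = value
    return ET.tostring(add, encoding="utf-8")


def build_update_request(base_url, payload, chain="langid"):
    """Prepare (but do not send) a POST to /update that selects the chain."""
    query = urlencode({"commit": "true", "update.chain": chain})
    return Request(
        f"{base_url}/update?{query}",
        data=payload,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# Hypothetical document: "id" is an assumed unique key field,
# "text" is the field named in langid.fl.
payload = build_add_message([{"id": "tweet-1", "text": "Hola, esto es una prueba"}])
request = build_update_request("http://localhost:8080/solr", payload)
```

Passing the prepared request to urllib.request.urlopen(request) would send it to a running
Solr instance. If the chain is attached correctly, documents indexed this way should then
carry the lang field.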

Ahmet

On Monday, May 5, 2014 11:04 AM, Victor Pascual <victor@mobilemediacontent.com> wrote:

Thank you very much for your help, Ahmet.

However, the language detection is still not working. :(
My solrconfig.xml didn't contain that lst section inside the update requestHandler.
That's the content I added:

  <requestHandler name="/update"
                  class="solr.XmlUpdateRequestHandler">
      <lst name="defaults">
        <str name="update.chain">langid</str>
      </lst>
  </requestHandler>

  <updateRequestProcessorChain name="langid">
      <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
         <lst name="defaults">
           <str name="langid.fl">text</str>
           <str name="langid.langField">lang</str>
         </lst>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory" />
      <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>

Now, your suggested query http://localhost:8080/solr/update?commit=true&update.chain=langid
returns

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">14</int>
  </lst>
</response>
And there is still no lang field in my documents.
Any idea what I am doing wrong?




On Tue, Apr 29, 2014 at 5:33 PM, Ahmet Arslan <iorixxx@yahoo.com> wrote:

Hi,
>
>solr/update should be used, not /solr/select
>
>curl 'http://localhost:8983/solr/update?commit=true&update.chain=langid' 
>
>By the way, don't you have the following definition in your solrconfig.xml?
>
> <requestHandler name="/update" class="solr.UpdateRequestHandler">  
>
>       <lst name="defaults">
>         <str name="update.chain">langid</str>
>       </lst>      
>  </requestHandler>
>
>
>
>
>On Tuesday, April 29, 2014 4:50 PM, Victor Pascual <victor@mobilemediacontent.com> wrote:
>Hi Ahmet,
>
>thanks for your reply. Adding &update.chain=langid to my query doesn't
>work: IP:8080/solr/select/?q=*%3A*&update.chain=langid
>Regarding defining the chain in an UpdateRequestHandler... sorry for the
>lame question but shall I paste those three lines to solrconfig.xml, or
>shall I add them somewhere else?
>
>There is no UpdateRequestHandler in my solrconfig.
>
>Thanks!
>
>
>
>On Tue, Apr 29, 2014 at 3:13 PM, Ahmet Arslan <iorixxx@yahoo.com> wrote:
>
>> Hi,
>>
>> Did you attach your chain to an UpdateRequestHandler?
>>
>> You can do it by adding &update.chain=langid to the URL or defining it in
>> a defaults section as follows
>>
>> <lst name="defaults">
>>      <str name="update.chain">langid</str>
>>    </lst>
>>
>>
>>
>> On Tuesday, April 29, 2014 3:18 PM, Victor Pascual <
>> victor@mobilemediacontent.com> wrote:
>> Dear all,
>>
>> I'm a new user of Solr. I've managed to index a bunch of documents (in
>> fact, they are tweets) and everything works quite smoothly.
>>
>> Nevertheless it looks like Solr doesn't detect the language of my documents
>> nor remove stopwords accordingly so I can extract the most frequent terms.
>>
>> I've added this piece of XML to my solrconfig.xml as well as the Tika lib
>> jars.
>>
>>     <updateRequestProcessorChain name="langid">
>>        <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
>>           <lst name="defaults">
>>             <str name="langid.fl">text</str>
>>             <str name="langid.langField">lang</str>
>>           </lst>
>>         </processor>
>>         <processor class="solr.LogUpdateProcessorFactory" />
>>        <processor class="solr.RunUpdateProcessorFactory" />
>>      </updateRequestProcessorChain>
>>
>> There is no error in the tomcat log file, so I have no clue of why this
>> isn't working.
>> Any hint on how to solve this problem will be much appreciated!
>>
>
>
