lucene-solr-user mailing list archives

From Stanislaw Osinski <stanis...@osinski.name>
Subject Re: using Carrot2 custom ITokenizerFactory
Date Mon, 21 May 2012 09:11:07 GMT
Hi Koji,

Dawid came up with a simple fix for this; it's committed to trunk and the
3.6 branch.

Staszek

On Sun, May 20, 2012 at 5:15 PM, Koji Sekiguchi <koji@r.email.ne.jp> wrote:

> Hi Staszek,
>
> Thank you for the quick fix!
>
> As a trial, I set:
>
> <str name="PreprocessingPipeline.tokenizerFactory">org.apache.solr.handler.clustering.carrot2.LuceneCarrot2TokenizerFactory</str>
>
> then I could start Solr without error. But when I make a request:
>
> http://localhost:8983/solr/clustering?q=*%3A*&version=2.2&start=0&rows=10&indent=on&wt=json&fl=id&carrot.produceSummary=false
>
> I got an exception:
>
> org.apache.solr.common.SolrException: Carrot2 clustering failed
>        at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.cluster(CarrotClusteringEngine.java:224)
>        at org.apache.solr.handler.clustering.ClusteringComponent.process(ClusteringComponent.java:91)
>        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:186)
>        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
>        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
>        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
>        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>        at org.mortbay.jetty.Server.handle(Server.java:326)
>        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
>        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> Caused by: org.carrot2.core.ComponentInitializationException: org.carrot2.util.attribute.AttributeBindingException: Could not assign field org.carrot2.text.preprocessing.pipeline.CompletePreprocessingPipeline#tokenizerFactory with value org.apache.solr.handler.clustering.carrot2.LuceneCarrot2TokenizerFactory
>        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>        at org.carrot2.util.ExceptionUtils.wrapAs(ExceptionUtils.java:63)
>        at org.carrot2.core.PoolingProcessingComponentManager$ComponentInstantiationListener.objectInstantiated(PoolingProcessingComponentManager.java:234)
>        at org.carrot2.core.PoolingProcessingComponentManager$ComponentInstantiationListener.objectInstantiated(PoolingProcessingComponentManager.java:169)
>        at org.carrot2.util.pool.SoftUnboundedPool.borrowObject(SoftUnboundedPool.java:83)
>        at org.carrot2.core.PoolingProcessingComponentManager.prepare(PoolingProcessingComponentManager.java:128)
>        at org.carrot2.core.Controller.process(Controller.java:333)
>        at org.carrot2.core.Controller.process(Controller.java:240)
>        at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.cluster(CarrotClusteringEngine.java:220)
>        ... 24 more
> Caused by: org.carrot2.util.attribute.AttributeBindingException: Could not assign field org.carrot2.text.preprocessing.pipeline.CompletePreprocessingPipeline#tokenizerFactory with value org.apache.solr.handler.clustering.carrot2.LuceneCarrot2TokenizerFactory
>        at org.carrot2.util.attribute.AttributeBinder$AttributeBinderActionBind.performAction(AttributeBinder.java:614)
>        at org.carrot2.util.attribute.AttributeBinder.bind(AttributeBinder.java:311)
>        at org.carrot2.util.attribute.AttributeBinder.bind(AttributeBinder.java:349)
>        at org.carrot2.util.attribute.AttributeBinder.bind(AttributeBinder.java:219)
>        at org.carrot2.util.attribute.AttributeBinder.set(AttributeBinder.java:149)
>        at org.carrot2.util.attribute.AttributeBinder.set(AttributeBinder.java:129)
>        at org.carrot2.core.ControllerUtils.init(ControllerUtils.java:50)
>        at org.carrot2.core.PoolingProcessingComponentManager$ComponentInstantiationListener.objectInstantiated(PoolingProcessingComponentManager.java:189)
>        ... 30 more
> Caused by: java.lang.IllegalArgumentException: Can not set org.carrot2.text.linguistic.ITokenizerFactory field org.carrot2.text.preprocessing.pipeline.BasicPreprocessingPipeline.tokenizerFactory to java.lang.String
>        at sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:146)
>        at sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:150)
>        at sun.reflect.UnsafeObjectFieldAccessorImpl.set(UnsafeObjectFieldAccessorImpl.java:63)
>        at java.lang.reflect.Field.set(Field.java:657)
>        at org.carrot2.util.attribute.AttributeBinder$AttributeBinderActionBind.performAction(AttributeBinder.java:610)
>        ... 37 more
>
>
> I should dig in, but if you have any clue, it would be appreciated. I'm
> using 3.6 branch.
>
>
> koji
> --
> Query Log Visualizer for Apache Solr
> http://soleami.com/
>
> (12/05/20 21:11), Stanislaw Osinski wrote:
>
>> Hi Koji,
>>
>> It's fixed in trunk and 3.6.1 branch now. If you hit any other issues with
>> this, let me know.
>>
>> Staszek
>>
>> On Sun, May 20, 2012 at 1:02 PM, Koji Sekiguchi<koji@r.email.ne.jp>
>>  wrote:
>>
>>> Hi Staszek,
>>>
>>> I'll wait for your fix. Thank you!
>>>
>>> Koji Sekiguchi from iPad2
>>>
>>> On 2012/05/20, at 18:18, Stanislaw Osinski <stanislaw@osinski.name> wrote:
>>>
>>>> Hi Koji,
>>>>
>>>> You're right, the current code overwrites the custom tokenizer though it
>>>> shouldn't. LuceneCarrot2TokenizerFactory is there to avoid circular
>>>> dependencies (Carrot2 default tokenizer depends on Lucene), but it
>>>> shouldn't be an issue with custom tokenizers.
>>>>
>>>> I'll try to commit a fix later today. Meanwhile, if you have a chance to
>>>> recompile the code, a temporary solution would be to hardcode your
>>>> tokenizer class into the fragment you pasted:
>>>>
>>>>   BasicPreprocessingPipelineDescriptor.attributeBuilder(initAttributes)
>>>>       .stemmerFactory(LuceneCarrot2StemmerFactory.class)
>>>>       .tokenizerFactory(YourCustomTokenizer.class)
>>>>       .lexicalDataFactory(SolrStopwordsCarrot2LexicalDataFactory.class);
>>>>
>>>> Staszek
>>>>
>>>> On Sun, May 20, 2012 at 9:40 AM, Koji Sekiguchi <koji@r.email.ne.jp> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> As I'd like to use a custom ITokenizerFactory, I set the following
>>>>> Carrot2 key in solrconfig.xml:
>>>>>
>>>>> <searchComponent name="clustering"
>>>>>                  enable="${solr.clustering.enabled:true}"
>>>>>                  class="solr.clustering.ClusteringComponent">
>>>>>   <lst name="engine">
>>>>>     <str name="name">default</str>
>>>>>        :
>>>>>     <str name="PreprocessingPipeline.tokenizerFactory">my.own.TokenizerFactory</str>
>>>>>   </lst>
>>>>> </searchComponent>
>>>>>
>>>>> But it seems that CarrotClusteringEngine overwrites it with
>>>>> LuceneCarrot2TokenizerFactory in its init() method:
>>>>>
>>>>>   BasicPreprocessingPipelineDescriptor.attributeBuilder(initAttributes)
>>>>>       .stemmerFactory(LuceneCarrot2StemmerFactory.class)
>>>>>       .tokenizerFactory(LuceneCarrot2TokenizerFactory.class)
>>>>>       .lexicalDataFactory(SolrStopwordsCarrot2LexicalDataFactory.class);
>>>>>
>>>>> Am I missing something?
>>>>>
>>>>> koji
>>>>> --
>>>>> Query Log Visualizer for Apache Solr
>>>>> http://soleami.com/
>>>>>
>>>>>
>>>
>>
>
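
The innermost exception in the quoted trace ("Can not set ... ITokenizerFactory field ... to java.lang.String") is the message java.lang.reflect.Field.set produces when it is handed a String for a field of a different reference type. One possible reading is that the configured class name reached the binder as a plain string rather than as a class or an instance. The following is only a minimal sketch of that failure mode using the JDK alone. TokenizerFactory, MyTokenizerFactory, Pipeline, bind and resolve are hypothetical stand-ins, not Carrot2 or Solr classes, and this is not the fix that was committed.

```java
import java.lang.reflect.Field;

public class TokenizerBindingSketch {
    // Hypothetical stand-in for an interface such as ITokenizerFactory.
    public interface TokenizerFactory {}

    // Hypothetical user-supplied implementation with a public no-arg constructor.
    public static class MyTokenizerFactory implements TokenizerFactory {
        public MyTokenizerFactory() {}
    }

    // Hypothetical stand-in for a pipeline whose field is typed by the interface.
    public static class Pipeline {
        public TokenizerFactory tokenizerFactory;
    }

    /** Assigns a value to the field reflectively, roughly as an attribute binder might. */
    public static void bind(Pipeline target, Object value) throws ReflectiveOperationException {
        Field field = Pipeline.class.getDeclaredField("tokenizerFactory");
        field.setAccessible(true);
        // Throws IllegalArgumentException if value is not assignable to the field type.
        field.set(target, value);
    }

    /** Turns a fully qualified class name into an instance via its no-arg constructor. */
    public static Object resolve(String className) throws ReflectiveOperationException {
        return Class.forName(className).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        Pipeline pipeline = new Pipeline();
        String configured = MyTokenizerFactory.class.getName();

        try {
            bind(pipeline, configured); // the raw String is not a TokenizerFactory
        } catch (IllegalArgumentException e) {
            System.out.println("Raw string rejected: " + e.getMessage());
        }

        bind(pipeline, resolve(configured)); // an instance of the named class is accepted
        System.out.println("Bound: " + pipeline.tokenizerFactory.getClass().getName());
    }
}
```

In the sketch the conversion from name to instance happens before the reflective assignment; where the equivalent conversion belongs in the real code is not something this thread establishes.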

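The problem that started this thread, a built-in value replacing one the user configured, can be illustrated with an ordinary map of attributes. This is a hedged sketch of the general idea only: the key is the one from the quoted solrconfig.xml, the values are plain strings used for illustration, and AttributeDefaultsSketch with its two methods is hypothetical. It does not reproduce CarrotClusteringEngine or the change that was committed to trunk.

```java
import java.util.HashMap;
import java.util.Map;

public class AttributeDefaultsSketch {
    // Attribute key taken from the solrconfig.xml snippet quoted in the thread.
    public static final String TOKENIZER_KEY = "PreprocessingPipeline.tokenizerFactory";

    /** Always writes the built-in value, discarding whatever the user configured. */
    public static Map<String, Object> applyOverwriting(Map<String, Object> userAttrs, Object builtIn) {
        Map<String, Object> result = new HashMap<>(userAttrs);
        result.put(TOKENIZER_KEY, builtIn);
        return result;
    }

    /** Writes the built-in value only when the user has not configured one. */
    public static Map<String, Object> applyAsDefault(Map<String, Object> userAttrs, Object builtIn) {
        Map<String, Object> result = new HashMap<>(userAttrs);
        result.putIfAbsent(TOKENIZER_KEY, builtIn);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> userAttrs = new HashMap<>();
        userAttrs.put(TOKENIZER_KEY, "my.own.TokenizerFactory");

        System.out.println(applyOverwriting(userAttrs, "LuceneCarrot2TokenizerFactory"));
        System.out.println(applyAsDefault(userAttrs, "LuceneCarrot2TokenizerFactory"));
    }
}
```

With the second method a custom value survives, and an empty configuration still receives the built-in one.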