jackrabbit-users mailing list archives

From Torsten Stolpmann <stolpm...@verit.de>
Subject Re: jcr sql2 - contains() full text search not working
Date Wed, 11 Jul 2012 18:32:41 GMT
Hi Carl,

AFAIK the documentation still refers to Jackrabbit 1.x.x - see [1] for
details. Maybe [2] has the correct answer to your problem (explicitly
setting the jcr:mimeType for your data node)?

HTH,

Torsten

[1] https://issues.apache.org/jira/browse/JCR-1878
[2] http://jackrabbit.510166.n4.nabble.com/textFilterClasses-deprecated-How-to-specify-extractors-td4534050.html
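
A minimal sketch of what "explicitly setting the jcr:mimeType" could look
like on the Java side. It assumes, without confirmation from this thread,
that the 2.x text extraction decides whether to index jcr:data based on the
jcr:mimeType of the nt:resource node, so JSP sources stored under a generic
binary type might be skipped. The class IndexableMimeType, its method
forFileName and the extension list are hypothetical names made up for
illustration; only Node.setProperty in the comment is actual JCR API.

```java
import java.util.Locale;

class IndexableMimeType {

    /**
     * Picks a MIME type from the file extension. The mapping is an
     * assumption for illustration; extend it for your own content.
     */
    static String forFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0
                ? ""
                : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "jsp":   // JSP sources are plain text as far as indexing goes
            case "txt":
                return "text/plain";
            case "html":
            case "htm":
                return "text/html";
            case "xml":
                return "application/xml";
            default:      // unknown content: treat as opaque binary
                return "application/octet-stream";
        }
    }

    public static void main(String[] args) {
        // With a JCR session at hand, the value would be applied roughly as:
        //   content.setProperty("jcr:mimeType", forFileName("index.jsp"));
        // where content is the jcr:content node of type nt:resource.
        System.out.println(forFileName("index.jsp"));
    }
}
```

Nodes that were saved before the property was set would presumably need to
be saved again (or the index rebuilt) before a contains() query can see them.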

On 11.07.2012 20:16, Furst, Carl wrote:
> So after some investigation I'm at a loss as to which class to use for
> text extraction (i.e. what to set textFilterClasses to in the workspace.xml
> file).  Which class is the default in 2.4.2? The Wiki I think is
> incorrect... It states
> org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter as the
> default, but I don't see that class in the source code.
> 
> Possible candidates are:
> org.apache.jackrabbit.core.query.lucene.SearchIndex (regular search
> indexer)
> org.apache.jackrabbit.core.query.lucene.BlockingParser
> org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField
> 
> Any suggestions? I'll plug in the last two and see if things improve.
> 
> 
> 
> 
> Thanks,
> Carl Furst
> 
> 
> 
> 
> 
> On 7/11/12 1:36 PM, "Furst, Carl"<Carl.Furst@mlb.com>  wrote:
> 
>> 2.4.2 - Thanks for the references.. I'll check out Tika and try a test.
>>
>> Thanks,
>> Carl Furst
>>
>>
>>
>>
>>
>> On 7/3/12 5:19 AM, "Alex Parvulescu"<alex.parvulescu@gmail.com>  wrote:
>>
>>> Hi Carl,
>>>
>>> What version of jackrabbit are you on?
>>>
>>> Next, are you sure you have the Tika extractors in the classpath? Maybe
>>> you are seeing something along the lines of [0].
>>>
>>> I would try to isolate the problem by taking Tomcat out of the setup.
>>> Build a simple test, see how it works, then deploy on Tomcat and verify.
>>> A good place to start is the unit test collection available in jackrabbit
>>> core [1].
>>>
>>>
>>> best,
>>> alex
>>>
>>> [0] https://issues.apache.org/jira/browse/JCR-3287
>>> [1] http://svn.apache.org/viewvc/jackrabbit/trunk/jackrabbit-core/src/test/java/org/apache/jackrabbit/core/query/FulltextSQL2QueryTest.java?view=markup
>>>
>>>
>>> On Wed, Jun 27, 2012 at 8:06 PM, Furst, Carl<Carl.Furst@mlb.com>  wrote:
>>>
>>>> So given the below I tried to use
>>>>
>>>> 'inclu*' and 'include*' and still got no results, so I'm going to start
>>>> looking into whether some of these reasons explain why:
>>>>
>>>>
>>>> https://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F
>>>>
>>>> Of course it could just be that the parser is not parsing the '*'.
>>>>
>>>> Thanks again,
>>>>
>>>>
>>>>
>>>> Carl Furst
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 6/27/12 1:59 PM, "Furst, Carl"<Carl.Furst@mlb.com>  wrote:
>>>>
>>>>> Thanks Torsten,
>>>>>
>>>>> So even using JQOM would not help here. I'll read up more on Lucene and
>>>>> find out more. My main stumbling block here was where the query was
>>>>> being executed: was it on the Derby level or the Lucene level?
>>>>>
>>>>> This has cleared that part of it up for me as well.
>>>>>
>>>>> Thanks again,
>>>>>
>>>>> Carl Furst
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 6/27/12 1:50 PM, "Torsten Stolpmann"<stolpmann@verit.de>  wrote:
>>>>>
>>>>>> Hi Carl,
>>>>>>
>>>>>> per default the underlying Lucene implementation does not match
>>>>>> leading wildcards for performance reasons. See also:
>>>>>>
>>>>>> https://wiki.apache.org/lucene-java/LuceneFAQ#What_wildcard_search_support_is_available_from_Lucene.3F
>>>>>>
>>>>>> So just matching '*' will not work, but e.g. 'i*' might give you the
>>>>>> results you were looking for.
>>>>>>
>>>>>> Sadly enough I did not find any reference to this in the Jackrabbit
>>>>>> documentation.
>>>>>>
>>>>>> Took me quite a while to find that too.
>>>>>>
>>>>>> Hope this helps,
>>>>>>
>>>>>> Torsten
>>>>>>
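
To make the wildcard point above concrete, here is a small sketch that
builds the SQL2 statement used in this thread and refuses search terms
starting with '*' or '?'. It takes the description above at face value
(leading wildcards are not matched) and is not a statement about what any
particular Jackrabbit version actually does. ContainsQuery and its methods
are hypothetical helper names, not JCR or Jackrabbit API; in real code the
returned string would go to QueryManager.createQuery(statement,
Query.JCR_SQL2).

```java
class ContainsQuery {

    /** True if the trimmed term begins with a wildcard character. */
    static boolean hasLeadingWildcard(String term) {
        String t = term.trim();
        return !t.isEmpty() && (t.charAt(0) == '*' || t.charAt(0) == '?');
    }

    /**
     * Builds a JCR-SQL2 full-text statement for one property.
     * Single quotes are doubled so the term stays inside the string literal.
     */
    static String build(String nodeType, String property, String term) {
        if (term == null || term.trim().isEmpty()) {
            throw new IllegalArgumentException("search term must not be empty");
        }
        if (hasLeadingWildcard(term)) {
            throw new IllegalArgumentException(
                    "leading wildcard not supported: " + term);
        }
        String escaped = term.replace("'", "''");
        return "SELECT * FROM [" + nodeType + "] WHERE CONTAINS(["
                + property + "], '" + escaped + "')";
    }

    public static void main(String[] args) {
        // Trailing wildcard: accepted.
        System.out.println(build("nt:resource", "jcr:data", "inclu*"));
        // Leading wildcard: rejected before the query reaches the repository.
        try {
            build("nt:resource", "jcr:data", "*clude");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Rejecting the term up front at least turns a silent empty result into a
visible error while the actual cause is still being tracked down.
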
>>>>>> On 27.06.2012 17:19, Furst, Carl wrote:
>>>>>>> I'm probably missing something here but everything I've read so far
>>>>>>> leads me to believe this should work..
>>>>>>>
>>>>>>> I have nodes in a repository of type nt:folder and nt:file. nt:file has
>>>>>>> a child node jcr:content of type nt:resource, which has a child
>>>>>>> property called jcr:data.
>>>>>>>
>>>>>>> There are many cases where the jcr:data column has the word 'include'
>>>>>>> in it. They are jsp files so, yes, I know this word exists in several
>>>>>>> files.
>>>>>>>
>>>>>>> So here's the sql I use:
>>>>>>>
>>>>>>> select * from [nt:resource] where  contains([jcr:data], 'include');
>>>>>>>
>>>>>>> Here's the sql that is returned from q.getStatement() :
>>>>>>>
>>>>>>> SELECT [nt:resource].* FROM [nt:resource] WHERE
>>>>>>> CONTAINS([nt:resource].[jcr:data], 'include');
>>>>>>>
>>>>>>> Here is a sample text in jcr:data to search on.
>>>>>>>
>>>>>>> <%@ include file="..."
>>>>>>>
>>>>>>>
>>>>>>> ... More jsp here..
>>>>>>> <%/jsp:include...
>>>>>>>
>>>>>>> Yet it doesn't find it. I feel I'm missing something.. Do I need to add
>>>>>>> a "searchable" mixin or something?
>>>>>>>
>>>>>>> Any ideas why this is not being found?
>>>>>>>
>>>>>>> It used to be that the CND file for Jackrabbit node types was readily
>>>>>>> available from Apache. Does anyone know where I can find the CND file
>>>>>>> for Jackrabbit node types?
>>>>>>>
>>>>>>> jcr:content is unstructured, but I explicitly make the type nt:resource
>>>>>>> (otherwise the statement would not be parsed and the Query object would
>>>>>>> throw an error like "table not found," right? Because the type is a
>>>>>>> table). So the type is right.. The field is right.. The search is not
>>>>>>> working.
>>>>>>>
>>>>>>>
>>>>>>> I'm using Jackrabbit without any special configuration. Just the war in
>>>>>>> a simple Tomcat deployment. So it's sitting on top of Derby and Lucene.
>>>>>>>
>>>>>>>
>>>>>>> Any help would be appreciated.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Carl Furst
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> **********************************************************
>>>>>>>
>>>>>>> MLB.com: Where Baseball is Always On

