jackrabbit-users mailing list archives

From "Furst, Carl" <Carl.Fu...@mlb.com>
Subject Re: jcr sql2 - contains() full text search not working
Date Wed, 11 Jul 2012 18:16:49 GMT
So after some investigation I'm at a loss as to which class to use for
text extraction (i.e., what to set textFilterClasses to in the
workspace.xml file). Which class is the default in 2.4.2? I think the
wiki is incorrect: it states
org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter as the
default, but I don't see that class in the source code.

Possible candidates are:
org.apache.jackrabbit.core.query.lucene.SearchIndex (regular search indexer)
org.apache.jackrabbit.core.query.lucene.BlockingParser
org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField

Any suggestions? I'll plug in the last two and see if things improve.
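
For anyone reproducing this outside Tomcat, here is a minimal sketch of how the statement quoted below could be assembled in plain Java. The class and method names are hypothetical (they are not part of Jackrabbit), and doubling a single quote inside the literal is my assumption about how JCR-SQL2 string literals are escaped. The wildcard check only flags terms that start with '*' or '?', following the note about leading wildcards in the reply quoted below. It does not run the query.

```java
import java.util.Objects;

/** Hypothetical helper, not a Jackrabbit class: builds a JCR-SQL2 full-text statement. */
final class FulltextQuerySketch {

    private FulltextQuerySketch() {
    }

    /** Returns true if the search term begins with a wildcard character ('*' or '?'). */
    static boolean hasLeadingWildcard(String term) {
        Objects.requireNonNull(term, "term");
        String trimmed = term.trim();
        return !trimmed.isEmpty()
                && (trimmed.charAt(0) == '*' || trimmed.charAt(0) == '?');
    }

    /** Builds: SELECT * FROM [nodeType] WHERE CONTAINS([property], 'term') */
    static String containsQuery(String nodeType, String property, String term) {
        Objects.requireNonNull(nodeType, "nodeType");
        Objects.requireNonNull(property, "property");
        Objects.requireNonNull(term, "term");
        // Assumption: a single quote inside the literal is escaped by doubling it.
        String literal = term.replace("'", "''");
        return "SELECT * FROM [" + nodeType + "] WHERE CONTAINS(["
                + property + "], '" + literal + "')";
    }

    public static void main(String[] args) {
        // The three search terms tried in this thread.
        for (String term : new String[] {"include", "inclu*", "*"}) {
            System.out.println(containsQuery("nt:resource", "jcr:data", term)
                    + "  leadingWildcard=" + hasLeadingWildcard(term));
        }
    }
}
```

The resulting string would still have to be passed to QueryManager.createQuery with Query.JCR_SQL2 against a live session to see whether the index returns anything.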




Thanks,
Carl Furst





On 7/11/12 1:36 PM, "Furst, Carl" <Carl.Furst@mlb.com> wrote:

>2.4.2. Thanks for the references. I'll check out Tika and try a test.
>
>Thanks,
>Carl Furst
>
>
>
>
>
>On 7/3/12 5:19 AM, "Alex Parvulescu" <alex.parvulescu@gmail.com> wrote:
>
>>Hi Carl,
>>
>>What version of Jackrabbit are you on?
>>
>>Next, are you sure you have the Tika extractors in the classpath? Maybe
>>you are seeing something along the lines of [0].
>>
>>I would try to isolate the problem by taking Tomcat out of the setup.
>>Build a simple test, see how it works, then deploy on Tomcat and verify.
>>A good place to start is the unit test collection available in
>>Jackrabbit core [1].
>>
>>
>>best,
>>alex
>>
>>[0] https://issues.apache.org/jira/browse/JCR-3287
>>[1] http://svn.apache.org/viewvc/jackrabbit/trunk/jackrabbit-core/src/test/java/org/apache/jackrabbit/core/query/FulltextSQL2QueryTest.java?view=markup
>>
>>
>>On Wed, Jun 27, 2012 at 8:06 PM, Furst, Carl <Carl.Furst@mlb.com> wrote:
>>
>>> So given the below I tried to use
>>>
>>> 'inclu*' and 'include*' and still got no results, so I'm going to
>>> start looking into whether some of these reasons explain why:
>>>
>>> https://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F
>>>
>>> Of course it could just be that the parser is not parsing the '*'.
>>>
>>> Thanks again,
>>>
>>>
>>>
>>> Carl Furst
>>>
>>>
>>>
>>>
>>>
>>> On 6/27/12 1:59 PM, "Furst, Carl" <Carl.Furst@mlb.com> wrote:
>>>
>>> >Thanks Torsten,
>>> >
>>> >So even using JQOM would not help here. I'll read up more on Lucene
>>> >and find out more. My main stumbling block here was where the query
>>> >was being executed: was it on the Derby level or the Lucene level?
>>> >
>>> >This has cleared that part of it up for me as well.
>>> >
>>> >Thanks again,
>>> >
>>> >Carl Furst
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >On 6/27/12 1:50 PM, "Torsten Stolpmann" <stolpmann@verit.de> wrote:
>>> >
>>> >>Hi Carl,
>>> >>
>>> >>By default, the underlying Lucene implementation does not match
>>> >>leading wildcards, for performance reasons. See also:
>>> >>
>>> >>https://wiki.apache.org/lucene-java/LuceneFAQ#What_wildcard_search_support_is_available_from_Lucene.3F
>>> >>
>>> >>So just matching '*' will not work, but e.g. 'i*' might give you the
>>> >>results you were looking for.
>>> >>
>>> >>Sadly enough, I did not find any reference to this in the Jackrabbit
>>> >>documentation.
>>> >>
>>> >>Took me quite a while to find that too.
>>> >>
>>> >>Hope this helps,
>>> >>
>>> >>Torsten
>>> >>
>>> >>On 27.06.2012 17:19, Furst, Carl wrote:
>>> >>> I'm probably missing something here but everything I've read so far
>>> >>> leads me to believe this should work.
>>> >>>
>>> >>> I have nodes in a repository of type nt:folder and nt:file. nt:file
>>> >>> has a child node jcr:content of type nt:resource, which has a child
>>> >>> property called jcr:data.
>>> >>>
>>> >>> There are many cases where the jcr:data column has the word
>>> >>> 'include' in it. They are JSP files, so yes, I know this word exists
>>> >>> in several files.
>>> >>>
>>> >>> So here's the sql I use:
>>> >>>
>>> >>> select * from [nt:resource] where  contains([jcr:data], 'include');
>>> >>>
>>> >>> Here's the sql that is returned from q.getStatement() :
>>> >>>
>>> >>> SELECT [nt:resource].* FROM [nt:resource] WHERE
>>> >>> CONTAINS([nt:resource].[jcr:data], 'include');
>>> >>>
>>> >>> Here is a sample text in jcr:data to search on.
>>> >>>
>>> >>> <%@ include file="..."
>>> >>>
>>> >>>
>>> >>> ... More jsp here..
>>> >>> <%/jsp:include...
>>> >>>
>>> >>> Yet it doesn't find it. I feel I'm missing something. Do I need to
>>> >>> add a "searchable" mixin or something?
>>> >>>
>>> >>> Any ideas why this is not being found?
>>> >>>
>>> >>> Apache used to make the CND file for the Jackrabbit node types
>>> >>> readily available. Does anyone know where I can find it now?
>>> >>>
>>> >>> jcr:content is unstructured, but I explicitly make the type
>>> >>> nt:resource (otherwise the statement would not be parsed and the
>>> >>> Query object would throw an error like "table not found", right?
>>> >>> Because the type is a table). So the type is right, the field is
>>> >>> right, and yet the search is not working.
>>> >>>
>>> >>>
>>> >>> I'm using Jackrabbit without any special configuration, just the war
>>> >>> in a simple Tomcat deployment. So it's sitting on top of Derby and
>>> >>> Lucene.
>>> >>>
>>> >>>
>>> >>> Any help would be appreciated.
>>> >>>
>>> >>> Thanks,
>>> >>>
>>> >>> Carl Furst
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> **********************************************************
>>> >>>
>>> >>> MLB.com: Where Baseball is Always On
>>> >>>
>>> >>
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>
>
>
>
>
>





