jackrabbit-users mailing list archives

From "Furst, Carl" <Carl.Fu...@mlb.com>
Subject Re: jcr sql2 - contains() full text search not working
Date Tue, 17 Jul 2012 19:56:16 GMT
Tried the same query using xpath:

//mc_art_contest.inc.html

Worked! 

Just FYI.
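
One possible reason the XPath form matches while the earlier SQL2 attempt (quoted below) does not: in JCR-SQL2, `name = '...'` compares a property called "name", whereas the node name itself is tested with the NAME() or LOCALNAME() functions. A minimal sketch of building such a statement follows. The class and method names are made up for illustration, and the statement has not been run against this 2.4.2 setup.

```java
/** Hypothetical helper (not part of Jackrabbit) that builds JCR-SQL2 node-name queries. */
final class NodeNameQuery {
    private NodeNameQuery() {}

    /** Wraps a value in single quotes, doubling embedded single quotes as JCR-SQL2 expects. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Builds a statement that matches nodes of the given type by their local name. */
    static String byLocalName(String nodeType, String localName) {
        return "SELECT * FROM [" + nodeType + "] AS n WHERE LOCALNAME(n) = " + quote(localName);
    }
}
```

For the node above this yields `SELECT * FROM [nt:file] AS n WHERE LOCALNAME(n) = 'mc_art_contest.inc.html'`, which could then be passed to QueryManager.createQuery() with Query.JCR_SQL2.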


Carl Furst





On 7/12/12 6:07 PM, "Furst, Carl" <Carl.Furst@mlb.com> wrote:

>I even tried a simpler query based on findings with Luke
>
>In Luke I did the following:
>
>_\:LOCAL_NAME: mc_art_contest.inc.html
>
>
>Which is the name of one of the nodes stored.
>
>And Luke reported one record found.
>
>Then I tried 
>
>select * from [nt:file] where name = 'mc_art_contest.inc.html'
>
>
>In Jackrabbit, and the result was:
>
>found: 0 nodes
>
>
>The problem is that I'm not sure where the bug is, but text
>searches are not working with a default Derby/Lucene/Jackrabbit deployment. I
>tried this as a servlet in the same container as the WAR, and I tried it as
>an RMI/JCA application. No luck. So that is that, and it's been fun.
>
>Thanks,
>
>
>
>Carl Furst
>
>
>
>
>
>
>On 7/11/12 4:17 PM, "Furst, Carl" <Carl.Furst@mlb.com> wrote:
>
>>Thanks for the help Torsten,
>>
>>Unfortunately that didn't work. The output from my test is as follows:
>>
>>mimetype for node we are looking for is: text/html
>>// Which was taken from the node, using the path. This is the text that is
>>stored in jcr:mimeType
>>
>>text for node we are looking for is:
>>FanFest Art Contest Winners</b></span><br>
>>// this is a snippet of text from the document I was searching stored in
>>jcr:data
>>
>>
>>
>>
>>starting execute
>>executing current query with sqlSELECT [nt:resource].* FROM [nt:resource]
>>WHERE CONTAINS([nt:resource].[jcr:data], 'FanFest Art Contest') using
>>language JCR-SQL2
>>//This is the query as extracted from the Query object
>>
>>And this is the result:
>>
>>found: 0 nodes
>>executed test in 660 ms
>>
>>
>>So something is not right… (SQL, maybe?). Maybe the node iterator isn't
>>getting the right count of nodes? Could it be that over RMI it's possible
>>to get the nodes but not the right count of nodes returned?
>>
>>Hmmm…. 
>>
>>
>>Carl Furst
>>
>>
>>
>>
>>
>>On 7/11/12 2:32 PM, "Torsten Stolpmann" <stolpmann@verit.de> wrote:
>>
>>>Hi Carl,
>>>
>>>AFAIK the documentation still refers to jackrabbit 1.x.x - see [1] for
>>>details. Maybe [2] has the correct answer to your problem (explicitly
>>>setting the jcr:mimeType for your data node)?
>>>
>>>HTH,
>>>
>>>Torsten
>>>
>>>[1] https://issues.apache.org/jira/browse/JCR-1878
>>>[2] http://jackrabbit.510166.n4.nabble.com/textFilterClasses-deprecated-How-to-specify-extractors-td4534050.html
>>>
>>>On 11.07.2012 20:16, Furst, Carl wrote:
>>>> So after some investigation I'm at a loss as to which class to use for
>>>> text extraction (i.e. what to set textFilterClasses to in the
>>>> workspace.xml file). Which class is the default in 2.4.2? I think the wiki
>>>> is incorrect: it states
>>>> org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter as the
>>>> default, but I don't see that class in the source code.
>>>> 
>>>> Possible candidates are:
>>>> org.apache.jackrabbit.core.query.lucene.SearchIndex (regular search
>>>> indexer)
>>>> org.apache.jackrabbit.core.query.lucene.BlockingParser
>>>> org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField
>>>> 
>>>> Any suggestions? I'll plug in the last two and see if things improve.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> Thanks,
>>>> Carl Furst
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 7/11/12 1:36 PM, "Furst, Carl"<Carl.Furst@mlb.com>  wrote:
>>>> 
>>>>> 2.4.2 - Thanks for the references. I'll check out Tika and try a test.
>>>>>
>>>>> Thanks,
>>>>> Carl Furst
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 7/3/12 5:19 AM, "Alex Parvulescu"<alex.parvulescu@gmail.com> wrote:
>>>>>
>>>>>> Hi Carl,
>>>>>>
>>>>>> What version of jackrabbit are you on?
>>>>>>
>>>>>> Next, are you sure you have the Tika extractors in the classpath? Maybe
>>>>>> you are seeing something along the lines of [0].
>>>>>>
>>>>>> I would try to isolate the problem by taking Tomcat out of the setup.
>>>>>> Build a simple test, see how it works, then deploy on Tomcat and verify.
>>>>>> A good place to start is the unit test collection available in
>>>>>> jackrabbit core [1].
>>>>>>
>>>>>>
>>>>>> best,
>>>>>> alex
>>>>>>
>>>>>> [0] https://issues.apache.org/jira/browse/JCR-3287
>>>>>> [1] http://svn.apache.org/viewvc/jackrabbit/trunk/jackrabbit-core/src/test/java/org/apache/jackrabbit/core/query/FulltextSQL2QueryTest.java?view=markup
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 27, 2012 at 8:06 PM, Furst, Carl<Carl.Furst@mlb.com> wrote:
>>>>>>
>>>>>>> So given the below I tried to use
>>>>>>>
>>>>>>> 'inclu*' and 'include*' and still got no results, so I'm going to start
>>>>>>> looking into some of these possible reasons why:
>>>>>>>
>>>>>>> https://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F
>>>>>>>
>>>>>>> Of course it could just be that the parser is not parsing the '*'.
>>>>>>>
>>>>>>> Thanks again,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Carl Furst
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 6/27/12 1:59 PM, "Furst, Carl"<Carl.Furst@mlb.com> wrote:
>>>>>>>
>>>>>>>> Thanks Torsten,
>>>>>>>>
>>>>>>>> So even using JQOM would not help here. I'll read up more on Lucene
>>>>>>>> and find out more. My main stumbling block here was where the query
>>>>>>>> was being executed: was it on the Derby level or the Lucene level?
>>>>>>>>
>>>>>>>> This has cleared that part of it up for me as well.
>>>>>>>>
>>>>>>>> Thanks again,
>>>>>>>>
>>>>>>>> Carl Furst
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/27/12 1:50 PM, "Torsten Stolpmann"<stolpmann@verit.de> wrote:
>>>>>>>>
>>>>>>>>> Hi Carl,
>>>>>>>>>
>>>>>>>>> per default the underlying Lucene implementation does not match leading
>>>>>>>>> wildcards for performance reasons. See also:
>>>>>>>>>
>>>>>>>>> https://wiki.apache.org/lucene-java/LuceneFAQ#What_wildcard_search_support_is_available_from_Lucene.3F
>>>>>>>>>
>>>>>>>>> So just matching '*' will not work, but e.g. 'i*' might give you the
>>>>>>>>> results you were looking for.
>>>>>>>>>
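
The leading-wildcard point above can be checked before a query is sent. The sketch below flags search terms that begin with `*` or `?`, which is what Lucene's default query parser rejects according to the FAQ entry. The class name is hypothetical, and whether Jackrabbit's own full-text parser applies the same restriction is not confirmed here.

```java
/** Hypothetical pre-check for full-text search terms; not part of Jackrabbit or Lucene. */
final class WildcardCheck {
    private WildcardCheck() {}

    /** Returns true when the trimmed term starts with a wildcard character ('*' or '?'). */
    static boolean hasLeadingWildcard(String term) {
        String trimmed = term.trim();
        if (trimmed.isEmpty()) {
            return false;
        }
        char first = trimmed.charAt(0);
        return first == '*' || first == '?';
    }
}
```

Under this check a bare `*` is flagged, while `i*` and `inclu*` are not.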
>>>>>>>>> Sadly enough I did not find any reference to this in the Jackrabbit
>>>>>>>>> documentation.
>>>>>>>>>
>>>>>>>>> Took me quite a while to find that too.
>>>>>>>>>
>>>>>>>>> Hope this helps,
>>>>>>>>>
>>>>>>>>> Torsten
>>>>>>>>>
>>>>>>>>> On 27.06.2012 17:19, Furst, Carl wrote:
>>>>>>>>>> I'm probably missing something here, but everything I've read so far
>>>>>>>>>> leads me to believe this should work.
>>>>>>>>>>
>>>>>>>>>> I have nodes in a repository of type nt:folder and nt:file. nt:file has
>>>>>>>>>> a child node jcr:content of type nt:resource, which has a child property
>>>>>>>>>> called jcr:data.
>>>>>>>>>>
>>>>>>>>>> There are many cases where the jcr:data column has the word 'include'
>>>>>>>>>> in it. They are JSP files, so, yes, I know this word exists in several
>>>>>>>>>> files.
>>>>>>>>>>
>>>>>>>>>> So here's the sql I use:
>>>>>>>>>>
>>>>>>>>>> select * from [nt:resource] where contains([jcr:data], 'include');
>>>>>>>>>>
>>>>>>>>>> Here's the SQL that is returned from q.getStatement():
>>>>>>>>>>
>>>>>>>>>> SELECT [nt:resource].* FROM [nt:resource] WHERE
>>>>>>>>>> CONTAINS([nt:resource].[jcr:data], 'include');
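
The JCR-SQL2 grammar lets CONTAINS target either a single property or every property of a selector (`selector.*`). The sketch below builds both forms so they can be compared side by side. The class and method names are hypothetical, and whether the node-scope form behaves differently from the property-scope form for binary jcr:data in Jackrabbit 2.4.2 is an untested assumption.

```java
/** Hypothetical helper (not part of Jackrabbit) that builds JCR-SQL2 full-text queries. */
final class FullTextQuery {
    private FullTextQuery() {}

    /** Wraps a value in single quotes, doubling embedded single quotes as JCR-SQL2 expects. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Property-scope search: CONTAINS restricted to one named property. */
    static String containsProperty(String nodeType, String property, String expression) {
        return "SELECT * FROM [" + nodeType + "] AS r WHERE CONTAINS(r.[" + property + "], "
                + quote(expression) + ")";
    }

    /** Node-scope search: CONTAINS over all indexed properties of the selector. */
    static String containsAnyProperty(String nodeType, String expression) {
        return "SELECT * FROM [" + nodeType + "] AS r WHERE CONTAINS(r.*, "
                + quote(expression) + ")";
    }
}
```

Running both statements for the same search term would show whether the property restriction is what hides the hits.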
>>>>>>>>>>
>>>>>>>>>> Here is a sample text in jcr:data to search on.
>>>>>>>>>>
>>>>>>>>>> <%@ include file="..."
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ... More jsp here..
>>>>>>>>>> <%/jsp:include...
>>>>>>>>>>
>>>>>>>>>> Yet it doesn't find it. I feel I'm missing something. Do I need to add
>>>>>>>>>> a "searchable" mixin or something?
>>>>>>>>>>
>>>>>>>>>> Any ideas why this is not being found?
>>>>>>>>>>
>>>>>>>>>> It used to be that the CND file for Jackrabbit node types was readily
>>>>>>>>>> available from Apache. Does anyone know where I can find the CND file
>>>>>>>>>> for Jackrabbit node types?
>>>>>>>>>>
>>>>>>>>>> jcr:content is unstructured, but I explicitly make the type nt:resource
>>>>>>>>>> (otherwise the statement would not be parsed and the Query object would
>>>>>>>>>> throw an error like "table not found," right? Because the type is a
>>>>>>>>>> table). So the type is right. The field is right. The search is not
>>>>>>>>>> working.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I'm using Jackrabbit without any special configuration, just the WAR in
>>>>>>>>>> a simple Tomcat deployment. So it's sitting on top of Derby and Lucene.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Any help would be appreciated.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Carl Furst
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> **********************************************************
>>>>>>>>>>
>>>>>>>>>> MLB.com: Where Baseball is Always On
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>
>>
>>
>>
>>
>>
>>
>
>
>
>
>
>





