jackrabbit-users mailing list archives

From "Furst, Carl" <Carl.Fu...@mlb.com>
Subject Re: jcr sql2 - contains() full text search not working
Date Thu, 12 Jul 2012 22:07:59 GMT
I even tried a simpler query, based on findings with Luke.

In Luke I did the following:

_\:LOCAL_NAME: mc_art_contest.inc.html


Which is the name of one of the nodes stored.

And Luke reported one record found.

Then I tried

select * from [nt:file] where name = 'mc_art_contest.inc.html'

in JR, and the result was:

found: 0 nodes

The problem is, I'm not sure where the bug is. But text searches are not
working with a default Derby/Lucene/Jackrabbit deployment. I tried this as
a servlet in the same container as the war, and I tried it as an RMI/JCA
application. No luck. So that is that, and it's been fun.
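One likely reason the query above returns 0 nodes: in JCR-SQL2 a bare identifier such as `name` in a WHERE clause is a property reference, so `name = '...'` looks for a property called `name`, which the built-in `nt:file` type does not define. The node's own name is reached with the `NAME()` function (or `LOCALNAME()`). A minimal sketch, assuming the resulting string is later passed to `QueryManager.createQuery(statement, Query.JCR_SQL2)`; the class and helper names are hypothetical.

```java
public class NodeNameQuery {

    // Escapes a value for use inside a JCR-SQL2 string literal:
    // a single quote inside the literal is written as two single quotes.
    public static String sql2Literal(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Hypothetical helper: matches nodes of the given type by their own node
    // name via NAME(), instead of by a property that happens to be called "name".
    public static String byNodeName(String nodeType, String nodeName) {
        return "SELECT * FROM [" + nodeType + "] WHERE NAME() = " + sql2Literal(nodeName);
    }

    public static void main(String[] args) {
        String stmt = byNodeName("nt:file", "mc_art_contest.inc.html");
        System.out.println(stmt);
        // With the JCR API on the classpath, the statement would be run roughly as:
        //   Query q = session.getWorkspace().getQueryManager()
        //                    .createQuery(stmt, Query.JCR_SQL2);
        //   NodeIterator it = q.execute().getNodes();
    }
}
```

If this form finds the node while the full-text query still finds nothing, that would point at text extraction rather than at the query language.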

Thanks,



Carl Furst






On 7/11/12 4:17 PM, "Furst, Carl" <Carl.Furst@mlb.com> wrote:

>Thanks for the help Torsten,
>
>Unfortunately that didn't work. The output from my test is as follows:
>
>mimetype for node we are looking for is: text/html
>// Which was taken from the node, using the path. This is the text that is
>stored in jcr:mimeType
>
>text for node we are looking for is:
>FanFest Art Contest Winners</b></span><br>
>// this is a snippet of text from the document I was searching stored in
>jcr:data
>
>
>
>
>starting execute
>executing current query with sqlSELECT [nt:resource].* FROM [nt:resource]
>WHERE CONTAINS([nt:resource].[jcr:data], 'FanFest Art Contest') using
>language JCR-SQL2
>//This is the query as extracted from the Query object
>
>And this is the result:
>
>found: 0 nodes
>executed test in 660 ms
>
>
>So something is not right… (the SQL, maybe?). Maybe the node iterator isn't
>getting the right count of nodes? Could it be that over RMI it's possible
>to get the nodes but not the right count of nodes returned?
>
>Hmmm…. 
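On the count question: `NodeIterator` extends `javax.jcr.RangeIterator`, whose `getSize()` may return -1 when the implementation does not know the total up front, so a printed count is only trustworthy if it comes from actually iterating. Since `RangeIterator` also extends `java.util.Iterator`, a plain helper like the hypothetical `countByIterating` below would accept a `NodeIterator` directly. The sketch uses only the JDK so it runs without Jackrabbit; whether the RMI layer in particular affects the reported size is not something the thread establishes.

```java
import java.util.Iterator;
import java.util.List;

public class ResultCounter {

    // Counts elements by walking the iterator to the end.
    // Note: this consumes the iterator, so collect the elements while
    // counting if you still need them afterwards.
    public static long countByIterating(Iterator<?> it) {
        long count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return count;
    }

    // Interprets a size hint the way RangeIterator.getSize() reports it:
    // -1 means "unknown", which is not the same as "zero results".
    public static String describeSizeHint(long sizeHint) {
        return sizeHint < 0 ? "size unknown" : sizeHint + " nodes";
    }

    public static void main(String[] args) {
        // Stand-in for the NodeIterator returned by QueryResult.getNodes().
        Iterator<String> fakeResults = List.of("/a", "/b", "/c").iterator();
        System.out.println("found: " + countByIterating(fakeResults) + " nodes");
        System.out.println(describeSizeHint(-1));
    }
}
```

With the real API the call would be `countByIterating(query.execute().getNodes())`; if iterating also yields nothing, the count is not the problem.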
>
>
>Carl Furst
>
>
>
>
>
>On 7/11/12 2:32 PM, "Torsten Stolpmann" <stolpmann@verit.de> wrote:
>
>>Hi Carl,
>>
>>AFAIK the documentation still refers to jackrabbit 1.x.x - see [1] for
>>details. Maybe [2] has the correct answer to your problem (explicitly
>>setting the jcr:mimeType for your data node)?
>>
>>HTH,
>>
>>Torsten
>>
>>[1] https://issues.apache.org/jira/browse/JCR-1878
>>[2] http://jackrabbit.510166.n4.nabble.com/textFilterClasses-deprecated-How-to-specify-extractors-td4534050.html
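Torsten's suggestion [2] is to set `jcr:mimeType` explicitly so the text extractor knows how to parse `jcr:data`. When files are imported in bulk, one way to pick a value is the JDK's `URLConnection.guessContentTypeFromName`, which maps extensions such as `.html` to `text/html` but returns null for extensions missing from its table (JSP files may fall into that case). The `text/plain` fallback below is an assumption chosen for illustration, not something the thread or the Jackrabbit documentation prescribes; the class name is hypothetical.

```java
import java.net.URLConnection;

public class MimeTypeGuess {

    // Assumed fallback for unknown extensions: text/plain, on the
    // assumption that the configured extractor reads plain text as-is.
    static final String FALLBACK = "text/plain";

    // Guesses a MIME type from the file name using the JDK's built-in
    // extension table, falling back to text/plain when nothing matches.
    public static String guessMimeType(String fileName) {
        String guessed = URLConnection.guessContentTypeFromName(fileName);
        return guessed != null ? guessed : FALLBACK;
    }

    public static void main(String[] args) {
        System.out.println(guessMimeType("mc_art_contest.inc.html"));
        System.out.println(guessMimeType("header.jsp"));
        // With the JCR API, the value would be stored on the jcr:content node:
        //   contentNode.setProperty("jcr:mimeType", guessMimeType(fileName));
    }
}
```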
>>
>>On 11.07.2012 20:16, Furst, Carl wrote:
>>> So after some investigation I'm at a loss as to which class to use for
>>> text extraction (i.e. what to set textFilterClasses to in the
>>> workspace.xml file). Which class is the default in 2.4.2? I think the
>>> wiki is incorrect... It states
>>> org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter as the
>>> default, but I don't see that class in the source code.
>>> 
>>> Possible candidates are:
>>> org.apache.jackrabbit.core.query.lucene.SearchIndex (regular search indexer)
>>> org.apache.jackrabbit.core.query.lucene.BlockingParser
>>> org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField
>>> 
>>> Any suggestions? I'll plug in the last two and see if things improve.
>>> 
>>> 
>>> 
>>> 
>>> Thanks,
>>> Carl Furst
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On 7/11/12 1:36 PM, "Furst, Carl"<Carl.Furst@mlb.com>  wrote:
>>> 
>>>> 2.4.2. Thanks for the references. I'll check out Tika and try a
>>>> test.
>>>>
>>>> Thanks,
>>>> Carl Furst
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 7/3/12 5:19 AM, "Alex Parvulescu"<alex.parvulescu@gmail.com> wrote:
>>>>
>>>>> Hi Carl,
>>>>>
>>>>> What version of jackrabbit are you on?
>>>>>
>>>>> Next, are you sure you have the Tika extractors in the classpath?
>>>>> Maybe you are seeing something along the lines of [0].
>>>>>
>>>>> I would try to isolate the problem by taking Tomcat out of the setup.
>>>>> Build a simple test, see how it works, then deploy on Tomcat and verify.
>>>>> A good place to start is the unit test collection available in
>>>>> jackrabbit-core [1].
>>>>>
>>>>>
>>>>> best,
>>>>> alex
>>>>>
>>>>> [0] https://issues.apache.org/jira/browse/JCR-3287
>>>>> [1] http://svn.apache.org/viewvc/jackrabbit/trunk/jackrabbit-core/src/test/java/org/apache/jackrabbit/core/query/FulltextSQL2QueryTest.java?view=markup
>>>>>
>>>>>
>>>>> On Wed, Jun 27, 2012 at 8:06 PM, Furst, Carl<Carl.Furst@mlb.com> wrote:
>>>>>
>>>>>> So, given the below, I tried
>>>>>>
>>>>>> 'inclu*' and 'include*' and still got no results, so I'm going to
>>>>>> start looking into some of these possible reasons:
>>>>>>
>>>>>> https://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F
>>>>>>
>>>>>> Of course it could just be that the parser is not parsing the '*'.
>>>>>>
>>>>>> Thanks again,
>>>>>>
>>>>>>
>>>>>>
>>>>>> Carl Furst
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 6/27/12 1:59 PM, "Furst, Carl"<Carl.Furst@mlb.com>  wrote:
>>>>>>
>>>>>>> Thanks Torsten,
>>>>>>>
>>>>>>> So even using JQOM would not help here. I'll read up more on Lucene
>>>>>>> and find out more. My main stumbling block here was where the query
>>>>>>> was being executed: was it at the Derby level or the Lucene level?
>>>>>>>
>>>>>>> This has cleared that part of it up for me as well.
>>>>>>>
>>>>>>> Thanks again,
>>>>>>>
>>>>>>> Carl Furst
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 6/27/12 1:50 PM, "Torsten Stolpmann"<stolpmann@verit.de> wrote:
>>>>>>>
>>>>>>>> Hi Carl,
>>>>>>>>
>>>>>>>> By default, the underlying Lucene implementation does not match
>>>>>>>> leading wildcards, for performance reasons. See also:
>>>>>>>>
>>>>>>>> https://wiki.apache.org/lucene-java/LuceneFAQ#What_wildcard_search_support_is_available_from_Lucene.3F
>>>>>>>>
>>>>>>>> So just matching '*' will not work, but e.g. 'i*' might give you
>>>>>>>> the results you were looking for.
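Why a leading wildcard is expensive can be seen with a toy term dictionary: if the indexed terms are kept sorted, a trailing wildcard such as `i*` is a range lookup that starts at the prefix and stops at the first non-matching term, whereas `*clude` gives no starting point and every term must be tested. A minimal sketch using a JDK `TreeSet`; it is a simplification of the idea, not Lucene's actual implementation, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class WildcardSketch {

    // Trailing wildcard ("prefix*"): jump to the first term >= prefix in the
    // sorted set and walk forward only while terms still start with the prefix.
    public static List<String> prefixMatch(NavigableSet<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String t : terms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) {
                break; // sorted order: no later term can match
            }
            hits.add(t);
        }
        return hits;
    }

    // Leading wildcard ("*suffix"): the sort order does not help,
    // so every term in the dictionary has to be examined.
    public static List<String> suffixMatch(NavigableSet<String> terms, String suffix) {
        List<String> hits = new ArrayList<>();
        for (String t : terms) {
            if (t.endsWith(suffix)) {
                hits.add(t);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(
                List.of("file", "include", "inclusion", "jsp", "page", "preclude"));
        System.out.println(prefixMatch(terms, "inclu")); // [include, inclusion]
        System.out.println(suffixMatch(terms, "clude")); // [include, preclude]
    }
}
```

By this reasoning, trailing wildcards like `inclu*` and `include*` are the cheap, supported kind, so getting no hits for them suggests the cause lies elsewhere.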
>>>>>>>>
>>>>>>>> Sadly, I did not find any reference to this in the Jackrabbit
>>>>>>>> documentation.
>>>>>>>>
>>>>>>>> Took me quite a while to find that too.
>>>>>>>>
>>>>>>>> Hope this helps,
>>>>>>>>
>>>>>>>> Torsten
>>>>>>>>
>>>>>>>> On 27.06.2012 17:19, Furst, Carl wrote:
>>>>>>>>> I'm probably missing something here, but everything I've read so
>>>>>>>>> far leads me to believe this should work.
>>>>>>>>>
>>>>>>>>> I have nodes in a repository of type nt:folder and nt:file. nt:file
>>>>>>>>> has a child node jcr:content of type nt:resource, which has a
>>>>>>>>> property called jcr:data.
>>>>>>>>>
>>>>>>>>> There are many cases where the jcr:data column has the word
>>>>>>>>> 'include' in it. They are JSP files, so yes, I know this word
>>>>>>>>> exists in several files.
>>>>>>>>>
>>>>>>>>> So here's the SQL I use:
>>>>>>>>>
>>>>>>>>> select * from [nt:resource] where contains([jcr:data], 'include');
>>>>>>>>>
>>>>>>>>> Here's the SQL that is returned from q.getStatement():
>>>>>>>>>
>>>>>>>>> SELECT [nt:resource].* FROM [nt:resource] WHERE
>>>>>>>>> CONTAINS([nt:resource].[jcr:data], 'include');
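For reference, a statement of this shape can be assembled programmatically. A minimal sketch with a hypothetical class name: it doubles any single quote so the search text stays a valid JCR-SQL2 string literal, and can optionally wrap multi-word input in double quotes, which the JCR 2.0 full-text grammar reads as a phrase; unquoted words, as in the later 'FanFest Art Contest' query, are separate terms that must all be present. Double quotes or backslashes inside the search text are not handled (assumption: plain words only).

```java
public class ContainsQuery {

    // Doubles single quotes so the text can sit inside a JCR-SQL2 '...' literal.
    public static String sql2Literal(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Builds SELECT [type].* FROM [type] WHERE CONTAINS([type].[property], '<expr>').
    // With asPhrase the words are wrapped in double quotes (phrase search);
    // otherwise each whitespace-separated word is its own required term.
    public static String contains(String nodeType, String property, String text, boolean asPhrase) {
        String expr = asPhrase ? "\"" + text + "\"" : text;
        return "SELECT [" + nodeType + "].* FROM [" + nodeType + "] WHERE CONTAINS(["
                + nodeType + "].[" + property + "], " + sql2Literal(expr) + ")";
    }

    public static void main(String[] args) {
        System.out.println(contains("nt:resource", "jcr:data", "include", false));
        System.out.println(contains("nt:resource", "jcr:data", "FanFest Art Contest", true));
    }
}
```

Either form only matches if `jcr:data` was actually extracted into the index, so the builder cannot compensate for missing text extraction.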
>>>>>>>>>
>>>>>>>>> Here is a sample text in jcr:data to search on.
>>>>>>>>>
>>>>>>>>> <%@ include file="..."
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ... More jsp here..
>>>>>>>>> <%/jsp:include...
>>>>>>>>>
>>>>>>>>> Yet it doesn't find it. I feel I'm missing something. Do I need to
>>>>>>>>> add a "searchable" mixin or something?
>>>>>>>>>
>>>>>>>>> Any ideas why this is not being found?
>>>>>>>>>
>>>>>>>>> It used to be that the CND file for the Jackrabbit node types was
>>>>>>>>> readily available from Apache. Does anyone know where I can find
>>>>>>>>> the CND file for the Jackrabbit node types?
>>>>>>>>>
>>>>>>>>> jcr:content is unstructured, but I explicitly make its type
>>>>>>>>> nt:resource (otherwise the statement would not be parsed and the
>>>>>>>>> Query object would throw an error like "table not found," right?
>>>>>>>>> Because the type is a table). So the type is right, the field is
>>>>>>>>> right, but the search is not working.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'm using Jackrabbit without any special configuration: just the
>>>>>>>>> war in a simple Tomcat deployment. So it's sitting on top of Derby
>>>>>>>>> and Lucene.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Any help would be appreciated.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Carl Furst
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> **********************************************************
>>>>>>>>>
>>>>>>>>> MLB.com: Where Baseball is Always On
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>
>>> 
>>
>





