jackrabbit-users mailing list archives

From "Furst, Carl" <Carl.Fu...@mlb.com>
Subject Re: jcr sql2 - contains() full text search not working
Date Wed, 11 Jul 2012 20:17:17 GMT
Thanks for the help Torsten,

Unfortunately that didn't work. The output from my test is as follows:

mimetype for node we are looking for is: text/html
// Which was taken from the node, using the path. This is the text that is
stored in jcr:mimeType

text for node we are looking for is:
FanFest Art Contest Winners</b></span><br>
// this is a snippet of text from the document I was searching stored in
jcr:data




starting execute
executing current query with sqlSELECT [nt:resource].* FROM [nt:resource]
WHERE CONTAINS([nt:resource].[jcr:data], 'FanFest Art Contest') using
language JCR-SQL2
//This is the query as extracted from the Query object
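
A minimal sketch of how a statement like the one above could be assembled in Java. The class and method names are hypothetical and not part of Jackrabbit; the only rule assumed here is that a single quote inside a JCR-SQL2 string literal is escaped by doubling it. It does not validate the full-text expression itself.

```java
/** Sketch only: builds a JCR-SQL2 full-text query string (hypothetical helper). */
final class FullTextQueryBuilder {

    // Escape a value for use inside a single-quoted JCR-SQL2 string literal
    // by doubling any embedded single quote.
    static String escapeLiteral(String value) {
        return value.replace("'", "''");
    }

    // Builds: SELECT [type].* FROM [type] WHERE CONTAINS([type].[property], 'text')
    static String containsQuery(String nodeType, String property, String searchText) {
        return "SELECT [" + nodeType + "].* FROM [" + nodeType + "] WHERE CONTAINS(["
                + nodeType + "].[" + property + "], '" + escapeLiteral(searchText) + "')";
    }

    public static void main(String[] args) {
        // Same node type, property and search text as in the test output above.
        System.out.println(containsQuery("nt:resource", "jcr:data", "FanFest Art Contest"));
    }
}
```

The resulting string would then be handed to QueryManager.createQuery(statement, Query.JCR_SQL2) in the actual test.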

And this is the result:

found: 0 nodes
executed test in 660 ms


So something is not right… (the SQL, maybe?). Maybe the node iterator isn't
reporting the right count of nodes? Could it be that over RMI the nodes are
returned but the reported count is wrong?
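
One way to rule out the count question is to count by walking the iterator instead of trusting a reported size. In the JCR API, NodeIterator extends RangeIterator, which extends java.util.Iterator, and RangeIterator.getSize() is documented to return -1 when the size is unknown. The sketch below uses a hypothetical helper and stand-in data; whether the size is actually unavailable over RMI in this setup is an assumption, not something this thread confirms.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Sketch only: counts results by iteration (hypothetical helper). */
final class ResultCounter {

    // Walks the iterator to the end and returns how many elements it produced.
    // A javax.jcr.NodeIterator can be passed here because it is a java.util.Iterator.
    static long countByIteration(Iterator<?> results) {
        long count = 0;
        while (results.hasNext()) {
            results.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-in data; in the real test this would be queryResult.getNodes().
        Iterator<String> standIn = Arrays.asList("a", "b", "c").iterator();
        System.out.println("found: " + countByIteration(standIn) + " nodes");
    }
}
```

If the iterated count is also 0, the reported size is not the problem and the index contents are the next thing to check.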

Hmmm…. 


Carl Furst





On 7/11/12 2:32 PM, "Torsten Stolpmann" <stolpmann@verit.de> wrote:

>Hi Carl,
>
>AFAIK the documentation still refers to jackrabbit 1.x.x - see [1] for
>details. Maybe [2] has the correct answer to your problem (explicitly
>setting the jcr:mimeType for your data node)?
>
>HTH,
>
>Torsten
>
>[1] https://issues.apache.org/jira/browse/JCR-1878
>[2] http://jackrabbit.510166.n4.nabble.com/textFilterClasses-deprecated-How-to-specify-extractors-td4534050.html
>
>On 11.07.2012 20:16, Furst, Carl wrote:
>> So after some investigation I'm at a loss as to which class to use for
>> text extraction (i.e. what to set textFilterClasses to in the
>> workspace.xml file). Which class is the default in 2.4.2? I think the
>> wiki is incorrect... It states
>> org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter as the
>> default, but I don't see that class in the source code.
>> 
>> Possible candidates are:
>> org.apache.jackrabbit.core.query.lucene.SearchIndex (regular search
>> indexer)
>> org.apache.jackrabbit.core.query.lucene.BlockingParser
>> org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField
>> 
>> Any suggestions? I'll plug in the last two and see if things improve.
>> 
>> 
>> 
>> 
>> Thanks,
>> Carl Furst
>> 
>> 
>> 
>> 
>> 
>> On 7/11/12 1:36 PM, "Furst, Carl"<Carl.Furst@mlb.com>  wrote:
>> 
>>> 2.4.2 - Thanks for the references.. I'll check out Tika and try a test.
>>>
>>> Thanks,
>>> Carl Furst
>>>
>>>
>>>
>>>
>>>
>>> On 7/3/12 5:19 AM, "Alex Parvulescu"<alex.parvulescu@gmail.com>  wrote:
>>>
>>>> Hi Carl,
>>>>
>>>> What version of jackrabbit are you on?
>>>>
>>>> Next, are you sure you have the tika extractors in the classpath? Maybe
>>>> you are seeing something along the lines of [0].
>>>>
>>>> I would try to isolate the problem by taking tomcat out of the setup.
>>>> Build a simple test, see how it works, then deploy on tomcat and verify.
>>>> A good place to start is the unit test collection available in
>>>> jackrabbit core [1].
>>>>
>>>>
>>>> best,
>>>> alex
>>>>
>>>> [0] https://issues.apache.org/jira/browse/JCR-3287
>>>> [1] http://svn.apache.org/viewvc/jackrabbit/trunk/jackrabbit-core/src/test/java/org/apache/jackrabbit/core/query/FulltextSQL2QueryTest.java?view=markup
>>>>
>>>>
>>>> On Wed, Jun 27, 2012 at 8:06 PM, Furst, Carl<Carl.Furst@mlb.com> wrote:
>>>>
>>>>> So given the below I tried 'inclu*' and 'include*' and still got no
>>>>> results, so I'm going to start looking into whether some of these
>>>>> reasons explain why:
>>>>>
>>>>> https://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F
>>>>>
>>>>> Of course it could just be that the parser is not parsing the '*'.
>>>>>
>>>>> Thanks again,
>>>>>
>>>>>
>>>>>
>>>>> Carl Furst
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 6/27/12 1:59 PM, "Furst, Carl"<Carl.Furst@mlb.com>  wrote:
>>>>>
>>>>>> Thanks Torsten,
>>>>>>
>>>>>> So even using JQOM would not help here. I'll read up more on Lucene
>>>>>> and find out more. My main stumbling block here was where the query
>>>>>> was being executed: was it at the Derby level or the Lucene level?
>>>>>>
>>>>>> This has cleared that part of it up for me as well.
>>>>>>
>>>>>> Thanks again,
>>>>>>
>>>>>> Carl Furst
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 6/27/12 1:50 PM, "Torsten Stolpmann"<stolpmann@verit.de> wrote:
>>>>>>
>>>>>>> Hi Carl,
>>>>>>>
>>>>>>> per default the underlying Lucene implementation does not match
>>>>>>> leading wildcards for performance reasons. See also:
>>>>>>>
>>>>>>> https://wiki.apache.org/lucene-java/LuceneFAQ#What_wildcard_search_support_is_available_from_Lucene.3F
>>>>>>>
>>>>>>> So just matching '*' will not work, but e.g. 'i*' might give you the
>>>>>>> results you were looking for.
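
The leading-wildcard rule described above can be checked before a query is sent. This is a minimal sketch with a hypothetical helper, not a Jackrabbit or Lucene API; it assumes terms are separated by whitespace and treats both '*' and '?' as wildcards.

```java
/** Sketch only: flags search text that contains a term starting with a wildcard. */
final class WildcardCheck {

    // Returns true if any whitespace-separated term begins with '*' or '?'.
    static boolean hasLeadingWildcard(String searchText) {
        for (String term : searchText.trim().split("\\s+")) {
            if (term.startsWith("*") || term.startsWith("?")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The examples from this thread: '*' is rejected, 'i*' and 'include*' pass.
        for (String text : new String[] {"*", "i*", "include*"}) {
            System.out.println(text + " -> leading wildcard: " + hasLeadingWildcard(text));
        }
    }
}
```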
>>>>>>>
>>>>>>> Sadly enough I did not find any reference to this in the JackRabbit
>>>>>>> documentation.
>>>>>>>
>>>>>>> Took me quite a while to find that too.
>>>>>>>
>>>>>>> Hope this helps,
>>>>>>>
>>>>>>> Torsten
>>>>>>>
>>>>>>> On 27.06.2012 17:19, Furst, Carl wrote:
>>>>>>>> I'm probably missing something here but everything I've read so far
>>>>>>>> leads me to believe this should work..
>>>>>>>>
>>>>>>>> I have nodes in a repository of type nt:folder and nt:file. nt:file
>>>>>>>> has a child node jcr:content of type nt:resource which has a child
>>>>>>>> property called jcr:data
>>>>>>>>
>>>>>>>> There are many cases where the jcr:data column has the word
>>>>>>>> 'include' in it. They are jsp files so, yes, I know this word exists
>>>>>>>> in several files.
>>>>>>>>
>>>>>>>> So here's the sql I use:
>>>>>>>>
>>>>>>>> select * from [nt:resource] where contains([jcr:data], 'include');
>>>>>>>>
>>>>>>>> Here's the sql that is returned from q.getStatement() :
>>>>>>>>
>>>>>>>> SELECT [nt:resource].* FROM [nt:resource] WHERE
>>>>>>>> CONTAINS([nt:resource].[jcr:data], 'include');
>>>>>>>>
>>>>>>>> Here is a sample text in jcr:data to search on.
>>>>>>>>
>>>>>>>> <%@ include file="..."
>>>>>>>>
>>>>>>>>
>>>>>>>> ... More jsp here..
>>>>>>>> <%/jsp:include...
>>>>>>>>
>>>>>>>> Yet it doesn't find it. I feel I'm missing something.. Do I need to
>>>>>>>> add a "searchable" mixin or something?
>>>>>>>>
>>>>>>>> Any ideas why this is not being found?
>>>>>>>>
>>>>>>>> It used to be that apache had the CND file for jackrabbit node types
>>>>>>>> readily available. Does anyone know where I can find the CND file
>>>>>>>> for jackrabbit node types?
>>>>>>>>
>>>>>>>> jcr:content is unstructured, but I explicitly make the type
>>>>>>>> nt:resource (otherwise the statement would not be parsed; the Query
>>>>>>>> object would throw an error like "table not found," right? Because
>>>>>>>> the type is a table). So the type is right.. The field is right..
>>>>>>>> The search is not working.
>>>>>>>>
>>>>>>>>
>>>>>>>> I'm using Jackrabbit without any special configuration. Just the war
>>>>>>>> in a simple tomcat deployment. So it's sitting on top of Derby and
>>>>>>>> Lucene.
>>>>>>>>
>>>>>>>>
>>>>>>>> Any help would be appreciated.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Carl Furst
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> **********************************************************
>>>>>>>>
>>>>>>>> MLB.com: Where Baseball is Always On
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>>
>>>
>>>
>>>
>>>
>> 
>> 
>> 
>> 
>> 
>> 
>





