jackrabbit-users mailing list archives

From "H. Wilson" <wils...@randdss.com>
Subject Re: Problems with hyphen in JSR-170 XPath query using jcr:contains
Date Fri, 27 Aug 2010 14:35:16 GMT
  Chris,

I think I can answer this one (I'm sure Ard will confirm). Back when I
was trying to get this working, one of the things I saw was on this page:

http://wiki.apache.org/jackrabbit/IndexingConfiguration

...near the bottom it talks about setting Analyzers for properties in 
the indexing_configuration. I think what it is getting at is, since you 
need it on all properties, you might not need the indexingConfig, and 
you can just add the line:

<param name="analyzer" 
value="org.apache.lucene.analysis.WhitespaceAnalyzer"/>

to the SearchIndex elements in your repository.xml, changing the
analyzer to whichever one suits you.
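
As a rough illustration of what swapping the analyzer does to a value like "Sophie-Allen", here is a plain-Java sketch. It does not call Lucene at all. The class and method names (TokenizationSketch, whitespaceTokens, punctuationSplitTokens, anyTokenStartsWith) are made up for this example, and the punctuation-splitting method is only a simplified stand-in for the default behaviour Ard describes further down (splitting on hyphens and underscores). It also assumes that the prefix of a wildcard search is compared against the indexed tokens verbatim.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Plain-Java illustration only; no Lucene classes are used. Each method
 * mimics, in simplified form, how a kind of analyzer might tokenize a value.
 */
final class TokenizationSketch {

    /** Split on whitespace only and keep the original case. */
    static List<String> whitespaceTokens(String value) {
        List<String> tokens = new ArrayList<String>();
        for (String part : value.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    /** Assumed stand-in for the default: lowercase, then split on any non-letter, non-digit run. */
    static List<String> punctuationSplitTokens(String value) {
        List<String> tokens = new ArrayList<String>();
        for (String part : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    /** True if any token starts with the prefix, compared verbatim (no analysis of the prefix). */
    static boolean anyTokenStartsWith(List<String> tokens, String prefix) {
        for (String token : tokens) {
            if (token.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String value = "Sophie-Allen Smith";
        List<String> split = punctuationSplitTokens(value);
        List<String> whitespace = whitespaceTokens(value);

        System.out.println("punctuation split: " + split);
        System.out.println("whitespace only:   " + whitespace);

        // Prefix lookups against each token list.
        System.out.println(anyTokenStartsWith(split, "sophie-a"));
        System.out.println(anyTokenStartsWith(whitespace, "Sophie-A"));
        System.out.println(anyTokenStartsWith(whitespace, "sophie-a"));
    }
}
```

One thing the sketch suggests: a whitespace-only analyzer keeps the hyphenated name as a single token but also keeps its case, so if you need case-insensitive matching you would probably still want a lowercasing step, as in the LowerCaseKeywordAnalyzer quoted below.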

H. Wilson


On 08/27/2010 08:27 AM, Dunstall, Christopher wrote:
> Ard,
>
> In indexing_configuration.xml, where you named the property where the
> analyzer is used (e.g. FullName), how do I set it so that it's used on all
> properties of a node?  As previously said, I'm using jcr:contains because I
> need to search all parts of the node, so the analyzer needs to have effect
> on all properties.
>
> Regards,
>
> Chris
>
>
> On 27/08/10 2:22 AM, "H. Wilson"<wilsonh@randdss.com>  wrote:
>
>>    Finally! I have been hacking away at this here and there for months,
>> trying all different analyzers or not-using analyzers and modifying my
>> queries all to no avail! Since I always like precise examples when I am
>> searching forums, I will post my (nearly) exact solution both for others
>> and so that Ard might verify that this was indeed what he meant.
>>
>> Ard, I was hoping you could embellish a little on why we would duplicate
>> the property? (I didn't actually do it to get this working perfectly)
>> You lost me a little there, was it for efficiency? Thanks for everything!
>>
>> H. Wilson
>>
>> repository.xml (modified both SearchIndex tags to include an
>> indexingConfiguration):
>>
>>      <SearchIndex
>>      class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>>
>>          ....
>>          <param name="indexingConfiguration"
>>          value="${rep.home}/indexing_configuration.xml"/>
>>
>>      </SearchIndex>
>>
>>
>> indexing_configuration.xml:
>>
>>      <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
>>      <analyzers>
>>      <analyzer
>>      class="org.mycompany.lucene.analysis.LowerCaseKeywordAnalyzer">
>>      <property>fullName</property>
>>      </analyzer>
>>      </analyzers>
>>      </configuration>
>>
>>
>> LowerCaseKeywordAnalyzer.java:
>>
>>      package org.mycompany.lucene.analysis;
>>
>>      import java.io.Reader;
>>      import org.apache.lucene.analysis.KeywordAnalyzer;
>>      import org.apache.lucene.analysis.LowerCaseFilter;
>>      import org.apache.lucene.analysis.TokenStream;
>>
>>      public class LowerCaseKeywordAnalyzer extends KeywordAnalyzer {
>>
>>           public TokenStream tokenStream (String field, final Reader reader) {
>>               TokenStream keywordTokenStream = super.tokenStream (field, reader);
>>               return new LowerCaseFilter (keywordTokenStream);
>>           }
>>      }
>>
>>
>> Our search class has a method which then does the following:
>>
>>      public OurParameter[] getOurParameters (String searchTerm,
>>              String srchField) { // srchField in this case was fullName
>>
>>          TransientRepository repository = new TransientRepository (
>>                  OUR_REPO_CONFIG, OUR_REPO_LOCATION);
>>          Session session = repository.login ();
>>          List<Class> classes = new ArrayList<Class>();
>>          classes.add (OurParameter.class);
>>          Mapper mapper = new AnnotationMapperImpl (classes);
>>          ObjectContentManager ocm =
>>                  new ObjectContentManagerImpl (session, mapper);
>>          queryManager = ocm.getQueryManager();
>>          FilterImpl filter =
>>                  (FilterImpl) queryManager.createFilter (OurParameter.class);
>>          filter.addContains (srchField,
>>                  org.apache.jackrabbit.util.Text
>>                          .escapeIllegalXpathSearchChars (searchTerm)
>>                          .replaceAll ("'", "''"));
>>          // (that last was replace all single ticks with two ticks, I
>>          // honestly can't remember why though)
>>          Query query = queryManager.createQuery (filter);
>>          Collection<OurParameter> resultsCollection =
>>                  (Collection<OurParameter>) ocm.getObjects (query);
>>
>>          //convert to an array, do some other stuff, and return...
>>
>>      }
>>
>>
>>
>> On 08/26/2010 10:42 AM, Ard Schrijvers wrote:
>>> On Thu, Aug 26, 2010 at 3:53 PM, H. Wilson<wilsonh@randdss.com>   wrote:
>>>>    Ard,
>>>>
>>>> I have this same problem, however my scenario involves underscores rather
>>>> than hyphens. Although since Chris seems to be seeing the same exact
>>> That is because hyphens, just like underscores, are characters the
>>> standard Lucene analyzer splits on. This, combined with the query
>>> expansion that happens for wildcard searches in Lucene, causes your issues:
>>>
>>>> behavior as I was, I imagine we are both stuck on the same issue. After
>>>> scouring the forums for the solution, and not seeing your mentioned
>>>> solution, I actually posted my problem as detailed as possible here (
>>>> http://markmail.org/message/yh72wqd5b2hbr3j6 ) and received no response.
>>>> jcr:like was not an option for me, in this case, as our client wanted the
>>>> option for case-insensitive searches. Is there any chance you could please
>>>> narrow down where-about the post was which already covered this? Thanks for
>>> I can't seem to find my post again. But, I'll give you a quite simple
>>> solution:
>>>
>>> If you want to keep the normal indexing of the property for normal
>>> searching, but also want the yyy* option, you need to duplicate the
>>> value in a second property. If your property,
>>> like
>>>
>>> .North.South.East.WestLand
>>>
>>> is only needed for the wildcard searching you describe, you
>>> only need it once. Now, suppose your property is called myProp.
>>>
>>> To your indexing configuration file add:
>>>
>>> <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
>>>     <analyzers>
>>>           <analyzer
>>> class="org.mycompany.lucene.analysis.LowerCaseKeywordAnalyzer">
>>>               <property>myProp</property>
>>>           </analyzer>
>>>     </analyzers>
>>> </configuration>
>>>
>>> Your LowerCaseKeywordAnalyzer is very simple: it extends
>>> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/KeywordAnalyzer.html
>>> and in the method
>>>
>>>    TokenStream tokenStream(String fieldName,Reader reader)
>>>
>>> after calling the super, you invoke Lucene's LowerCaseFilter.
>>>
>>> That is all (after you re-index your repository). Since a -, _, ~ or
>>> whatever is no longer seen as a character to split on, but you still
>>> apply the lowercase filter, you can do exactly what you want.
>>>
>>> Do the words need to be split on spaces, however? No problem, just use
>>> a WhitespaceTokenizer from Lucene. It is actually pretty simple.
>>>
>>> Hope this helps,
>>>
>>> Regards Ard
>>>
>>>> your time.
>>>>
>>>> *H. Wilson*
>>>>
>>>>
>>>> On 08/26/2010 04:59 AM, Ard Schrijvers wrote:
>>>>> Hello,
>>>>>
>>>>> You can search the archives (mail from me) for the wildcard searching
>>>>> issues described below. There was someone having similar issues. I
>>>>> explained the wildcard difficulties. Take a look at jcr:like for your
>>>>> use cases.
>>>>>
>>>>> Regards Ard
>>>>>
>>>>> On Thu, Aug 26, 2010 at 10:19 AM, Dunstall, Christopher
>>>>> <cdunstall@csu.edu.au>     wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I'm having some trouble with an XPath query, where I'm searching for
>>>>>> users with hyphens in their name.
>>>>>>
>>>>>> I'm using:
>>>>>> jcr:contains(*/*/*,'query')
>>>>>>
>>>>>> And it returns some odd results.
>>>>>>
>>>>>> I have two users, Sophie-Allen and Sophie-Anne. When I search for
>>>>>> 'sophie', I get both users back. Ok, fine, but if I search for
>>>>>> 'sophie-a' (with the hyphen escaped as 'sophie\-a' as per the JSR-170
>>>>>> Spec) I get zero results returned.  Oddly, if I search for either
>>>>>> 'sophie-allen' or 'sophie-anne' I get the respective user details back
>>>>>> fine. Shouldn't I get both users back when escaping the hyphen? Have I
>>>>>> missed something in the spec?
>>>>>>
>>>>>> One other odd thing is the addition of an asterisk (*).  Searching for
>>>>>> 'soph' and 'soph*' return the same result (both users), but if I search
>>>>>> for 'sophie-allen*', I get zero results, unlike when searching for just
>>>>>> 'sophie-allen'. Searching for 'sophie-a*' has the same result as without
>>>>>> the asterisk, i.e. nothing.
>>>>>>
>>>>>> The JSR-170 spec doesn't say anything (that I can find) but is the
>>>>>> asterisk a wildcard in the jcr:contains function or does it serve some
>>>>>> other purpose?
>>>>>>
>>>>>> Your assistance is greatly appreciated,
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Chris Dunstall | Service Support - Applications
>>>>>> Technology Integration/OLE Virtual Team
>>>>>> Division of Information Technology | Charles Sturt University | Bathurst,
>>>>>> NSW, Australia
>>>>>>
>>>>>> Ph: 02 63384818 | Fax: 02 63384181
>>>>>>
>
