jackrabbit-users mailing list archives

From Ard Schrijvers <a.schrijv...@onehippo.com>
Subject Re: Problems with hyphen in JSR-170 XPath query using jcr:contains
Date Thu, 26 Aug 2010 16:57:46 GMT
Hello Wilson et al,

On Thu, Aug 26, 2010 at 6:22 PM, H. Wilson <wilsonh@randdss.com> wrote:
> Finally! I have been hacking away at this here and there for months, trying
> all different analyzers or not-using analyzers and modifying my queries all
> to no avail! Since I always like precise examples when I am searching

In that case, sorry for my late help. I am not always in a position to
take time to help. Also, query expansion with wildcard searching is
imo not Lucene's best part. Anyway, for those interested, I could try
to dig up some mails I sent internally in the past. It is something
that is hard to grasp without having some Lucene background, though.
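
For readers without that Lucene background, a minimal plain-Java sketch of
the effect may help. It does not use Lucene at all: the class and method
names (AnalyzerSketch, splitTerms, keywordTerms, prefixMatches) are made up
for illustration, and the splitting rule only approximates an analyzer that
breaks on hyphens and underscores. The idea it tries to show is that a
prefix search is compared against individual indexed terms, so once
'Sophie-Allen' has been split into 'sophie' and 'allen', no single term
starts with 'sophie-a'.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Plain-Java imitation of two analysis strategies; this is not Lucene code. */
final class AnalyzerSketch {

    /** Roughly a splitting analyzer: lowercase, then break on anything that is not a letter or digit. */
    static List<String> splitTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String part : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {   // a leading separator yields an empty first piece
                terms.add(part);
            }
        }
        return terms;
    }

    /** Roughly a lowercasing keyword analyzer: the whole value becomes one term. */
    static List<String> keywordTerms(String value) {
        return Collections.singletonList(value.toLowerCase(Locale.ROOT));
    }

    /** A prefix matches when at least one single indexed term starts with it. */
    static boolean prefixMatches(List<String> indexedTerms, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        for (String term : indexedTerms) {
            if (term.startsWith(p)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String name = "Sophie-Allen";
        System.out.println(prefixMatches(splitTerms(name), "soph"));       // true
        System.out.println(prefixMatches(splitTerms(name), "sophie-a"));   // false
        System.out.println(prefixMatches(keywordTerms(name), "sophie-a")); // true
    }
}
```

With split terms only 'soph' finds the name; with the single lowercase
keyword term, 'sophie-a' finds it as well.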

> forums, I will post my (nearly) exact solution both for others and so that
> Ard might verify that this was indeed what he meant.

Yes, this is how I meant it, with the analyzer part.

>
> Ard, I was hoping you could embellish a little on why we would duplicate the

I meant that you would need this *only* if you also want the
original 'free text indexing' of the property. Thus, if you would like
to index some property both with the original Jackrabbit indexing and
with a keyword-like analyzer, you need the property twice. Normally,
you don't need this.
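
To make the 'property twice' case concrete, here is a small plain-Java
model of one value indexed two ways. It involves no Jackrabbit or Lucene
API; the class DualFieldSketch, its methods, and the second property name
fullNameKeyword are hypothetical and exist only for this illustration. The
point is that the free-text copy answers word searches, while the keyword
copy answers prefix searches that contain a hyphen.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Plain-Java model of one value indexed two ways; not Jackrabbit or Lucene code. */
final class DualFieldSketch {

    /** Builds a map of field name to indexed terms for a single node. */
    static Map<String, List<String>> indexNode(String fullName) {
        String lower = fullName.toLowerCase(Locale.ROOT);

        // "fullName": free-text style, split on anything that is not a letter or digit.
        List<String> words = new ArrayList<>();
        for (String part : lower.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                words.add(part);
            }
        }

        Map<String, List<String>> fields = new HashMap<>();
        fields.put("fullName", words);
        // "fullNameKeyword" (hypothetical duplicate): the whole value as one lowercase term.
        fields.put("fullNameKeyword", Collections.singletonList(lower));
        return fields;
    }

    /** Word search: some term in the field equals the word. */
    static boolean hasWord(Map<String, List<String>> fields, String field, String word) {
        return fields.getOrDefault(field, Collections.<String>emptyList())
                     .contains(word.toLowerCase(Locale.ROOT));
    }

    /** Prefix search: some term in the field starts with the prefix. */
    static boolean hasPrefix(Map<String, List<String>> fields, String field, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        for (String term : fields.getOrDefault(field, Collections.<String>emptyList())) {
            if (term.startsWith(p)) {
                return true;
            }
        }
        return false;
    }
}
```

For 'Sophie-Allen', hasWord on fullName finds 'allen' but hasPrefix on
fullName does not find 'sophie-a'; on fullNameKeyword it is the other way
around. If only the prefix behaviour is wanted, the single keyword-style
property is enough.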

> property? (I didn't actually do it to get this working perfectly) You lost
> me a little there, was it for efficiency? Thanks for everything!

You're welcome.

Thank you for reporting back that it works.

Regards Ard

>
> H. Wilson
>
> repository.xml (modified both SearchIndex tags to include an
> indexingConfiguration):
>
> <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>
> ....
> <param name="indexingConfiguration"
> value="${rep.home}/indexing_configuration.xml"/>
>
> </SearchIndex>
>
> indexing_configuration.xml:
>
> <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
>     <analyzers>
>         <analyzer
> class="org.mycompany.lucene.analysis.LowerCaseKeywordAnalyzer">
>             <property>fullName</property>
>         </analyzer>
>     </analyzers>
> </configuration>
>
> LowerCaseKeywordAnalyzer.java:
>
> package org.mycompany.lucene.analysis;
>
> import java.io.Reader;
> import org.apache.lucene.analysis.KeywordAnalyzer;
> import org.apache.lucene.analysis.LowerCaseFilter;
> import org.apache.lucene.analysis.TokenStream;
>
> public class LowerCaseKeywordAnalyzer extends KeywordAnalyzer {
>
>     // Keep the whole field value as a single token, then lowercase it.
>     public TokenStream tokenStream (String field, final Reader reader) {
>         TokenStream keywordTokenStream = super.tokenStream (field, reader);
>         return new LowerCaseFilter (keywordTokenStream);
>     }
> }
>
> Our search class has a method which then does the following:
>
> public OurParameter[] getOurParameters (String searchTerm, String srchField) {
>     // srchField in this case was fullName
>
>     TransientRepository repository =
>         new TransientRepository (OUR_REPO_CONFIG, OUR_REPO_LOCATION);
>     Session session = repository.login ();
>     List<Class> classes = new ArrayList<Class>();
>     classes.add (OurParameter.class);
>     Mapper mapper = new AnnotationMapperImpl (classes);
>     ObjectContentManager ocm = new ObjectContentManagerImpl (session, mapper);
>     queryManager = ocm.getQueryManager();
>     FilterImpl filter =
>         (FilterImpl) queryManager.createFilter (OurParameter.class);
>     filter.addContains (srchField,
>         org.apache.jackrabbit.util.Text.escapeIllegalXpathSearchChars (searchTerm)
>             .replaceAll ("'", "''"));
>     // (that last was replace all single ticks with two ticks, I honestly
>     // can't remember why though)
>     Query query = queryManager.createQuery (filter);
>     Collection<OurParameter> resultsCollection =
>         (Collection<OurParameter>) ocm.getObjects (query);
>
>     // convert to an array, do some other stuff, and return...
>
> }
>
>
> On 08/26/2010 10:42 AM, Ard Schrijvers wrote:
>
> On Thu, Aug 26, 2010 at 3:53 PM, H. Wilson <wilsonh@randdss.com> wrote:
>
>  Ard,
>
> I have this same problem, however my scenario involves underscores rather
> than hyphens. Although since Chris seems to be seeing the same exact
>
> It is because hyphens, just like underscores, are characters the
> standard Lucene analyzer splits on. This, combined with the query
> expansion that happens for wildcard searches in Lucene, causes your issues:
>
> behavior as I was, I imagine we are both stuck on the same issue. After
> scouring the forums for the solution, and not seeing your mentioned
> solution, I actually posted my problem as detailed as possible here (
> http://markmail.org/message/yh72wqd5b2hbr3j6 ) and received no response.
> jcr:like was not an option for me, in this case, as our client wanted the
> option for case-insensitive searches. Is there any chance you could please
> narrow down where-about the post was which already covered this? Thanks for
>
> I can't seem to find my post again. But, I'll give you a quite simple
> solution:
>
> If you want to keep the normal indexing of the property for normal
> searching, but also want to have the yyy* option, you need to
> duplicate the property in another property. If your property,
> like
>
> .North.South.East.WestLand
>
> is only needed for the one you describe with wildcard searching, you
> only need it once. Now, suppose, your property is called myProp.
>
> To your indexing configuration XML file add:
>
> <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
>   <analyzers>
>         <analyzer
> class="org.mycompany.lucene.analysis.LowerCaseKeywordAnalyzer">
>             <property>myProp</property>
>         </analyzer>
>   </analyzers>
> </configuration>
>
> Your LowerCaseKeywordAnalyzer is very simple: it extends
> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/KeywordAnalyzer.html
> and in the method
>
>  TokenStream tokenStream(String fieldName,Reader reader)
>
> after calling the super, you invoke Lucene's LowerCaseFilter.
>
> That is all (after you re-index your repository). Since a -, _, ~
> or whatever is now no longer seen as a character to split on, but you
> still use the lowercase filter, you can do exactly what you want.
>
> Do the words need to be split on spaces, however? No problem, just add
> a WhitespaceTokenizer from Lucene. It is actually pretty simple.
>
> Hope this helps,
>
> Regards Ard
>
> your time.
>
> *H. Wilson*
>
>
> On 08/26/2010 04:59 AM, Ard Schrijvers wrote:
>
> Hello,
>
> You can search the archives (mails from me) for wildcard searching
> issues related to the one below. Someone had similar issues, and I
> explained the wildcard difficulties there. Take a look at jcr:like
> for your use cases.
>
> Regards Ard
>
> On Thu, Aug 26, 2010 at 10:19 AM, Dunstall, Christopher
> <cdunstall@csu.edu.au>  wrote:
>
> Hi all,
>
> I'm having some trouble with an XPath query, where I'm searching for
> users with hyphens in their name.
>
> I'm using:
> jcr:contains(*/*/*,'query')
>
> And it returns some odd results.
>
> I have two users, Sophie-Allen and Sophie-Anne. When I search for
> 'sophie', I get both users back. Ok, fine, but if I search for 'sophie-a'
> (with the hyphen escaped as 'sophie\-a' as per the JSR-170 Spec) I get zero
> results returned.  Oddly, if I search for either 'sophie-allen' or
> 'sophie-anne' I get the respective user details back fine. Shouldn't I get
> both users back when escaping the hyphen? Have I missed something in the
> spec?
>
> One other odd thing is the addition of an asterisk (*).  Searching for
> 'soph' and 'soph*' return the same result (both users), but if I search for
> 'sophie-allen*', I get zero results, unlike when searching for just
> 'sophie-allen'. Searching for 'sophie-a*' has the same result as without the
> asterisk, i.e. nothing.
>
> The JSR-170 spec doesn't say anything (that I can find) but is the
> asterisk a wildcard in the jcr:contains function or does it serve some other
> purpose?
>
> Your assistance is greatly appreciated,
>
> Regards,
>
> Chris Dunstall | Service Support - Applications
> Technology Integration/OLE Virtual Team
> Division of Information Technology | Charles Sturt University | Bathurst,
> NSW, Australia
>
> Ph: 02 63384818 | Fax: 02 63384181
>
>
