lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Help with delimited text
Date Wed, 06 Apr 2011 12:20:30 GMT
A TermQuery is really dumb. It doesn't do anything at all to the
input: no analysis, no wildcard expansion. It assumes you've done all
that up front, so the '*' in "/Top/Books*" is matched as a literal
character. Try parsing a query rather than using TermQuery.
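
A minimal sketch of the difference, in plain Java rather than Lucene. The class TermLookupSketch and its methods are hypothetical names made up for illustration; the sorted set only stands in for the terms of one un-analyzed field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Toy stand-in for the indexed terms of one un-analyzed field; not Lucene code.
final class TermLookupSketch {
    private final TreeSet<String> terms = new TreeSet<String>();

    TermLookupSketch(String... indexedTerms) {
        terms.addAll(Arrays.asList(indexedTerms));
    }

    // Exact lookup: the text must equal an indexed term, character for character.
    boolean exactMatch(String text) {
        return terms.contains(text);
    }

    // Prefix lookup: every indexed term that starts with the given prefix.
    List<String> prefixMatches(String prefix) {
        List<String> hits = new ArrayList<String>();
        for (String t : terms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) {
                break; // terms are sorted, so no later term can match
            }
            hits.add(t);
        }
        return hits;
    }
}
```

With "/Top/Books" and "/Top/Books/Fiction" in the set, exactMatch("/Top/Books*") is false because no term contains a '*', while prefixMatches("/Top/Books") returns both. In Lucene the second behaviour is what PrefixQuery gives you, e.g. new PrefixQuery(new Term("category_path", "/Top/Books")), with no trailing '*'.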

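And I suspect you'll have problems with casing: the field is indexed
NOT_ANALYZED, so its terms keep their original case, while
StandardAnalyzer lowercases whatever it analyzes. But that's another
story.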
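
One way to sidestep the casing issue, assuming case-insensitive matching is what you want, is to run the same normalization over the value before indexing it and over the filter string before querying. A small sketch; PathNormalizer is a hypothetical helper, not a Lucene class.

```java
import java.util.Locale;

// Hypothetical helper: apply it both to indexed values and to filter strings.
final class PathNormalizer {
    // Trims surrounding whitespace and lowercases with a fixed locale,
    // so the result does not depend on the machine's default locale.
    static String normalize(String path) {
        return path.trim().toLowerCase(Locale.ROOT);
    }
}
```

If the original casing is needed for display or redirects, the untouched path could go into a separate stored field while the normalized form is the one that gets indexed.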

Best
Erick
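
For reference, Ian's suggestion quoted below (index each path together with its parent paths) could look roughly like this. It is a sketch in plain Java; CategoryPaths, split and withAncestors are made-up names, and the Lucene calls that would add each value to the document are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for preparing category paths before indexing.
final class CategoryPaths {
    // Splits the comma-delimited value into individual, non-empty paths.
    static List<String> split(String delimited) {
        List<String> paths = new ArrayList<String>();
        for (String p : delimited.split(",")) {
            String trimmed = p.trim();
            if (!trimmed.isEmpty()) {
                paths.add(trimmed);
            }
        }
        return paths;
    }

    // Returns the path followed by each of its ancestors, longest first.
    static List<String> withAncestors(String path) {
        List<String> out = new ArrayList<String>();
        String current = path;
        while (current.length() > 0) {
            out.add(current);
            int slash = current.lastIndexOf('/');
            if (slash <= 0) {
                break; // reached the top-level segment
            }
            current = current.substring(0, slash);
        }
        return out;
    }
}
```

For "/Top/My Prods/Book Prods/Text Books" this yields the four values in Ian's list. Each one would be added as its own un-analyzed value of the field, after which an exact term match on "/Top/My Prods" finds the document without any wildcard.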

On Wed, Apr 6, 2011 at 6:33 AM, Mark Wiltshire <mark@redalertconsultants.co.uk> wrote:

> Thanks Ian,
>
> I have managed to do that, and through Luke I get my expected results.
>
> Here is my index code now.
>
>     StringTokenizer st = buildSubjectArea(dbConnection, oid);
>     int tokenCount = 0;
>     while (st.hasMoreTokens()) {
>         tokenCount++;
>         String categoryPath = st.nextToken();
>
>         if (categoryPath.length() != 0) {
>             ////doc.add(Field.Text("category", category));
>             doc.add(new Field("category_path", categoryPath,
>                     Field.Store.YES, Field.Index.NOT_ANALYZED));
>         }
>     }
>
> Now using Luke with KeywordAnalyzer, if I enter
>
> category_path:/Top/My Prods*
>
> I get my expected results back.
>
>
> But I cannot get this working in my search code.
> I am using this field to filter the results, i.e.
>
> If I want Top Books I want to filter by /Top/Books*; if I want Top CDs I
> want to filter by /Top/CD*
>
> The filter string is generated from a mapping file, which gives me a
> category path
>
> e.g. /Top/Books
>
> Searcher searcher = new IndexSearcher(FSDirectory.open(new File(index)));
> Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
> ...
> Query subjectFilterQuery = new TermQuery(new Term("category_path", "/Top/Books*"));
> QueryWrapperFilter filter = new QueryWrapperFilter(subjectFilterQuery);
> TopDocs searchResult = searcher.search(query, filter, MAX_SEARCH_RESULTS_SIZE);
>
> If I debug the subjectFilterQuery and write out
>
> subjectFilterQuery.toString()
>
> I see
>
> "subjectFilterQuery.toString()" category_path:/TOP/CD*
>
> So it looks like the query is constructed correctly, but it does not
> bring back any results.
> You say I need to be consistent between index and query. Am I missing
> something?
>
> Many thanks
>
> Mark
>
> On 6 Apr 2011, at 10:06, Ian Lea wrote:
>
> You can add multiple values for a field to a single document.
>
> Document doc = new Document();
> String[] paths = whatever.split(",");
> for (String p : paths) {
>  doc.add(new Field("path", p, whatever ...));
> }
>
>
> For searching, assuming you only want to be able to wildcard on path
> delimiters, you could index
>
> /Top/My Prods/Book Prods/Text Books
> /Top/My Prods/Book Prods
> /Top/My Prods
> /Top
>
> which would let you search on any of them.
>
> You'll want to pick or build an analyzer that behaves as you want with
> respect to case matching and that does not split on the /.  Sometimes it
> can be easier to replace a character, e.g. / with _.  I think there is a
> Lucene class that can do that, maybe MappingCharFilter, if you don't want
> to do it yourself.  You will of course need to be consistent and do the
> same processing at index and search time.
>
>
> --
> Ian.
>
>
>
> On Wed, Apr 6, 2011 at 7:55 AM, Mark Wiltshire
> <mark@redalertconsultants.co.uk> wrote:
>
> To add more information:
>
> I then want to search this field using part or all of the path, with
> wildcards, i.e.
>
> Search category_path with /Top/My Prods*
>
>
> Hi java-users
>
> I need some help.
>
> I am indexing categories into a single field, category_path, which may
> contain items such as
>
> /Top/Books,/Top/My Prods/Book Prods/Text Books,/Maths/Books/TextBooks
>
> i.e. category paths delimited by ','.
>
> I want to store this field so that the Analyser tokenizes it only on ','
> characters and not on the '/' characters.
>
> I am adding the field to the index using the following, where
> categoryPath is a String containing the list of items above:
>
> doc.add(new Field("category_path", categoryPath, Field.Store.YES, Field.Index.ANALYZED));
>
> I think I need to split the string myself, but how do I pass this to
> Lucene? Do I have to set up different fields?
>
> I need to keep the full path in the index, as I want to use it when
> redirecting users who click on the results.
>
> Any help would be great.
>
> Many thanks
>
> Regards
>
> Mark
>
>
>
>
> Regards
>
> Mark
>
