lucene-java-user mailing list archives

From Mark Wiltshire <m...@redalertconsultants.co.uk>
Subject Re: Help with delimited text
Date Thu, 07 Apr 2011 11:21:31 GMT
Thanks Ian, you're a star :-)

R

Mark

On 7 Apr 2011, at 11:18, Ian Lea wrote:

Mark - I've uploaded some code to http://pastebin.com/mqSVcWUi that
indexes and searches file system paths.  It demonstrates what I've
been trying to suggest and may help you get your search up and
running.

--
Ian.

On Thu, Apr 7, 2011 at 8:18 AM, Mark Wiltshire
<mark@redalertconsultants.co.uk> wrote:
> Hi, thanks Ian for your help on this, it's driving me nuts :-)
> Also, the StandardAnalyzer is only used on the search query term being
> passed in.
> But in this case I am just adding a filter to the search.
> The actual category may be
> /Top/Books/Accountancy/10_Compliance/International
> And the filter will allow me to filter the search by subject area, i.e.
> There is a drop down on the search page showing
> Books
> Online
> Software
> I translate this to a base category to then filter on.  i.e. Books =
> /Top/Books
> It then filters the search, so only items living in categories matching
> /Top/Books/* (with * as a wildcard) are returned.
> I then want to use the actual category in the index to correctly display the
> item.
> Hope that makes sense.
> Many thanks
> Mark
> On 6 Apr 2011, at 12:05, Ian Lea wrote:
> 
> Query subjectFilterQuery = new TermQuery(new Term("category_path", "/Top/Books*"));
> 
> Try losing the asterisk.  Presumably the indexed term is "/Top/Books".
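
A TermQuery compares its term text literally against the indexed terms and does not interpret `*`, so `/Top/Books*` only matches a term that really ends in an asterisk. The plain-Java sketch below models that difference without using any Lucene classes; `TermMatchSketch`, `exactMatch`, `prefixMatches` and the sample terms are illustrative assumptions, not code from the thread. In Lucene itself, PrefixQuery is the query type that gives the prefix behaviour.

```java
import java.util.Arrays;
import java.util.List;

final class TermMatchSketch {
    // Stand-in for the untokenized terms stored in category_path (assumed sample data).
    static final List<String> INDEXED = Arrays.asList(
            "/Top/Books",
            "/Top/Books/Accountancy/10_Compliance/International",
            "/Top/CD");

    // Exact-term semantics: the query text must equal an indexed term character for character.
    static boolean exactMatch(String term) {
        return INDEXED.contains(term);
    }

    // Prefix semantics: counts every indexed term that starts with the given text.
    static long prefixMatches(String prefix) {
        return INDEXED.stream().filter(t -> t.startsWith(prefix)).count();
    }

    public static void main(String[] args) {
        System.out.println(exactMatch("/Top/Books*"));   // false: no term contains a literal '*'
        System.out.println(exactMatch("/Top/Books"));    // true
        System.out.println(prefixMatches("/Top/Books")); // 2
    }
}
```

Note that a bare prefix such as `/Top/Books` would also match a sibling like `/Top/Bookshelf`; appending the delimiter to the prefix is one way to avoid that.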
> 
> You don't appear to be using the StandardAnalyzer you create in your
> code sample but if you did searches wouldn't work since "/Top/Books"
> would certainly be lower cased.  You've got /TOP/CD in there as well -
> don't know where that came from.  Unless you want case-sensitive
> searching I'd downcase these paths in advance, and replace / with some
> character that will be ignored by any analyzers you might be using.
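
The normalisation suggested above could be sketched as a single helper that is called both when indexing and when building the filter term. This is only a sketch: `CategoryPathNormalizer` and `normalize` are hypothetical names, and `_` as the replacement character is taken from the suggestion elsewhere in this thread; whether a particular analyzer leaves it alone would need checking.

```java
import java.util.Locale;

final class CategoryPathNormalizer {
    // Lower-cases the path and replaces '/' with '_' (assumed replacement character).
    // Spaces inside path segments are left untouched.
    static String normalize(String path) {
        return path.trim().toLowerCase(Locale.ROOT).replace('/', '_');
    }

    public static void main(String[] args) {
        System.out.println(normalize("/Top/Books")); // _top_books
        System.out.println(normalize("/TOP/CD"));    // _top_cd
    }
}
```

Using the same helper on both sides keeps the indexed value and the search term consistent, which is the point being made in the message.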
> 
> 
> --
> Ian.
> 
> 
> On Wed, Apr 6, 2011 at 11:33 AM, Mark Wiltshire
> <mark@redalertconsultants.co.uk> wrote:
> 
> Thanks Ian,
> 
> I have managed to do that and through Luke I get my expected results.
>
> Here is my index code now.
> 
>     StringTokenizer st = buildSubjectArea(dbConnection, oid);
>     int tokenCount = 0;
>     while (st.hasMoreTokens()) {
>         tokenCount++;
>         String categoryPath = st.nextToken();
>         if (categoryPath.length() != 0) {
>             ////doc.add(Field.Text("category", category));
>             doc.add(new Field("category_path", categoryPath, Field.Store.YES, Field.Index.NOT_ANALYZED));
>         }
>     }
> 
> Now using Luke with KeywordAnalyzer, if I enter
> 
> category_path:/Top/My Prods*
> 
> I get my expected results back.
> 
> But I cannot get this working in my search code.
> 
> I am using this field to filter the results, i.e.
> 
> If I want Top Books I want to filter by /Top/Books*; if I want Top CDs I
> want to filter by /Top/CD*
>
> The filter string is generated from a mapping file, which gives me a category path
> 
> e.g. /Top/Books
> 
> Searcher searcher = new IndexSearcher(FSDirectory.open(new File(index)));
> Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
> ...
> Query subjectFilterQuery = new TermQuery(new Term("category_path", "/Top/Books*"));
> QueryWrapperFilter filter = new QueryWrapperFilter(subjectFilterQuery);
> TopDocs searchResult = searcher.search(query, filter, MAX_SEARCH_RESULTS_SIZE);
> 
> If I debug the subjectFilterQuery and write out
>
> subjectFilterQuery.toString()
>
> I see
>
> "subjectFilterQuery.toString()" category_path:/TOP/CD*
>
> So it looks like the query is constructed correctly?
>
> But this does not bring back any results.
>
> You say I need to be consistent at index and query time; am I missing something?
> 
> Many thanks
> 
> Mark
> 
> On 6 Apr 2011, at 10:06, Ian Lea wrote:
> 
> You can add multiple values for a field to a single document.
> 
> Document doc = new Document();
> String[] paths = whatever.split(",");
> for (String p : paths) {
>     doc.add(new Field("path", p, whatever ...));
> }
> 
> 
> For searching, assuming you only want to be able to wildcard on path
> delimiters, you could index
>
> /Top/My Prods/Book Prods/Text Books
> /Top/My Prods/Book Prods
> /Top/My Prods
> /Top
>
> which would let you search on any of them.
>
> You'll want to pick or build an analyzer that behaves as you want wrt
> case matching and not splitting on the /.  Sometimes it can be easier
> to replace a character, e.g. / with _.  I think there is a Lucene class
> that can do that, maybe MappingCharFilter, if you don't want to do it
> yourself.  You will of course need to be consistent and do the same
> processing at index and search time.
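
The idea of indexing a path together with all of its ancestors can be sketched in plain Java as follows. `CategoryPathExpander` and `expand` are hypothetical names, and the sketch assumes paths start with `/` and use `/` as the only delimiter. Each returned string would then be added as its own untokenized value of the field, so that an exact-term match on, say, `/Top/My Prods` finds the document without any wildcard.

```java
import java.util.ArrayList;
import java.util.List;

final class CategoryPathExpander {
    // Returns the path itself followed by each ancestor, longest first.
    // An empty path or a bare "/" yields an empty list.
    static List<String> expand(String path) {
        List<String> result = new ArrayList<>();
        String current = path;
        while (current.length() > 1) {
            result.add(current);
            int cut = current.lastIndexOf('/');
            if (cut <= 0) {
                break; // reached the top-level segment
            }
            current = current.substring(0, cut);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints the four paths listed in the message above, one per line.
        for (String p : expand("/Top/My Prods/Book Prods/Text Books")) {
            System.out.println(p);
        }
    }
}
```

The trade-off is a larger index (one term per ancestor level) in exchange for simple exact-match filtering at search time.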
> 
> 
> --
> 
> Ian.
> 
> 
> 
> On Wed, Apr 6, 2011 at 7:55 AM, Mark Wiltshire
> <mark@redalertconsultants.co.uk> wrote:
>
> To add more information
>
>        I then want to search this field using part or all of the path,
>        using wildcards
>
>        i.e.
>
>        Search category_path with /Top/My Prods*
> 
> 
> Hi java-users
> 
>        I need some help.
>
>        I am indexing categories into a single field, category_path,
>        which may contain items such as
>
>        /Top/Books,/Top/My Prods/Book Prods/Text Books,/Maths/Books/TextBooks
>
>        i.e. category paths delimited by ','
>
>        I want to store this field so that the Analyzer tokenizes it only
>        on ',' characters and not on the '/' characters.
>
>        I am adding the field to the index using the following, where
>        categoryPath is a String containing a list of the items above:
>
>        doc.add(new Field("category_path", categoryPath, Field.Store.YES, Field.Index.ANALYZED));
>
>        I think I need to split the string myself, but how do I pass this to
>        Lucene? Do I have to set up different fields?
>
>        I need to keep the full path in the index, as I want to use it when
>        redirecting users when they click on the results.
>
>        Any help would be great.
> 
>        Many thanks
> 
> Regards
> 
> Mark
> 
> ---------------------------------------------------------------------
> 
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> 
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
Regards

Mark
