lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Fredrik Lindner" <Fredrik.Lind...@r5.com>
Subject RE: hierarchical search
Date Mon, 17 May 2004 12:44:42 GMT
Hi Matt, thanks for your reply!

Indeed your proposed solution would work for the simple case I
described, however I failed to mention that I must be able to combine
the described queries to a complex one and there for can't make any
assumptions based on size attribute. I'm sorry If I wasted your time.

On the bright side though I found that creating a specialized query
wasn't that difficult at all. A quick scan through the WildCardQuery
class and the related WildCardQueryEnum gave me some valuable hints
regarding this. 

If there is any interest I would happily contribute the code back to the
community.

Regards
/Fredrik



-----Original Message-----
From: Matt Quail [mailto:matt@ctx.com.au] 
Sent: den 17 maj 2004 12:19
To: Lucene Users List
Subject: Re: hierarchical search

Fredrik,

I would tackle your problem like this:

Say that that field you want to index is "path". I would turn this into
*three* indexed fields:
1) multiple path prefixes ("pre-paths")
2) multiple path suffixes ("post-paths")
3) the number of "components" in the path ("path-size").

For example, for a "path" of "/foo/bar/dog/cat/fish" I would index it
like this:

doc.add(Field.Keyword("pre-paths", "/foo/bar/dog/cat/fish/"));
doc.add(Field.Keyword("pre-paths", "/foo/bar/dog/cat/"));
doc.add(Field.Keyword("pre-paths", "/foo/bar/dog/"));
doc.add(Field.Keyword("pre-paths", "/foo/bar/"));
doc.add(Field.Keyword("pre-paths", "/foo/"));
doc.add(Field.Keyword("pre-paths", "/"));
doc.add(Field.Keyword("post-paths", "/foo/bar/dog/cat/fish/"));
doc.add(Field.Keyword("post-paths", "/bar/dog/cat/fish/"));
doc.add(Field.Keyword("post-paths", "/dog/cat/fish/"));
doc.add(Field.Keyword("post-paths", "/cat/fish/"));
doc.add(Field.Keyword("post-paths", "/fish/"));
doc.add(Field.Keyword("post-paths", "/"));
doc.add(Field.Keyword("path-size", "5"));

And to do your "type 2" search for (prefix="/p1/p2/p3/" and
suffix="/s1/s2/s3/") I would use a query like this:

Query q = QueryParser.parse("pre-paths:'/p1/p2/p3/' AND
post-paths:'/s1/s2/s3/ AND (path-size:7)'");

The trick is to lock down the prefix and suffix, then define the amount
of "slack" between the prefix and the suffix using the path-size. If you
wanted the "slack" between either end to be zero or one segments, then
change the size clause to something like (path-size:6 OR path-size:7)


I think that should work.

=Matt


Fredrik Lindner wrote:

> Hi all!
> 
> I'm currently developing an application in which text searching is a
> main component. Among other things, a document will contain a field
> denoting hierarchical information. The information is stored as a
string
> using the common path syntax, /x/y/z/etc/...
> 
> I would like to be able to search documents based on the path field
> using two different selection criteria's,
> 
> 1. given a prefix path and a suffix path select all documents for
which
> the path start with the supplied prefix, ends with the suffix and has
> "some path" in between.
> 
> 2. like (1) but with the requirement that "some path" spans one and
one
> level only. i.e. it defines a strict grandparent/grandchild
relationship
> between the last path entry of the prefix and the first of the suffix.
> 
> For example, with prefix /p1/p2/p3/ and suffix /s1/s2/s3/ and three
> documents with the path filed values
> 
> a) /p1/p2/p3/x/s1/s2/s3/
> b) /p1/p2/p3/y/s1/s2/s3/
> c) /p1/p2/p3/x/y/s1/s2/s3/
> 
> case one should select them all whereas case two should select only a)
> and b).
> 
> My problem is that I am uncertain on how to implement the second case.
I
> guess I have to extend the Lucene internals somehow but I am quite too
> inexperienced regarding Lucene to do so directly. Any pointers, hints
or
> comments are most welcome.
> 
> Regards
> /Fredrik
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
> 
> 




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message