lucene-java-user mailing list archives

From "Brandon Jockman" <brand...@isogen.com>
Subject Re: Search on XML files
Date Mon, 13 May 2002 15:49:37 GMT
one minor correction... (!)

> > 2. What is the query string supposed to be if I want to get records which
> > contain (Australia and 20020415) and (not (HongKong and 20020315))?
>
> ((+Australia +tagname:country) AND (+tagname:date +20020415))  AND
> !((+tagname:country +HongKong) AND (+tagname:date +20020315))

-B

----- Original Message -----
From: "Brandon Jockman" <brandonj@isogen.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Monday, May 13, 2002 10:31 AM
Subject: Re: Search on XML files


> Fanny,
>
> The current implementation allows for searching on:
>
> a. the entire PCDATA content of an XML document.
> b. the PCDATA content within specific elements.
> c. processing instructions by name and content.
> d. attributes of elements by both name and value.
> e. elements/PIs with specific parent element types.
> f. elements/PIs at specific child locations within a parent element.
> g. elements/PIs with specific ancestor element types.
> h. elements/PIs with specifically ordered ancestor element types.
>
> The original need we had for XML contextual searching was to find a
> specific document that contained a particular element with particular
> content and in particular relationships to other element types.
>
> Currently, searching for a document based on the content of two separate
> elements with a logical AND relationship is not supported. However, the OR
> relationship should work just fine.
>
> There is a stored field that contains all of the text content of the
> document, but that probably isn't enough for what you need.
>
> Each Lucene document generated from an XML document has a 'docid' field
> identifying that source XML document.
>
>
> You have two real options:
>
> 1. Write a query parser that extends Lucene's QueryParser, detects the
> relationship, and performs more than one search, grouping the results by
> document id.
>
> Searching for X and Y would become:
> 1. Search for X -> Hits_X
> 2. Search for Y -> Hits_Y
> 3. Merge Hits_X and Hits_Y based on docid.
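Step 3 can be sketched in plain Java. To keep the sketch self-contained it does not call Lucene at all: each hit is modelled as a map from field name to stored value, standing in for the fields you would read from each hit's document. Only the 'docid' field name comes from the message; the class name, method names and sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of option 1: run X and Y as two separate searches, then keep only
// the source XML documents (identified by "docid") that appear in both.
// Hits are modelled as field-name -> value maps instead of Lucene objects.
public class DocIdMerge {

    // Collect the distinct docid values from one hit list, in hit order.
    static Set<String> docIds(List<Map<String, String>> hits) {
        Set<String> ids = new LinkedHashSet<>();
        for (Map<String, String> hit : hits) {
            String id = hit.get("docid");
            if (id != null) {
                ids.add(id);
            }
        }
        return ids;
    }

    // Step 3: merge Hits_X and Hits_Y on docid (logical AND).
    static List<String> mergeAnd(List<Map<String, String>> hitsX,
                                 List<Map<String, String>> hitsY) {
        Set<String> idsY = docIds(hitsY);
        List<String> result = new ArrayList<>();
        for (String id : docIds(hitsX)) {
            if (idsY.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hits_X: element-level hits for country = Australia (made-up data)
        List<Map<String, String>> hitsX = List.of(
            Map.of("docid", "news1.xml", "tagname", "country"),
            Map.of("docid", "news2.xml", "tagname", "country"));
        // Hits_Y: element-level hits for date = 20020415 (made-up data)
        List<Map<String, String>> hitsY = List.of(
            Map.of("docid", "news2.xml", "tagname", "date"),
            Map.of("docid", "news3.xml", "tagname", "date"));
        System.out.println(mergeAnd(hitsX, hitsY)); // prints [news2.xml]
    }
}
```

Collecting the docids into a set first matters because one XML document can contribute several element-level hits to the same hit list.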
>
> -=-
>
> 2. Write a query parser that extends Lucene's QueryParser, detects that you
> are searching for a document based on several elements rather than a
> single one, and converts the search from:
>
> X AND Y
>
> to:
>
> (X AND docid:docidentifier) OR (Y AND docid:docidentifier)
>
> ..and then merge results based on docid.
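A sketch of the rewrite step as plain string building in Java. It assumes each docid is a single token that needs no escaping in the query syntax; the class and method names are hypothetical, and an implementation inside a QueryParser subclass would more likely operate on parsed clauses than on strings.

```java
import java.util.List;
import java.util.StringJoiner;

// Sketch of option 2: rewrite "X AND Y" into
// "(X AND docid:<id>) OR (Y AND docid:<id>)" for one source document.
// Assumes the docid is a single token that needs no escaping.
public class DocIdRewrite {

    static String rewrite(List<String> clauses, String docId) {
        StringJoiner or = new StringJoiner(" OR ");
        for (String clause : clauses) {
            // Pin every clause to the same source XML document.
            or.add("(" + clause + " AND docid:" + docId + ")");
        }
        return or.toString();
    }

    public static void main(String[] args) {
        String x = "(+Australia +tagname:country)";
        String y = "(+tagname:date +20020415)";
        // "1234" is a made-up docid value.
        System.out.println(rewrite(List.of(x, y), "1234"));
    }
}
```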
>
>
> You may also be able to leverage the search 'Filtering' mechanism, but I'm
> not experienced with that...
>
> <<<From FAQ>>>
> 16. What is filtering and how is it performed?
> Filtering means imposing additional restrictions on the hit list to eliminate
> hits that otherwise would be included in the search results. There are two
> ways to filter hits:
>
>   a. Search Query - in this approach, you provide your custom filter object
> when you call the search() method. This filter will be called exactly
> once to evaluate every document that resulted in a non-zero score.
>   b. Selective Collection - in this approach you perform the regular search,
> and when you get back the hit list, collect only those hits that match your
> filtering criteria. In this approach, your filter is called only for hits
> returned by the search method, which may be only a subset of the non-zero
> matches (useful when evaluating your search filter is expensive).
> <<< ... >>>
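The "Selective Collection" approach can be sketched as a post-filter over the returned hits, applied here to the date-range and subscription example from the original question. This is not Lucene's own filter API: hits are modelled as maps, and the 'date' and 'newsid' field names and the subscription set are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Sketch of "selective collection": run the regular search first, then keep
// only the returned hits that pass an extra test. Hits are modelled as maps;
// the "date" and "newsid" fields and the subscription set are hypothetical.
public class SelectiveCollection {

    // Accept hits whose yyyymmdd "date" lies in [from, to] and whose "newsid"
    // is subscribed. Fixed-width yyyymmdd strings sort chronologically, so a
    // plain string comparison is enough for the range check.
    static Predicate<Map<String, String>> dateAndSubscription(
            String from, String to, Set<String> subscribed) {
        return hit -> {
            String date = hit.get("date");
            String id = hit.get("newsid");
            return date != null && id != null
                && date.compareTo(from) >= 0
                && date.compareTo(to) <= 0
                && subscribed.contains(id);
        };
    }

    // The filter is evaluated only for hits the search actually returned.
    static List<Map<String, String>> collect(List<Map<String, String>> hits,
                                             Predicate<Map<String, String>> filter) {
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> hit : hits) {
            if (filter.test(hit)) {
                kept.add(hit);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Map<String, String>> hits = List.of(
            Map.of("newsid", "n1", "date", "20020215"),
            Map.of("newsid", "n2", "date", "20020415"),
            Map.of("newsid", "n3", "date", "20020301"));
        Set<String> subscribed = Set.of("n1", "n2");
        List<Map<String, String>> kept =
            collect(hits, dateAndSubscription("20020102", "20020330", subscribed));
        System.out.println(kept.size()); // prints 1: n2 is out of range, n3 is not subscribed
    }
}
```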
>
> > 1. What is the query string supposed to be if I want to get records which
> > contain (Australia and 20020415) or (HongKong and 20020315)?
>
> ((+Australia +tagname:country) AND (+tagname:date +20020415)) OR
> ((+HongKong +tagname:country) AND (+tagname:date +20020315))
>
> > 2. What is the query string supposed to be if I want to get records which
> > contain (Australia and 20020415) and (not (HongKong and 20020315))?
>
> ((+Australia +tagname:country) AND (+tagname:date +20020415))  AND
> ((+tagname:country +HongKong) AND (+tagname:date +20020315))
>
> Either of these queries will require the additional functionality outlined
> in options 1 or 2 above.
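Once each single-element search has been reduced to the set of docids it matched, as in option 1, the two compound queries become ordinary set operations: question 1 is a union of two intersections and question 2 is a difference. The sketch uses made-up docid sets in place of real search results; the class and method names are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

// Sketch: compound queries over docid sets produced by single-element
// searches. The docid values below are made up for illustration.
public class CompoundQueries {

    // Documents matched by both searches.
    static Set<String> and(Set<String> a, Set<String> b) {
        Set<String> r = new TreeSet<>(a);
        r.retainAll(b);
        return r;
    }

    // Documents matched by either search.
    static Set<String> or(Set<String> a, Set<String> b) {
        Set<String> r = new TreeSet<>(a);
        r.addAll(b);
        return r;
    }

    // Documents matched by a but not by b.
    static Set<String> andNot(Set<String> a, Set<String> b) {
        Set<String> r = new TreeSet<>(a);
        r.removeAll(b);
        return r;
    }

    public static void main(String[] args) {
        Set<String> australia = Set.of("d1", "d2", "d3"); // tagname:country, Australia
        Set<String> apr15     = Set.of("d1", "d3", "d4"); // tagname:date, 20020415
        Set<String> hongKong  = Set.of("d3", "d5");       // tagname:country, HongKong
        Set<String> mar15     = Set.of("d3", "d5");       // tagname:date, 20020315

        Set<String> ausApr = and(australia, apr15); // [d1, d3]
        Set<String> hkMar  = and(hongKong, mar15);  // [d3, d5]

        System.out.println(or(ausApr, hkMar));     // question 1: [d1, d3, d5]
        System.out.println(andNot(ausApr, hkMar)); // question 2: [d1]
    }
}
```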
>
>
> Regards,
>
> -Brandon
>
> Brandon Jockman
> ISOGEN International, LLC.
> brandonj@isogen.com
>
>
>
> ----- Original Message -----
> From: "Fanny Yeung" <toffeem@hotmail.com>
> To: <lucene-user@jakarta.apache.org>
> Sent: Monday, May 13, 2002 7:48 AM
> Subject: Search on XML files
>
>
> > Hi,
> >
> > Does anyone know how to build a query that searches multiple fields of XML
> > files in the sample provided by ISOGEN? Is that supported?
> >
> > I would like to get all the results which contain the value 'Australia'
> > in the tag 'country' AND the date '20020415' in the tag 'date'. I always
> > get a hit count of 0. Is there a problem with my query string?
> >
> > +(Australia AND tagname:country) AND +(20020415 AND tagname:date)
> >
> > 1. What is the query string supposed to be if I want to get records which
> > contain (Australia and 20020415) or (HongKong and 20020315)?
> > 2. What is the query string supposed to be if I want to get records which
> > contain (Australia and 20020415) and (not (HongKong and 20020315))?
> >
> > Since I am a newbie to Lucene, I wonder whether I can use a filter to
> > restrict the search results. In my case, I need to retrieve all the news
> > within a date range (for example, 20020102 to 20020330). In addition, the
> > result should only contain news that has been subscribed to. Should I use
> > a filter to filter out the unsubscribed news, or should I build a query
> > string that includes only the subscribed news? Which approach is better in
> > terms of performance?
> >
> > Thanks in advance.
> >
> >
> > Fanny
> >
> >
> >
> > --
> > To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
> >
> >
>
>
>
>



