lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Search for a term in all fields
Date Wed, 21 Feb 2007 15:59:15 GMT
Nothing jumps out at me....

Erick

On 2/21/07, Kainth, Sachin <Sachin.Kainth@atkinsglobal.com> wrote:
>
> Sorry I didn't make myself clear at all.  Remember you said that it is
> possible to do this:
>
> > Sure. Convert your simple queries into span queries (which are also
> > relatively simple). Then, when you index everything in the "all"
> > field, subclass your analyzer to return a large PositionIncrementGap.
> > Explaining how this works with words is awkward, so....
> >
> > doc.add("all", "one two three");
> > doc.add("all", "four five six");
> > doc.add("all", "seven eight nine");
> > index the document.
> >
> > Assume you've implemented an analyzer that returns 1000 for
> > getPositionIncrementGap.
> >
> > Now, the term positions in the single document will be: one - 0,
> > two - 1, three - 2, four - 1003, five - 1004, six - 1005,
> > seven - 2006, eight - 2007, nine - 2008.
> >
> > Now, if you use SpanNearQuery with a slop of 900 (i.e. "one nine"~900)
> > you won't get a match because the "distance" between one and nine is
> > more than 900. But "one three"~900 will match.
> >
> > It's possible to transform any query into a set of span queries; see
> > the thread "Multiword Highlighting" that Mark Miller and I were
> > exchanging ideas on recently. Be aware that the code we were talking
> > about needs a modification when used on a "regular" index so that
> > it pays attention to the document that each sub-clause comes from. The
> > code, as written, assumes you're using a MemoryIndex for one and only
> > one document, so unless you need complex queries, I'd just think about
> > rewriting simple queries with ANDs as a SpanNearQuery.
>
> Well, what I meant was: instead of using a gap of 1000 characters,
> could we not replace that gap with a ~? If that is possible, is there
> a way of performing searches using the ~?
>
> Cheers
>
> Sachin
>
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: 21 February 2007 13:05
> To: java-user@lucene.apache.org
> Subject: Re: Search for a term in all fields
>
> I don't see what you're getting at. There are only two forms of a query
> term: field:value and value.
>
> And the second is really the first with the default field you specified
> in the parser implied. So just think of all terms you specify in a query
> as field:term.
>
> Having some "special character" in the index doesn't help you because
> you still have to specify the field. And your two choices are still
> either a BooleanQuery that mentions all fields or indexing the data into
> a single field.
>
> Best
> Erick
>
>
>
> On 2/21/07, Kainth, Sachin <Sachin.Kainth@atkinsglobal.com> wrote:
> >
> > Well, here are my current thoughts on achieving this.  Instead of
> > putting a 1000 space gap between elements of the "all" field could I not
> > use a character that isn't used in the data such as ~ and then somehow
> > (don't know how) use that to search all fields?
> >
> > -----Original Message-----
> > From: Chris Hostetter [mailto:hossman_lucene@fucit.org]
> > Sent: 20 February 2007 18:30
> > To: java-user@lucene.apache.org
> > Subject: Re: Search for a term in all fields
> >
> >
> > The information Erick gave you when you asked this question yesterday
> > is all very accurate -- the one addition i would make is that you
> > don't need SpanNear queries to take advantage of positionIncrementGap
> > -- PhraseQueries do that too.
> >
> > Consolidating your fields into a single "all" field, or constructing a
> > BooleanQuery across all of your existing fields are really the two main
> > options -- each with their tradeoffs.
> >
> > http://www.nabble.com/Search-in-all-fields-tf3254569.html
> >
> > : Date: Tue, 20 Feb 2007 12:29:25 -0000
> > : From: "Kainth, Sachin" <Sachin.Kainth@atkinsglobal.com>
> > : Reply-To: java-user@lucene.apache.org
> > : To: java-user@lucene.apache.org
> > : Subject: Search for a term in all fields
> > :
> > : Hi all,
> > :
> > : How do I search for a term in all fields of a document?
> > :
> > : Cheers
> > :
> > : Sachin
> >
> >
> >
> > -Hoss
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
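The position arithmetic described in the thread can be checked with a small stand-alone sketch. It does not use Lucene at all: the class PositionGapSketch and its methods positions and near are hypothetical names made up for this illustration, whitespace splitting stands in for a real analyzer, and the "near" test is only a simplified model of an unordered two-term proximity match (number of positions between the two terms must not exceed the slop). It also assumes every token in the document is distinct, as in the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only; these names are not part of the Lucene API.
class PositionGapSketch {

    // Assigns a position to each token. Every value after the first is
    // shifted by `gap`, mimicking a position increment gap between
    // multiple values added to the same field.
    static Map<String, Integer> positions(List<String> values, int gap) {
        Map<String, Integer> result = new LinkedHashMap<>();
        int position = 0;
        boolean first = true;
        for (String value : values) {
            if (!first) {
                position += gap; // gap applied between field values
            }
            first = false;
            for (String token : value.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // skip empty values
                }
                // Assumption: tokens are unique within the document.
                result.put(token, position++);
            }
        }
        return result;
    }

    // Simplified proximity check: the two terms match when the number of
    // positions lying between them is at most `slop`.
    static boolean near(Map<String, Integer> pos, String a, String b, int slop) {
        Integer pa = pos.get(a);
        Integer pb = pos.get(b);
        if (pa == null || pb == null) {
            return false;
        }
        return Math.abs(pa - pb) - 1 <= slop;
    }

    public static void main(String[] args) {
        Map<String, Integer> pos = positions(
                List.of("one two three", "four five six", "seven eight nine"), 1000);
        System.out.println(pos);
        System.out.println("one/three within 900: " + near(pos, "one", "three", 900));
        System.out.println("one/nine within 900:  " + near(pos, "one", "nine", 900));
    }
}
```

With a gap of 1000 and a slop of 900, terms from the same value are close enough to match while terms from different values are not, which is why the gap should be chosen larger than any slop you intend to query with.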
