lucene-java-user mailing list archives

From Christopher Tignor <ctig...@thinkmap.com>
Subject Re: Phrase query with terms at same location
Date Thu, 19 Nov 2009 14:38:44 GMT
Thanks, Erick -

Indeed, every word will have a part-of-speech token, but is this how the slop
actually works?  My understanding was that if I have two tokens at the same
location, then neither will affect searches involving the other in terms of
slop, since slop indicates the number of words *between* search terms in a
phrase.

Are tokens at the same location actually adjacent in their ordinal values,
thus affecting the slop as you describe?

If so, is there a predictable way to determine which comes before the other
- perhaps the order in which they are inserted when being tokenized?
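
To make the question concrete, here is a minimal sketch of the model I have in mind. It is plain Java, not Lucene code: it assumes a token's position is the running sum of position increments, so a token emitted with increment 0 shares the position of the token before it. The class name PositionModel, the method positions, and the tags "_det" and "_v" are made up for illustration.

```java
public class PositionModel {
    /**
     * Turn per-token position increments into absolute positions.
     * Assumed model: start at -1 so that a first token with
     * increment 1 lands at position 0.
     */
    static int[] positions(int[] increments) {
        int[] out = new int[increments.length];
        int pos = -1;
        for (int i = 0; i < increments.length; i++) {
            pos += increments[i];   // increment 0 means "same position as before"
            out[i] = pos;
        }
        return out;
    }

    public static void main(String[] args) {
        // Each word is followed by its part-of-speech tag at increment 0.
        String[] terms = {"the", "_det", "report", "_n", "arrived", "_v"};
        int[] increments = {1, 0, 1, 0, 1, 0};
        int[] pos = positions(increments);
        for (int i = 0; i < terms.length; i++) {
            System.out.println(terms[i] + " @ " + pos[i]);
        }
    }
}
```

Under this model "report" and "_n" both sit at position 1 and "arrived" sits at position 2, so the tag would not add to the distance between "report" and "arrived". Whether sloppy phrase matching follows this model exactly is the thing I am asking about.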

thanks,

C>T>

On Thu, Nov 19, 2009 at 8:35 AM, Erick Erickson <erickerickson@gmail.com> wrote:

> If I'm reading this right, your tokenizer creates two tokens. One
> "report" and one "_n"... I suspect if so that this will create some
> "interesting" behaviors. For instance, if you put two tokens in place,
> are you going to double the slop when you don't care about part of
> speech? Is every word going to get a marker? etc.
>
> I'm not sure payloads would be useful here, but you might check it out...
>
> What I'd think about, though, is a variant of synonyms. That is, index
> report and report_n (note no space) at the same location. Then, when
> you wanted to create a part-of-speech-aware query, you'd attach the
> various markers to your terms (_n, _v, _adj, _adv etc.) and not have to
> worry about unexpected side-effects.
>
> HTH
> Erick
>
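
A minimal sketch of the synonym-style idea above, as I read it: index each word plus a fused word+tag term at the same position, so a part-of-speech-aware lookup becomes a single-term lookup rather than a phrase. This is plain Java with a toy in-memory postings map, not Lucene code. The class name TaggedTerms, the method index, and the tags "_pron" and "_det" are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TaggedTerms {
    /**
     * Build a toy postings map (term -> positions) for one document.
     * words[i] is the word at position i and tags[i] is its tag, e.g. "_n".
     */
    static Map<String, List<Integer>> index(String[] words, String[] tags) {
        Map<String, List<Integer>> postings = new HashMap<>();
        for (int pos = 0; pos < words.length; pos++) {
            // The plain word, for searches that ignore part of speech.
            postings.computeIfAbsent(words[pos], k -> new ArrayList<>()).add(pos);
            // The fused term (no space), stored at the same position.
            postings.computeIfAbsent(words[pos] + tags[pos], k -> new ArrayList<>()).add(pos);
        }
        return postings;
    }

    public static void main(String[] args) {
        String[] words = {"they", "report", "the", "report"};
        String[] tags = {"_pron", "_v", "_det", "_n"};
        Map<String, List<Integer>> postings = index(words, tags);
        System.out.println("report   -> " + postings.get("report"));
        System.out.println("report_n -> " + postings.get("report_n"));
        System.out.println("report_v -> " + postings.get("report_v"));
    }
}
```

Here "report" matches positions 1 and 3, while "report_n" matches only position 3 and "report_v" only position 1, so no slop is involved when selecting by part of speech.
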
> On Wed, Nov 18, 2009 at 5:20 PM, Christopher Tignor <ctignor@thinkmap.com> wrote:
>
> > Hello,
> >
> > I have indexed words in my documents with part of speech tags at the same
> > location as these words using a custom Tokenizer as described, very
> > helpfully, here:
> >
> > http://mail-archives.apache.org/mod_mbox/lucene-java-user/200607.mbox/%3C20060712115026.38897.qmail@web26002.mail.ukl.yahoo.com%3E
> >
> > I would like to do a search that retrieves documents when a given word is
> > used with a specific part of speech, e.g. all docs where "report" is used
> > as a noun.
> >
> > I was hoping I could use something like a PhraseQuery with "report _n"
> > (_n is my noun part of speech tag) with some sort of identifier that
> > describes the words as having to be at the same location - like a null
> > slop or something.
> >
> > Any thoughts on how to do this?
> >
> > thanks so much,
> >
> > C>T>
> >
> > --
> > TH!NKMAP
> >
> > Christopher Tignor | Senior Software Architect
> > 155 Spring Street NY, NY 10012
> > p.212-285-8600 x385 f.212-285-8999
> >
>



-- 
TH!NKMAP

Christopher Tignor | Senior Software Architect
155 Spring Street NY, NY 10012
p.212-285-8600 x385 f.212-285-8999
