Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Received-SPF: pass (nike.apache.org: domain of markrmiller@gmail.com
 designates 209.85.220.176 as permitted sender)
Content-Type: text/plain; charset=iso-8859-1
Mime-Version: 1.0 (Apple Message framework v1244.3)
Subject: Re: Search within a sentence (revisited)
From: Mark Miller <markrmiller@gmail.com>
In-Reply-To: <8329E98E-70D2-4314-A135-2FD5A699B91B@gmail.com>
Date: Thu, 21 Jul 2011 17:23:10 -0400
Content-Transfer-Encoding: quoted-printable
Message-Id: <7711A405-BCB2-44D6-AE0E-C3F87C61B24C@gmail.com>
References: 
 <CAN8y9rR43XurPBCXKcofDhTzx+G0=6-dG2jna9y4jEz_onzbMw@mail.gmail.com>
 <7DD18AE8-B81B-4EFD-BD43-E6D866AF002D@gmail.com>
 <CBBA9901-60D9-492C-B932-91CDF67446D0@gmail.com>
 <CAN8y9rTs3=w9qjLVTEfiK5wptj=NjXKuU9X=vvXe-WXh+TD-GQ@mail.gmail.com>
 <99EC18FA-B784-433D-A024-014694A6FD5E@gmail.com>
 <CAN8y9rS5EmL-_kH8qXSd+rvr1uqNTsmffhSws690mVUNm6GNPw@mail.gmail.com>
 <8329E98E-70D2-4314-A135-2FD5A699B91B@gmail.com>
To: java-user@lucene.apache.org


I just uploaded a patch for 3X that will work for 3.2.

On Jul 21, 2011, at 4:25 PM, Mark Miller wrote:

> Yeah, it's off trunk. I'll submit a 3X patch in a bit; I believe I just have to
> change that to an IndexReader.
>
> - Mark
>
> On Jul 21, 2011, at 4:01 PM, Peter Keegan wrote:
>
>> Does this patch require the trunk version? I'm using 3.2 and
>> 'AtomicReaderContext' isn't there.
>>
>> Peter
>>
>> On Thu, Jul 21, 2011 at 3:07 PM, Mark Miller <markrmiller@gmail.com> wrote:
>>
>>> Hey Peter,
>>>
>>> Getting sucked back into Spans...
>>>
>>> That test should pass now; I uploaded a new patch to
>>> https://issues.apache.org/jira/browse/LUCENE-777
>>>
>>> Further tests may be needed, though.
>>>
>>> - Mark
>>>
>>> On Jul 21, 2011, at 9:28 AM, Peter Keegan wrote:
>>>
>>>> Hi Mark,
>>>>
>>>> Here is a unit test using a version of 'SpanWithinQuery' modified for 3.2
>>>> ('getTerms' removed). The last test fails (search for "1" and "3").
>>>>
>>>> package org.apache.lucene.search.spans;
>>>>
>>>> import java.io.Reader;
>>>>
>>>> import org.apache.lucene.analysis.Analyzer;
>>>> import org.apache.lucene.analysis.TokenStream;
>>>> import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
>>>> import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
>>>> import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
>>>> import org.apache.lucene.document.Document;
>>>> import org.apache.lucene.document.Field;
>>>> import org.apache.lucene.index.IndexReader;
>>>> import org.apache.lucene.index.RandomIndexWriter;
>>>> import org.apache.lucene.index.Term;
>>>> import org.apache.lucene.search.IndexSearcher;
>>>> import org.apache.lucene.search.PhraseQuery;
>>>> import org.apache.lucene.search.ScoreDoc;
>>>> import org.apache.lucene.search.TermQuery;
>>>> import org.apache.lucene.store.Directory;
>>>> import org.apache.lucene.util.LuceneTestCase;
>>>>
>>>> public class TestSentence extends LuceneTestCase {
>>>>   public static final String field = "field";
>>>>   public static final String START = "^";
>>>>   public static final String END = "$";
>>>>
>>>>   public void testSetPosition() throws Exception {
>>>>     // Ignores the field text and emits a fixed token stream. Each END marker
>>>>     // has a position increment of 0, so it shares the position of the last
>>>>     // word of its sentence ("3" and "6").
>>>>     Analyzer analyzer = new Analyzer() {
>>>>       @Override
>>>>       public TokenStream tokenStream(String fieldName, Reader reader) {
>>>>         return new TokenStream() {
>>>>           private final String[] TOKENS = {"1", "2", "3", END, "4", "5", "6", END, "9"};
>>>>           private final int[] INCREMENTS = {1, 1, 1, 0, 1, 1, 1, 0, 1};
>>>>           private int i = 0;
>>>>
>>>>           PositionIncrementAttribute posIncrAtt = addAttribute(PositionIncrementAttribute.class);
>>>>           CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
>>>>           OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
>>>>
>>>>           @Override
>>>>           public boolean incrementToken() {
>>>>             assertEquals(TOKENS.length, INCREMENTS.length);
>>>>             if (i == TOKENS.length)
>>>>               return false;
>>>>             clearAttributes();
>>>>             termAtt.append(TOKENS[i]);
>>>>             offsetAtt.setOffset(i, i);
>>>>             posIncrAtt.setPositionIncrement(INCREMENTS[i]);
>>>>             i++;
>>>>             return true;
>>>>           }
>>>>         };
>>>>       }
>>>>     };
>>>>     Directory store = newDirectory();
>>>>     RandomIndexWriter writer = new RandomIndexWriter(random, store, analyzer);
>>>>     Document d = new Document();
>>>>     d.add(newField("field", "bogus", Field.Store.YES, Field.Index.ANALYZED));
>>>>     writer.addDocument(d);
>>>>     IndexReader reader = writer.getReader();
>>>>     writer.close();
>>>>     IndexSearcher searcher = newSearcher(reader);
>>>>
>>>>     SpanTermQuery startSentence = makeSpanTermQuery(START); // not used by the assertions below
>>>>     SpanTermQuery endSentence = makeSpanTermQuery(END);
>>>>     SpanQuery[] clauses = new SpanQuery[2];
>>>>
>>>>     // "1" and "2" are in the same sentence: expect a hit.
>>>>     clauses[0] = makeSpanTermQuery("1");
>>>>     clauses[1] = makeSpanTermQuery("2");
>>>>     SpanNearQuery allKeywords = new SpanNearQuery(clauses, Integer.MAX_VALUE, false); // SpanAndQuery equivalent
>>>>     SpanWithinQuery query = new SpanWithinQuery(allKeywords, endSentence, 0);
>>>>     System.out.println("query: " + query);
>>>>     ScoreDoc[] hits = searcher.search(query, null, 1000).scoreDocs;
>>>>     assertEquals(1, hits.length);
>>>>
>>>>     // "1" and "4" are in different sentences: expect no hit.
>>>>     clauses[1] = makeSpanTermQuery("4");
>>>>     allKeywords = new SpanNearQuery(clauses, Integer.MAX_VALUE, false); // SpanAndQuery equivalent
>>>>     query = new SpanWithinQuery(allKeywords, endSentence, 0);
>>>>     System.out.println("query: " + query);
>>>>     hits = searcher.search(query, null, 1000).scoreDocs;
>>>>     assertEquals(0, hits.length);
>>>>
>>>>     // END markers take no position of their own, so the phrase "3 4"
>>>>     // still matches across the sentence boundary.
>>>>     PhraseQuery pq = new PhraseQuery();
>>>>     pq.add(new Term(field, "3"));
>>>>     pq.add(new Term(field, "4"));
>>>>     hits = searcher.search(pq, null, 1000).scoreDocs;
>>>>     assertEquals(1, hits.length);
>>>>
>>>>     // "1" and "3" are in the same sentence ("3" shares its position with
>>>>     // the END marker): expect a hit. This is the check reported as failing.
>>>>     clauses[1] = makeSpanTermQuery("3");
>>>>     allKeywords = new SpanNearQuery(clauses, Integer.MAX_VALUE, false); // SpanAndQuery equivalent
>>>>     query = new SpanWithinQuery(allKeywords, endSentence, 0);
>>>>     System.out.println("query: " + query);
>>>>     hits = searcher.search(query, null, 1000).scoreDocs;
>>>>     assertEquals(1, hits.length);
>>>>
>>>>     // Release resources so the test framework does not flag open files.
>>>>     searcher.close();
>>>>     reader.close();
>>>>     store.close();
>>>>   }
>>>>
>>>>   public SpanTermQuery makeSpanTermQuery(String text) {
>>>>     return new SpanTermQuery(new Term(field, text));
>>>>   }
>>>>
>>>>   public TermQuery makeTermQuery(String text) {
>>>>     return new TermQuery(new Term(field, text));
>>>>   }
>>>> }
>>>>
>>>> Peter
>>>>
>>>> On Wed, Jul 20, 2011 at 9:22 PM, Mark Miller <markrmiller@gmail.com> wrote:
>>>>
>>>>>
>>>>> On Jul 20, 2011, at 7:44 PM, Mark Miller wrote:
>>>>>
>>>>>>
>>>>>> On Jul 20, 2011, at 11:27 AM, Peter Keegan wrote:
>>>>>>
>>>>>>> Mark Miller's 'SpanWithinQuery' patch
>>>>>>> seems to have the same issue.
>>>>>>
>>>>>> If I remember right (it's been more than a couple of years), I did index
>>>>>> the sentence markers at the same position as the last word in the
>>>>>> sentence. And I think the limitation I accepted was that the word could
>>>>>> belong to both its true sentence and the one after it.
>>>>>>
>>>>>> - Mark Miller
>>>>>> lucidimagination.com
>>>>>
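A plain-Java model of the position arithmetic described above may make the limitation concrete. It reuses the token stream and increments from Peter's unit test and, as an assumption for illustration only, treats each sentence as the inclusive position range between consecutive END markers; this is not how the LUCENE-777 patch actually computes spans, and `EndMarkerPositions` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not the LUCENE-777 code: END markers ("$") use a
// position increment of 0, so each shares the position of the last word of
// its sentence. A "sentence" here is the inclusive position range between
// consecutive END markers (the first one starts at position 0).
public class EndMarkerPositions {
    public static final String END = "$";

    // Turns position increments into absolute positions; the first token is at 0.
    public static int[] positions(int[] increments) {
        int[] pos = new int[increments.length];
        int current = -1;
        for (int i = 0; i < increments.length; i++) {
            current += increments[i];
            pos[i] = current;
        }
        return pos;
    }

    // Collects [start, end] windows; each window starts where the previous END sat.
    public static List<int[]> windows(String[] tokens, int[] pos) {
        List<int[]> result = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(END)) {
                result.add(new int[] {start, pos[i]});
                start = pos[i];
            }
        }
        return result;
    }

    // Number of windows whose inclusive range contains position p.
    public static int windowsContaining(List<int[]> windows, int p) {
        int n = 0;
        for (int[] w : windows) {
            if (w[0] <= p && p <= w[1]) {
                n++;
            }
        }
        return n;
    }

    // True if any single window contains both positions.
    public static boolean sameWindow(List<int[]> windows, int p, int q) {
        for (int[] w : windows) {
            if (w[0] <= p && p <= w[1] && w[0] <= q && q <= w[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Token stream and increments copied from the unit test above.
        String[] tokens = {"1", "2", "3", END, "4", "5", "6", END, "9"};
        int[] increments = {1, 1, 1, 0, 1, 1, 1, 0, 1};
        int[] pos = positions(increments);
        List<int[]> ws = windows(tokens, pos);
        // "3" (position 2) lies in both [0, 2] and [2, 5].
        System.out.println(windowsContaining(ws, pos[2])); // 2
        // So "3" and "4" (position 3) look like they share a sentence.
        System.out.println(sameWindow(ws, pos[2], pos[4])); // true
    }
}
```

In this model the last word of a sentence is counted in two windows, which is the "belongs to both its true sentence and the one after it" effect.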
>>>>> Perhaps you could index the sentence marker at both the last word of the
>>>>> sentence and the first word of the next sentence, if there is one.
>>>>> This would seem to solve the above limitation as well?
>>>>>
>>>>> - Mark Miller
>>>>> lucidimagination.com
>>>>>
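One reading of this suggestion can be modeled in plain Java, assuming the two boundaries get distinct tokens: START ("^", which the unit test declares but never indexes) at the position of each sentence's first word, and END ("$") at the position of its last word. The distinct-token choice matters in this model: with one shared marker at both places, the markers at positions 2 and 3 would bracket "3" and "4" as if they were one sentence. `SentenceBoundaryMarkers` and its methods are hypothetical; this models positions only and is not a Lucene TokenFilter or the SpanWithinQuery patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model: START ("^") shares the position of each sentence's first
// word and END ("$") shares the position of its last word. Positions only;
// this is not a Lucene TokenFilter or the SpanWithinQuery patch.
public class SentenceBoundaryMarkers {
    public static final String START = "^";
    public static final String END = "$";

    public static final class Token {
        public final String term;
        public final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    // Words get consecutive positions; markers reuse their word's position
    // (the equivalent of a position increment of 0).
    public static List<Token> tokenize(List<List<String>> sentences) {
        List<Token> out = new ArrayList<>();
        int pos = 0;
        for (List<String> sentence : sentences) {
            for (int i = 0; i < sentence.size(); i++) {
                out.add(new Token(sentence.get(i), pos));
                if (i == 0) {
                    out.add(new Token(START, pos));
                }
                if (i == sentence.size() - 1) {
                    out.add(new Token(END, pos));
                }
                pos++;
            }
        }
        return out;
    }

    // Pairs each START with the next END to form inclusive [start, end] windows.
    public static List<int[]> sentenceWindows(List<Token> tokens) {
        List<int[]> windows = new ArrayList<>();
        int start = -1;
        for (Token t : tokens) {
            if (t.term.equals(START)) {
                start = t.position;
            } else if (t.term.equals(END)) {
                windows.add(new int[] {start, t.position});
            }
        }
        return windows;
    }

    // True if some single sentence window contains both terms.
    public static boolean withinSentence(List<Token> tokens, String a, String b) {
        for (int[] w : sentenceWindows(tokens)) {
            boolean hasA = false;
            boolean hasB = false;
            for (Token t : tokens) {
                if (t.position >= w[0] && t.position <= w[1]) {
                    hasA |= t.term.equals(a);
                    hasB |= t.term.equals(b);
                }
            }
            if (hasA && hasB) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Token> tokens = tokenize(Arrays.asList(
                Arrays.asList("1", "2", "3"),
                Arrays.asList("4", "5", "6"),
                Arrays.asList("9")));
        System.out.println(withinSentence(tokens, "1", "3")); // true
        System.out.println(withinSentence(tokens, "1", "4")); // false
        System.out.println(withinSentence(tokens, "3", "4")); // false
    }
}
```

With the same sentences as the unit test, the windows come out as [0, 2], [3, 5] and [6, 6], so "3" belongs only to its own sentence and the "1"/"3" case is found without also matching "3"/"4".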
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>
>>> - Mark Miller
>>> lucidimagination.com
>
> - Mark Miller
> lucidimagination.com

- Mark Miller
lucidimagination.com

