lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: Multiphrase Query in Lucene 4.3
Date Thu, 03 Oct 2013 11:41:25 GMT
Then I suggest you start a new thread, posting all relevant details
and preferably a cut-down but complete program, with all relevant
code and no irrelevant code, and with simple examples, input and output,
of what does and doesn't work.


--
Ian.


On Thu, Oct 3, 2013 at 12:28 PM, VIGNESH S <vigneshklncit@gmail.com> wrote:
> Ian,
> Thanks for your reply.
> I face the same problem even if I use WhitespaceTokenizer.
> My analyzer works perfectly with Lucene 3.6.
>
> Thanks and Regards
> Vignesh Srinivasan
>
> On Thu, Oct 3, 2013 at 3:23 PM, Ian Lea <ian.lea@gmail.com> wrote:
>
>> Certainly sounds like a bug in your analyzer.  You could start a new
>> thread if you need help with that.  But from your previous email it
>> sounds like you could use WhitespaceTokenizer chained with
>> LowerCaseFilter.
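
To make the effect of whitespace-only tokenization plus lowercasing concrete, here is a minimal sketch in plain Java. It does not use Lucene: the class WhitespaceLowercaseSketch and its Token type are hypothetical names for illustration only, and the code merely approximates what WhitespaceTokenizer chained with LowerCaseFilter would emit (it works on chars rather than code points). The input is the sample content quoted further down in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a whitespace tokenizer followed by lowercasing.
// This is NOT Lucene code; it only illustrates the resulting terms and offsets.
final class WhitespaceLowercaseSketch {

    /** A token with its lowercased text and its [start, end) offsets in the input. */
    static final class Token {
        final String text;
        final int start;
        final int end;

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + " [" + start + "," + end + ")";
        }
    }

    /** Splits on whitespace only; punctuation stays attached to the token. */
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        final int n = input.length();
        int i = 0;
        while (i < n) {
            // Skip any run of whitespace between tokens.
            while (i < n && Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume everything up to the next whitespace character.
            while (i < n && !Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            if (i > start) {
                String text = input.substring(start, i).toLowerCase(Locale.ROOT);
                tokens.add(new Token(text, start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize(",Andrey Gubarev,JingGoogle,Inc.")) {
            System.out.println(t);
        }
    }
}
```

Under these assumptions the first term is ",andrey" (comma included) rather than "andrey", so a lookup for the bare word would not hit that term.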
>>
>>
>> --
>> Ian.
>>
>>
>> On Thu, Oct 3, 2013 at 7:16 AM, VIGNESH S <vigneshklncit@gmail.com> wrote:
>> > Hi,
>> >
>> > In my analyzer, the problem actually occurs for words that are preceded by
>> > punctuation marks.
>> >
>> > For example, if I index the content ",Andrey Gubarev,JingGoogle,Inc."
>> > and search for "Andrey Gubarev", it does not work properly, since the word
>> > "Andrey" is preceded by the punctuation mark ",".
>> >
>> >
>> > On Thu, Oct 3, 2013 at 11:23 AM, VIGNESH S <vigneshklncit@gmail.com> wrote:
>> >
>> >> Hi Ian,
>> >>
>> >> Is there any default analyzer in Lucene that ignores only spaces?
>> >> It should preserve everything else: numbers, punctuation, dates.
>> >>
>> >> I created my analyzer with a tokenizer that accepts a character cn when
>> >> Character.isDefined(cn) && (!Character.isWhitespace(cn)).
>> >> My analyzer uses a lowercase filter on top of the tokenizer. This works
>> >> perfectly in 3.6.
>> >> In 4.3 it creates problems with the offsets of tokens.
>> >>
>> >>
>> >>
>> >>
>> >> On Mon, Sep 30, 2013 at 8:21 PM, Ian Lea <ian.lea@gmail.com> wrote:
>> >>
>> >>> Whenever someone says they are using a custom analyzer that has to be
>> >>> a suspect.  Does it work if you use one of the core lucene analyzers
>> >>> instead?  Have you used Luke to verify that the index holds what you
>> >>> think it does?
>> >>>
>> >>>
>> >>> --
>> >>> Ian.
>> >>>
>> >>>
>> >>> On Mon, Sep 30, 2013 at 3:21 PM, VIGNESH S <vigneshklncit@gmail.com>
>> >>> wrote:
>> >>> > Hi,
>> >>> >
>> >>> > It is not a problem with case, because I am using LowerCaseFilter.
>> >>> >
>> >>> > My analyzer is a custom analyzer that ignores only whitespace. It keeps
>> >>> > everything else: numbers, dates and other special characters. The same
>> >>> > analyzer works with Lucene 3.6.
>> >>> >
>> >>> >
>> >>> > When I run a single-term query for "Geoffrey" it gives hits, but when the
>> >>> > term is part of a MultiPhraseQuery it is not found. When the code below is
>> >>> > executed with, say, word = "Geoffrey", it does not find the word itself.
>> >>> >
>> >>> > if (TermsEnum.SeekStatus.FOUND == trm.seekCeil(new BytesRef(word))) {
>> >>> >   do {
>> >>> >     String s = trm.term().utf8ToString();
>> >>> >     if (s.equals(word)) {
>> >>> >       termsWithPrefix.add(new Term("content", s));
>> >>> >     } else {
>> >>> >       break;
>> >>> >     }
>> >>> >   } while (trm.next() != null);
>> >>> > }
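
One possible reading of the seekCeil behaviour described in this thread: seekCeil positions the enum on the smallest term that is greater than or equal to the target, and reports FOUND only on an exact match. The sketch below imitates that with a java.util.TreeSet rather than a real Lucene term dictionary. SeekCeilSketch and its methods are hypothetical names, and String ordering stands in for Lucene's byte ordering (the two agree for ASCII terms). The term sets are invented for illustration.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical illustration of "ceiling" seeks on a sorted term dictionary.
// A TreeSet<String> stands in for the terms of one field; this is not Lucene code.
final class SeekCeilSketch {

    /** Smallest term >= target, or null if every term sorts before the target. */
    static String seekCeil(TreeSet<String> terms, String target) {
        return terms.ceiling(target);
    }

    /** True only when the ceiling term is exactly the target (the FOUND case). */
    static boolean found(TreeSet<String> terms, String target) {
        return target.equals(terms.ceiling(target));
    }

    public static void main(String[] args) {
        // Terms as a lowercasing analyzer might have indexed them.
        TreeSet<String> lowercased =
            new TreeSet<>(Arrays.asList("andrey", "geoffrey", "romer"));

        // 'G' sorts before every lowercase letter, so the seek lands on "andrey".
        System.out.println(seekCeil(lowercased, "Geoffrey"));
        // The lowercased target is an exact match.
        System.out.println(found(lowercased, "geoffrey"));

        // Terms with punctuation left attached by a whitespace-only tokenizer.
        TreeSet<String> withPunctuation =
            new TreeSet<>(Arrays.asList(",geoffrey", "romer"));

        // ',' sorts before 'g', so the seek skips ",geoffrey" and lands on "romer".
        System.out.println(seekCeil(withPunctuation, "geoffrey"));
    }
}
```

If the indexed terms are lowercased or carry leading punctuation, seeking with the raw word would land on a neighbouring term, which would match the symptom reported here; checking the actual index contents (for example with Luke) is still needed to confirm it.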
>> >>> >
>> >>> >
>> >>> >
>> >>> > On Mon, Sep 30, 2013 at 3:01 PM, Ian Lea <ian.lea@gmail.com> wrote:
>> >>> >
>> >>> >> Whenever someone says something along the lines of a search for
>> >>> >> "geoffrey" not matching "Geoffrey" the case difference springs out.
>> >>> >> Can't recall what if anything you said about the analysis side of
>> >>> >> things but that could be the cause.  See
>> >>> >>
>> >>> >> http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2F_incorrect_hits.3F
>> >>> >>
>> >>> >> If on the other hand the problem is more obscure, and only related to
>> >>> >> the multi phrase stuff, I suggest you build a tiny but complete
>> >>> >> RAMDirectory based program or test case that shows the problem and
>> >>> >> post it here.
>> >>> >>
>> >>> >>
>> >>> >> --
>> >>> >> Ian.
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> On Mon, Sep 30, 2013 at 6:46 AM, VIGNESH S <vigneshklncit@gmail.com> wrote:
>> >>> >> > Hi,
>> >>> >> >
>> >>> >> > Thanks for your reply. The problem I face is that there is a phrase,
>> >>> >> > "Geoffrey Romer", in my field.
>> >>> >> >
>> >>> >> > I am forming a MultiPhraseQuery object properly, like "Geoffrey Romer",
>> >>> >> > but when I search, it returns no hits. This problem does not occur for
>> >>> >> > all phrases, only for a few.
>> >>> >> >
>> >>> >> > When I run a single-term query for "Geoffrey" I get a hit, but when I use
>> >>> >> > it in a MultiPhraseQuery it cannot find "geoffrey". I confirmed this by
>> >>> >> > calling trm.seekCeil(new BytesRef("Geoffrey")) and then
>> >>> >> > String s = trm.term().utf8ToString(). It points to a different word
>> >>> >> > instead of "geoffrey". seekCeil works properly for many phrases, though.
>> >>> >> >
>> >>> >> > What could be the problem? Please kindly suggest.
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > On Fri, Sep 27, 2013 at 6:58 PM, Allison, Timothy B. <tallison@mitre.org> wrote:
>> >>> >> >
>> >>> >> >> 1) An alternate method to your original question would be to do something
>> >>> >> >> like this (I haven't compiled or tested this!):
>> >>> >> >>
>> >>> >> >> Query q = new PrefixQuery(new Term("field", "app"));
>> >>> >> >>
>> >>> >> >> q = q.rewrite(indexReader) ;
>> >>> >> >> Set<Term> terms = new HashSet<Term>();
>> >>> >> >> q.extractTerms(terms);
>> >>> >> >> Term[] arr = terms.toArray(new Term[terms.size()]);
>> >>> >> >> MultiPhraseQuery mpq = new MultiPhraseQuery();
>> >>> >> >> mpq.add(new Term("field", "microsoft"));
>> >>> >> >> mpq.add(arr);
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> 2) At a higher level, do you need to generate your query programmatically?
>> >>> >> >>  Here are three parsers that could handle this:
>> >>> >> >>   a) ComplexPhraseQueryParser
>> >>> >> >>   b) SurroundQueryParser: oal.queryparser.surround.parser.QueryParser
>> >>> >> >>   c) experimental: <self_promotion degree="shameless">
>> >>> >> >>      http://issues.apache.org/jira/browse/LUCENE-5205 </self_promotion>
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> -----Original Message-----
>> >>> >> >> From: VIGNESH S [mailto:vigneshklncit@gmail.com]
>> >>> >> >> Sent: Friday, September 27, 2013 3:33 AM
>> >>> >> >> To: java-user@lucene.apache.org
>> >>> >> >> Subject: Re: Multiphrase Query in Lucene 4.3
>> >>> >> >>
>> >>> >> >> Hi,
>> >>> >> >>
>> >>> >> >> The phrase I am giving is "Romer Geoffrey". The phrase is in the field.
>> >>> >> >>
>> >>> >> >> I call trm.seekCeil(new BytesRef("Geoffrey")) and then
>> >>> >> >> String s = trm.term().utf8ToString();
>> >>> >> >>
>> >>> >> >> It gives a different word. I think this is why my MultiPhraseQuery is
>> >>> >> >> not giving the desired results.
>> >>> >> >>
>> >>> >> >> What may be the reason?
>> >>> >> >>
>> >>> >> >>
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> On Fri, Sep 27, 2013 at 11:49 AM, VIGNESH S <vigneshklncit@gmail.com> wrote:
>> >>> >> >>
>> >>> >> >> > Hi Ian,
>> >>> >> >> >
>> >>> >> >> > Thanks for your reply.
>> >>> >> >> >
>> >>> >> >> > I am doing something similar to this. The actual phrase goes into the
>> >>> >> >> > MultiPhraseQuery object properly, but it does not return any hits.
>> >>> >> >> >
>> >>> >> >> > In Lucene 3.6, I implemented the same logic and it works.
>> >>> >> >> >
>> >>> >> >> > In Lucene 4.3, I built the index for that using
>> >>> >> >> >
>> >>> >> >> >   FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
>> >>> >> >> >   offsetsType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
>> >>> >> >> >
>> >>> >> >> > For MultiPhraseQuery, do I need to set any other parameter in addition
>> >>> >> >> > to this while indexing?
>> >>> >> >> >
>> >>> >> >> > Is there a MultiPhraseQueryTest Java file for Lucene 4.3? I checked the
>> >>> >> >> > Lucene branch and was not able to find one. Please kindly help.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > On Thu, Sep 26, 2013 at 2:55 PM, Ian Lea <ian.lea@gmail.com> wrote:
>> >>> >> >> >
>> >>> >> >> >> I use the code below to do something like this.  Not exactly what you
>> >>> >> >> >> want but should be easy to adapt.
>> >>> >> >> >>
>> >>> >> >> >>
>> >>> >> >> >> public List<String> findTerms(IndexReader _reader,
>> >>> >> >> >>                               String _field) throws IOException {
>> >>> >> >> >>   List<String> l = new ArrayList<String>();
>> >>> >> >> >>   Fields ff = MultiFields.getFields(_reader);
>> >>> >> >> >>   Terms trms = ff.terms(_field);
>> >>> >> >> >>   TermsEnum te = trms.iterator(null);
>> >>> >> >> >>   BytesRef br;
>> >>> >> >> >>   while ((br = te.next()) != null) {
>> >>> >> >> >>     l.add(br.utf8ToString());
>> >>> >> >> >>   }
>> >>> >> >> >>   return l;
>> >>> >> >> >> }
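
Adapting a full term enumeration like the one above to the "app*" case comes down to starting at the first term that is at or after the prefix and stopping at the first term that no longer starts with it. The sketch below shows that logic over a plain java.util.NavigableSet instead of a Lucene TermsEnum. PrefixTermsSketch and termsWithPrefix are hypothetical names, and the sample terms are invented. In real Lucene code the collected strings would then be wrapped as Term objects and passed to MultiPhraseQuery.add(Term[]).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical illustration of prefix expansion over a sorted term dictionary.
// A NavigableSet<String> stands in for the terms of one field; this is not Lucene code.
final class PrefixTermsSketch {

    /** Collects every term that starts with the given prefix, in sorted order. */
    static List<String> termsWithPrefix(NavigableSet<String> terms, String prefix) {
        List<String> matches = new ArrayList<>();
        // tailSet(prefix, true) starts at the first term >= prefix (the "ceiling").
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                // Terms sharing a prefix are contiguous, so the first miss ends the scan.
                break;
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(
            Arrays.asList("microsoft", "apple", "app", "application", "apply", "ape"));
        // Prints the terms that a query such as "microsoft app*" would expand to.
        System.out.println(termsWithPrefix(terms, "app"));
    }
}
```

The early break relies on the dictionary being sorted; with an unsorted collection every term would have to be checked.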
>> >>> >> >> >>
>> >>> >> >> >> --
>> >>> >> >> >> Ian.
>> >>> >> >> >>
>> >>> >> >> >> On Wed, Sep 25, 2013 at 3:04 PM, VIGNESH S <vigneshklncit@gmail.com> wrote:
>> >>> >> >> >> > Hi,
>> >>> >> >> >> >
>> >>> >> >> >> > The MultiPhraseQuery example says:
>> >>> >> >> >> >
>> >>> >> >> >> > "To use this class, to search for the phrase "Microsoft app*" first use
>> >>> >> >> >> > add(Term) on the term "Microsoft", then find all terms that have "app" as
>> >>> >> >> >> > prefix using IndexReader.terms(Term), and use MultiPhraseQuery.add(Term[]
>> >>> >> >> >> > terms) to add them to the query"
>> >>> >> >> >> >
>> >>> >> >> >> >
>> >>> >> >> >> > How can I replicate the same in Lucene 4.3, since IndexReader.terms(Term)
>> >>> >> >> >> > is no longer available?
>> >>> >> >> >> >
>> >>> >> >> >> > --
>> >>> >> >> >> > Thanks and Regards
>> >>> >> >> >> > Vignesh Srinivasan
>> >>> >> >> >>
>> >>> >> >> >>
>> >>> >> >> >> ---------------------------------------------------------------------
>> >>> >> >> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >>> >> >> >> For additional commands, e-mail: java-user-help@lucene.apache.org