lucene-java-user mailing list archives

From José Tomás Atria <jtat...@gmail.com>
Subject Re: Single string automaton causes NPE on Terms.intersect( CompiledAutomaton, BytesRef term )
Date Mon, 28 Mar 2016 19:48:25 GMT
Hi Mike,

I'd be happy to, but I have never used JIRA before and I don't entirely
understand what you mean by adding a test case as a patch (academic
programmer here, we are notoriously ignorant of established development
practices :P).

thanks!
jta

On Fri, Mar 25, 2016 at 7:54 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> Hi José,
>
> Can you please open a Jira issue about this, and add a test case as a
> patch, if you can?  I think it's bad you hit an NPE!  Not sure how
> best to fix it, but we can iterate on the issue.
>
> Thanks!
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Mar 25, 2016 at 7:11 PM, José Tomás Atria <jtatria@gmail.com>
> wrote:
> > Ok, digging a little more, I found that the problem mentioned above seems
> > to be caused by FieldReader overriding the
> > intersect( CompiledAutomaton, BytesRef ) method in Terms:
> > <https://lucene.apache.org/core/5_5_0/core/org/apache/lucene/index/Terms.html#intersect(org.apache.lucene.util.automaton.CompiledAutomaton,%20org.apache.lucene.util.BytesRef)>
> >
> > The overridden method in Terms checks whether the compiled automaton is
> > AUTOMATON_TYPE.NORMAL and, if it is not, throws an IllegalArgumentException
> > that instructs one to use CompiledAutomaton.getTermsEnum( Terms ) instead:
> >     if (compiled.type != CompiledAutomaton.AUTOMATON_TYPE.NORMAL) {
> >       throw new IllegalArgumentException("please use CompiledAutomaton.getTermsEnum instead");
> >     }
> >
> > which, of course, works perfectly, so I'm doing that now and the problem is
> > no more.
> >
> > However, the method in FieldReader just assumes that the compiled automaton
> > is AUTOMATON_TYPE.NORMAL, which causes the above NPE, because the
> > runAutomaton of a non-normal CompiledAutomaton is set to null in the
> > constructor, lines 191 to 209:
> >
> > IntsRef singleton = Operations.getSingleton(automaton);
> >
> > if (singleton != null) {
> >   // matches a fixed string
> >   type = AUTOMATON_TYPE.SINGLE;
> >   commonSuffixRef = null;
> >   runAutomaton = null; // <- HERE!
> >   this.automaton = null;
> >   this.finite = null;
> >
> >   if (isBinary) {
> >     term = StringHelper.intsRefToBytesRef(singleton);
> >   } else {
> >     term = new BytesRef(UnicodeUtil.newString(singleton.ints,
> >         singleton.offset, singleton.length));
> >   }
> >   sinkState = -1;
> >   return;
> > }
> >
> > Not to pretend I have any idea of what I'm talking about, but given that
> > the user has relatively little control over which implementation of Terms
> > we get at runtime (this user at least), shouldn't the overriding method in
> > FieldReader also check the AUTOMATON_TYPE and throw an equally informative
> > IllegalArgumentException instead, just for the sake of consistency?
> >
> > Sorry if all of the above is a little off topic for this list :)
> >
> > Best,
> > jta
> >
> >
> > On Fri, Mar 25, 2016 at 4:33 PM, José Tomás Atria <jtatria@gmail.com>
> wrote:
> >
> >> Hello again!
> >>
> >> I'm playing around some more with Lucene's automata, and I've bumped into
> >> something unexpected but can't figure out if it's a bug or an error on my
> >> part.
> >>
> >> Briefly: Is it possible to use a single string automaton (i.e. the result
> >> of Automata.makeString( String ) ) to intersect a Terms instance? I keep
> >> getting NPEs on every attempt at doing this, e.g. with this code:
> >>
> >> // where "term" is a term known to exist in someField
> >> CompiledAutomaton ca = new CompiledAutomaton( Automata.makeString( "term" ) );
> >> Terms terms = leafReader.terms( someField );
> >> TermsEnum tEnum = terms.intersect( ca, null );
> >>
> >> results in:
> >> Exception in thread "main" java.lang.NullPointerException
> >>   at org.apache.lucene.codecs.blocktree.IntersectTermsEnum.<init>(IntersectTermsEnum.java:127)
> >>   at org.apache.lucene.codecs.blocktree.FieldReader.intersect(FieldReader.java:185)
> >>
> >> I assume I'm doing something wrong (I am aware that using an automaton for
> >> a single term may be a bad idea, but bear with me), but the fact that it's
> >> throwing an NPE prompted me to come and ask...
> >>
> >> Maybe there's a problem with encodings?
> >>
> >> Any help greatly appreciated.
> >> jta.
> >>
> >> --
> >> entia non sunt multiplicanda praeter necessitatem
> >>
> >
> >
> >
> > --
> > entia non sunt multiplicanda praeter necessitatem
>
>
>
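
For illustration, the consistency check suggested in the quoted message (have
the FieldReader override reject non-NORMAL automata up front, the same way
Terms does) can be sketched in isolation. This is only a stand-in model and
not Lucene code: IntersectGuardSketch, AutomatonType and CompiledStandIn are
hypothetical names that mimic the relevant shape of CompiledAutomaton, on the
assumption that runAutomaton is null whenever the type is not NORMAL, as in
the constructor excerpt quoted above. How the real fix should look is for the
JIRA issue to decide.

```java
import java.util.Objects;

/**
 * Stand-in model of the guard discussed in this thread. None of these types
 * are Lucene classes; they only mimic the shape of CompiledAutomaton.
 */
final class IntersectGuardSketch {

    /** Hypothetical mirror of a subset of CompiledAutomaton.AUTOMATON_TYPE. */
    enum AutomatonType { NORMAL, SINGLE }

    /** Hypothetical stand-in: runAutomaton is null unless the type is NORMAL. */
    static final class CompiledStandIn {
        final AutomatonType type;
        final Object runAutomaton;

        CompiledStandIn(AutomatonType type) {
            this.type = Objects.requireNonNull(type);
            // Assumption taken from the quoted constructor: only NORMAL keeps a run automaton.
            this.runAutomaton = (type == AutomatonType.NORMAL) ? new Object() : null;
        }
    }

    /** Guarded intersect: rejects non-NORMAL automata before touching runAutomaton. */
    static String intersect(CompiledStandIn compiled) {
        if (compiled.type != AutomatonType.NORMAL) {
            // Same message as the check in Terms.intersect quoted above.
            throw new IllegalArgumentException(
                "please use CompiledAutomaton.getTermsEnum instead");
        }
        // Safe in this model: runAutomaton is non-null for NORMAL.
        return "intersecting with " + compiled.runAutomaton.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(intersect(new CompiledStandIn(AutomatonType.NORMAL)));
        try {
            intersect(new CompiledStandIn(AutomatonType.SINGLE));
        } catch (IllegalArgumentException e) {
            // The caller gets an informative error instead of an NPE.
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a guard of this kind, a single-string automaton fails fast with a message
pointing at CompiledAutomaton.getTermsEnum, rather than with a
NullPointerException from deeper in the call stack.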


-- 
entia non sunt multiplicanda praeter necessitatem
