lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Query.toString printing binary in the output...
Date Tue, 19 Mar 2013 16:49:51 GMT
I'm afraid I won't have time to dig into this for a while. Does anyone else
want to chime in?

Erick


On Tue, Mar 19, 2013 at 9:08 AM, Andrew Lundgren
<lundgren@familysearch.org> wrote:

> This is perhaps more clear:
>
> Assuming you have a schema where:
>
>   <field name="collection_id" type="integer" indexed="true" stored="false" required="true" omitTermFreqAndPositions="true"/>
>
> Then:
>
>   void testSamplePrint() throws IOException, SAXException, ParserConfigurationException {
>
>       SolrConfig config = new SolrConfig("solrconfig.xml");
>       IndexSchema schema = new IndexSchema(config, "schema.xml", null);
>
>       TermQuery aTerm = new TermQuery(new Term("TestString", "123456"));
>       TermQuery bTerm = new TermQuery(new Term("TestString",
>               schema.getField("collection_id").getType().readableToIndexed("123456")));
>
>       System.out.printf("%s\n", aTerm.toString());
>       System.out.printf("%s\n", bTerm.toString());
>
>       assertEquals(aTerm.toString(), bTerm.toString());
>
>   }
>
> The test output is:
>
> java.lang.AssertionError:
> Expected :TestString:123456
> Actual   :TestString:`
>
> I believe that this is because the Term does not know that it contains an
> encoded integer, and thus cannot parse it.  If the TermQuery knew the type,
> it could also decode it.  But w/o a query to the schema, I don't know how
> to get the toString to function correctly.
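
For illustration only, here is a minimal, self-contained sketch of why the encoded term prints as a backtick followed by control characters, and how it can be turned back into a number. It assumes the `integer` type in this schema is a Trie-based int field and that the term bytes follow the prefix-coded layout of Lucene 4.x's `NumericUtils` (one marker byte, 0x60 for full precision, then five 7-bit groups). Neither assumption is confirmed in this thread. The class and method names are hypothetical. With a real schema at hand, the field type's `indexedToReadable` (the counterpart of the `readableToIndexed` call used above) would be the place to look instead of hand-rolled decoding.

```java
public class PrefixCodedIntSketch {

    // Marker byte for a full-precision (shift 0) int term.
    // Assumption: matches SHIFT_START_INT in Lucene 4.x NumericUtils.
    static final int SHIFT_START_INT = 0x60;

    /** Encodes an int as one marker byte plus five 7-bit groups, most significant first. */
    static byte[] encode(int value) {
        byte[] term = new byte[6];
        term[0] = (byte) SHIFT_START_INT;
        // Flip the sign bit so that byte order matches numeric order.
        int sortableBits = value ^ 0x80000000;
        for (int i = 5; i >= 1; i--) {
            term[i] = (byte) (sortableBits & 0x7f);
            sortableBits >>>= 7;
        }
        return term;
    }

    /** Reverses encode(): reassembles the 7-bit groups and flips the sign bit back. */
    static int decode(byte[] term) {
        if (term.length != 6 || term[0] != SHIFT_START_INT) {
            throw new IllegalArgumentException("not a full-precision prefix-coded int term");
        }
        int sortableBits = 0;
        for (int i = 1; i < term.length; i++) {
            sortableBits = (sortableBits << 7) | (term[i] & 0x7f);
        }
        return sortableBits ^ 0x80000000;
    }

    public static void main(String[] args) {
        byte[] term = encode(123456);
        // The first byte is 0x60, i.e. the '`' seen in the log output;
        // the next bytes (0x08, 0x00, 0x07) are control characters.
        StringBuilder hex = new StringBuilder();
        for (byte b : term) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println("encoded bytes: " + hex.toString().trim());
        System.out.println("TestString:" + decode(term));
    }
}
```

Under these assumptions, a custom print function would still need the schema (or some other source of type information) to know which fields to decode this way, since the Term itself carries no type.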
>
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: Monday, March 18, 2013 7:55 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Query.toString printing binary in the output...
>
> If you simply attach &debug=all to your URL, you should see the query come
> back in your response, XML, JSON, whatever. If that also shows bizarre
> characters, then that will give you some idea whether it's in Solr or not.
>
> But you haven't given us much info about how/where you call toString. You
> may be getting into trouble with character sets (although I'd find that
> quite odd, but it's a possibility).
>
> What I'm really finding confusing is that you're mentioning Term alongside
> query.toString() (at least that's what I think you're saying), which has
> nothing at all to do with Terms; it's just the query string passed in. So
> I'm really puzzled as to what you're doing to get this kind of output. It
> almost looks like you're trying to print out the _results_ of a query, not
> the query.
>
> So some clarification would be helpful...
>
> Best
> Erick
>
>
> On Mon, Mar 18, 2013 at 12:01 PM, Andrew Lundgren
> <lundgren@familysearch.org> wrote:
>
> > I am sorry, I don't follow what you mean by debug=query.  Can you
> > elaborate on that a bit?
> >
> > Thanks!
> >
> > -----Original Message-----
> > From: Erick Erickson [mailto:erickerickson@gmail.com]
> > Sent: Sunday, March 17, 2013 8:09 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Query.toString printing binary in the output...
> >
> > Hmmm, without looking at the code, somehow when you specify
> > debug=query you get readable results, maybe that code would be a place
> > to start?
> >
> > And are you looking for the parsed output? Otherwise you could print
> > the original query.
> >
> > Not much help....
> > Erick
> >
> >
> > On Fri, Mar 15, 2013 at 3:24 PM, Andrew Lundgren
> > <lundgren@familysearch.org> wrote:
> >
> > > We use the toString call on the query in our logs.  For some numeric
> > > types, the encoded form of the number is being printed instead of
> > > the readable form.
> > >
> > > This makes tail and some other tools very unhappy...
> > >
> > > Here is a partial example of a query.toString() that would have had
> > > binary in it.  As a short-term workaround I replaced all
> > > non-printable characters in the string with an '_'.
> > >
> > > (collection_id:`__z_[^0.027 collection_id:`__nB+^0.026
> > > collection_id:`__Zl_^0.025 collection_id:`__i49^0.024
> > > collection_id:`__Pq%^0.023 collection_id:`__VCS^0.022
> > > collection_id:`__WbH^0.021 collection_id:`__Yu_^0.02
> > > collection_id:`__UF&^0.019 collection_id:`__I2g^0.018
> > > collection_id:`__PP_^0.016999999 collection_id:`__Ysv^0.015999999
> > > collection_id:`__Oe_^0.014999999 collection_id:`__Ysw^0.013999999
> > > collection_id:`__Wi_^0.012999998 collection_id:`__fLi^0.011999998
> > > collection_id:`__XRk^0.010999998 collection_id:`__Uz[^0.009999998
> > > collection_id:`__SE_^0.008999998 collection_id:`__Ysx^0.007999998
> > > collection_id:`__Ysh^0.0069999974 collection_id:`__fLh^0.0059999973
> > > collection_id:`__f _^0.004999997 collection_id:`__`^C^0.003999997
> > > collection_id:`__fKM^0.002999997 collection_id:`__Szo^0.001999997
> > > collection_id:`__f ]^9.99997E-4)
> > >
> > > But, as you can see, that is less than useful...
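
The short-term workaround described above could be sketched roughly as follows. This is not the code actually used by the poster. The class and method names are hypothetical, and "non-printable" is assumed here to mean anything outside printable ASCII (0x20 through 0x7E).

```java
public class LogSanitizer {

    /**
     * Replaces every character outside printable ASCII (0x20..0x7E) with '_'.
     * Assumption: this is what "non-printable" means for the log tooling in use.
     */
    static String sanitize(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            out.append(c >= 0x20 && c <= 0x7E ? c : '_');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A made-up term string containing three control characters after the backtick.
        String raw = "collection_id:`\u0008\u0000\u0007D@^0.027";
        System.out.println(sanitize(raw));
    }
}
```

This keeps tail and similar tools happy, but it is lossy: different control characters all collapse to '_', so the original number cannot be recovered from the log line.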
> > >
> > > I spent some time looking at the source and found that Term does not
> > > contain the type of the embedded data.  Any possible solutions to
> > > this short of walking the query and getting the type of each field
> > > from the schema and creating my own print function?
> > >
> > > Thanks!
> > >
> > > --
> > > Andrew
> > >
> > >
> > >
> > >
> > >  NOTICE: This email message is for the sole use of the intended
> > > recipient(s) and may contain confidential and privileged information.
> > > Any unauthorized review, use, disclosure or distribution is
> > > prohibited. If you are not the intended recipient, please contact
> > > the sender by reply email and destroy all copies of the original
> > > message.
> > >
> > >
