lucene-java-user mailing list archives

From Jithin <jithin1...@gmail.com>
Subject Re: Writing a TokenConcatenateFilter - junk characters appearing on output.
Date Sat, 01 Oct 2011 03:19:17 GMT
Thanks a million Uwe. That fixes it.

On Sat, Oct 1, 2011 at 4:16 AM, Uwe Schindler [via Lucene] <
ml-node+s472066n3383905h73@n3.nabble.com> wrote:

> Hi,
>
> The junk is appended here: buffer.append(termAtt.buffer());
>
> I assume you are on Lucene 3.1+, so use buffer.append(termAtt); termAtt
> implements CharSequence, so it can be appended to any StringBuilder.
> The code you are using appends the whole char array, which may contain
> characters after termAtt.length().
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [hidden email]
>
> > -----Original Message-----
> > From: Jithin [mailto:[hidden email]]
> > Sent: Friday, September 30, 2011 11:12 PM
> > To: [hidden email]
> > Subject: Writing a TokenConcatenateFilter - junk characters appearing on
> > output.
> >
> > Hi,
> > I am trying to write a TokenFilter which just concatenates all the tokens
> > in the input TokenStream.
> > The issue I am facing is that my filter is outputting certain junk
> > characters in addition to the concatenated string. I believe this is
> > caused by StringBuilder.
> >
> > This is my incrementToken() function
> >
> > public boolean incrementToken() throws IOException {
> >         //if (!input.incrementToken()) {
> >             //return false;
> >         //}
> >         if (finished) {
> >             logger.error("Finished");
> >             return false;
> >         }
> >         logger.error("Starting");
> >         StringBuilder buffer = new StringBuilder();
> >         int length = 0;
> >         while (input.incrementToken()) {
> >             logger.error(Integer.toString(buffer.length()));
> >             logger.error(buffer.toString());
> >             if (0 == length) {
> >                 buffer.append(termAtt.buffer());
> >                 length += termAtt.length();
> >             } else {
> >                 buffer.append(" ").append(termAtt.buffer());
> >                 length += termAtt.length() + 1;
> >             }
> >
> >         }
> >
> >         logger.error("####### Final");
> >         logger.error(Integer.toString(buffer.length()));
> >         logger.error(Integer.toString(length));
> >         logger.error(buffer.toString());
> >
> >         termAtt.setEmpty().append(buffer);
> >         offsetAtt.setOffset(0, length);
> >         finished = true;
> >         return true;
> >     }
> >
> >
> > *Output for input tokens "booh" and "good" is:*
> >
> > SEVERE: Starting
> > Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter
> > incrementToken
> > SEVERE: 0
> > Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter
> > incrementToken
> > SEVERE:
> > Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter
> > incrementToken
> > SEVERE: 14
> > Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter
> > incrementToken
> > SEVERE: booh
> > Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter
> > incrementToken
> > SEVERE: ####### Final
> > Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter
> > incrementToken
> > SEVERE: 29
> > Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter
> > incrementToken
> > SEVERE: 9
> > Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter
> > incrementToken
> > SEVERE: booh good
> > Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter
> > incrementToken
> > SEVERE: Finished
> >
> >
> > And this is how it appears on the Solr analysis page
> > (http://localhost:8983/solr/admin/analysis.jsp):
> > org.ctown.solr.analysis.CTConcatFilterFactory
> > {luceneMatchVersion=LUCENE_34}
> > position 1
> > *term text booh#0;#0;#0;#0;#0;#0;#0;#0;#0;#0;
> > good#0;#0;#0;#0;#0;#0;#0;#0;#0;#0;*
> > startOffset 0
> > endOffset 9
> >
> > Kindly help me understand what I am doing wrong and how to fix this.
> >
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Writing-a-TokenConcatenateFilter-junk-characters-appearing-on-output-tp3383684p3383684.html
> > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >



-- 
Thanks
Jithin Emmanuel


--
View this message in context: http://lucene.472066.n3.nabble.com/Writing-a-TokenConcatenateFilter-junk-characters-appearing-on-output-tp3383684p3384323.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
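
For readers of the archive, here is a minimal sketch of the behaviour Uwe describes in the quoted reply: appending a term's whole backing char[] drags in the unused slots after length(), while appending the term as a CharSequence copies only the valid characters. The sketch does not use Lucene. FakeTerm, its 14-char capacity, and the two concat methods are hypothetical names made up for illustration; the capacity is an assumption chosen so the lengths line up with the 29 and 9 seen in the log above.

```java
import java.util.Arrays;

final class TermBufferDemo {

    /**
     * Hypothetical stand-in for a term attribute: a fixed, oversized char[]
     * of which only the first length() characters are valid.
     * This is NOT Lucene's CharTermAttribute, only a model of the same idea.
     */
    static final class FakeTerm implements CharSequence {
        // Capacity of 14 is an assumption picked to match the logged lengths.
        private final char[] buf = new char[14];
        private int len;

        /** Stores a term of at most 14 characters at the front of the array. */
        FakeTerm set(String s) {
            Arrays.fill(buf, '\0');            // unused slots hold NUL characters
            s.getChars(0, s.length(), buf, 0); // copy the term into the front
            len = s.length();
            return this;
        }

        /** Returns the whole backing array, including the unused tail. */
        char[] buffer() { return buf; }

        @Override public int length() { return len; }

        @Override public char charAt(int index) {
            if (index < 0 || index >= len) throw new IndexOutOfBoundsException();
            return buf[index];
        }

        @Override public CharSequence subSequence(int start, int end) {
            return toString().substring(start, end);
        }

        @Override public String toString() { return new String(buf, 0, len); }
    }

    /** Problematic variant: appends the entire char[] for every token. */
    static String concatWholeArray(String... tokens) {
        FakeTerm term = new FakeTerm();
        StringBuilder sb = new StringBuilder();
        for (String t : tokens) {
            term.set(t);
            if (sb.length() > 0) sb.append(' ');
            sb.append(term.buffer()); // StringBuilder.append(char[]) copies all 14 chars
        }
        return sb.toString();
    }

    /** Suggested variant: appends the term itself as a CharSequence. */
    static String concatCharSequence(String... tokens) {
        FakeTerm term = new FakeTerm();
        StringBuilder sb = new StringBuilder();
        for (String t : tokens) {
            term.set(t);
            if (sb.length() > 0) sb.append(' ');
            sb.append(term); // StringBuilder.append(CharSequence) copies length() chars
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(concatWholeArray("booh", "good").length());   // 29
        System.out.println(concatCharSequence("booh", "good").length()); // 9
        System.out.println(concatCharSequence("booh", "good"));          // booh good
    }
}
```

If the CharSequence overload is not available, sb.append(term.buffer(), 0, term.length()) should give the same result in this sketch, since StringBuilder.append(char[], int, int) copies only the requested range.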