lucene-java-user mailing list archives

From Jithin <jithin1...@gmail.com>
Subject Re: Writing a TokenConcatenateFilter - junk characters appearing on output.
Date Sat, 01 Oct 2011 07:19:27 GMT
Figured out the issue: the finished flag needs to be reset to false once the
current stream has been fully consumed.

    if (finished) {
        logger.debug("Finished");
        // Reset the flag so the next stream is processed from scratch.
        finished = false;
        return false;
    }

Looks like the same filter instance is being reused for each document, so
state left over from the previous stream carries over. Makes sense.
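To illustrate why the reset matters, here is a minimal sketch in plain Java. It is not Lucene code: ConcatFilterSketch, setInput, term and consume are hypothetical stand-ins that only mimic how one filter instance gets pointed at several token streams in turn. Without the reset inside incrementToken, the second call to consume would return an empty list, which matches the "only my first document is getting indexed" symptom.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a reusable concatenating filter.
// This is NOT Lucene's TokenFilter API, only a model of the reuse pattern.
class ConcatFilterSketch {
    private Iterator<String> input = Collections.emptyIterator();
    private boolean finished = false;
    private String term;

    // Point the same instance at a new document's tokens (mimics reuse).
    void setInput(List<String> tokens) {
        input = tokens.iterator();
    }

    String term() {
        return term;
    }

    // Emits a single concatenated token per stream, then signals end of stream.
    boolean incrementToken() {
        if (finished) {
            finished = false; // reset so the next stream is processed
            return false;
        }
        StringBuilder sb = new StringBuilder();
        while (input.hasNext()) {
            sb.append(input.next());
        }
        term = sb.toString();
        finished = true;
        return true;
    }

    // Drains the filter the way a consumer loop would.
    static List<String> consume(ConcatFilterSketch filter, List<String> tokens) {
        filter.setInput(tokens);
        List<String> out = new ArrayList<>();
        while (filter.incrementToken()) {
            out.add(filter.term());
        }
        return out;
    }

    public static void main(String[] args) {
        ConcatFilterSketch filter = new ConcatFilterSketch();
        // The same instance handles two "documents" in a row.
        System.out.println(consume(filter, Arrays.asList("new", "york")));
        System.out.println(consume(filter, Arrays.asList("san", "jose")));
    }
}
```

In real Lucene code, per-stream state like this is commonly cleared in an overridden reset() method (calling super.reset() first) rather than inside incrementToken(). Whether that fits the filter discussed in this thread is an assumption, since its full source is not shown here.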


On Sat, Oct 1, 2011 at 10:57 AM, Jithin [via Lucene] <
ml-node+s472066n3384419h7@n3.nabble.com> wrote:

> I meant to say: my analyzer chain now looks like this.
>
>     <analyzer type="index">
>         <charFilter class="solr.PatternReplaceCharFilterFactory"
>                     pattern="[-_]" replacement=" " />
>         <charFilter class="solr.PatternReplaceCharFilterFactory"
>                     pattern="[^\p{L}\p{Nd}\p{Mn}\p{Mc}\s+]" replacement="" />
>         <tokenizer class="solr.WhitespaceTokenizerFactory" />
>         <filter class="solr.LowerCaseFilterFactory" />
>         <filter class="solr.StopWordFilterFactory" ignoreCase="true"
>                 words="words.txt" />
>         <filter class="org.ctown.solr.analysis.CTConcatFilterFactory" />
>     </analyzer>
>     <analyzer type="query">
>         <charFilter class="solr.PatternReplaceCharFilterFactory"
>                     pattern="[-_]" replacement=" " />
>         <charFilter class="solr.PatternReplaceCharFilterFactory"
>                     pattern="[^\p{L}\p{Nd}\p{Mn}\p{Mc}\s+]" replacement="" />
>         <tokenizer class="solr.KeywordTokenizerFactory" />
>     </analyzer>
>
> But only my first document is getting indexed. Is there any logging I can
> enable to see what is going wrong?



-- 
Thanks
Jithin Emmanuel

