lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Babak Farhang <farh...@gmail.com>
Subject Re: get wordno, lineno, pageno for term/phrase
Date Sun, 08 Aug 2010 00:00:03 GMT
How about making each line a separate document? You'd worry about
scaling it later (e.g. the 32-bit limitation in the number of docs in
an index)..

On Fri, Aug 6, 2010 at 11:37 AM, arun r <arun.raj@gmail.com> wrote:
> I am trying to create a custom analyzer that will check for pagebreak
> and linebreak and add the payload data for each term. In the custom
> filter I have this code:
>
> public boolean incrementToken() throws IOException {
>
>                if(input.incrementToken())
>                {
>                        if(termAtt.term().equals(pageBreak)){
>                                System.out.println("pageBreak");
>                                pageCount++;
>                        }
>                        else if(termAtt.term().equals(lineBreak))
>                        {
>                                System.out.println("lineBreak");
>                                lineCount++;
>                        }
>                        else
>                                addPayload(lineCount, pageCount);
>
>                        return true;
>                }
>                else
>                        return false;
>        }
>
> where pageBreak and lineBreak is defined as :
> int pageBreakAscii = 12;
> String pageBreak = new Character ((char) pageBreakAscii).toString();
> String lineBreak = System.getProperty("line.separator");
>
> And am using the WhitespaceAnalyzer tokenstream, which ignores the
> pageBreak and lineBreak. Is there a way to create a analyzer that will
> ignore the pagebreak and linebreak characters during search, but give
> access to them in  incrementToken() in the filter ?
>
>
> --
> Where there is a will, there is a way !
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message