From: Marcel Reutegger
Date: Thu, 16 Jun 2005 12:10:36 +0200
To: jackrabbit-dev@incubator.apache.org
Subject: Re: Two problems

Miklos Pocsaji wrote:
> 2. I started writing a TextFilter which knows how to extract text from
> PDF (I implemented the TextFilter interface).
> It is simple: I only
> have to return a java.io.Reader from which Jackrabbit extracts text.
> The obvious but ugly method would be to extract the text to a String and
> then return a StringReader, but this would require a lot of memory. I
> decided to use PipedReader/PipedWriter: a separate thread writes the
> text to a PipedWriter and I return the PipedReader instance from the
> doFilter() method. It seems that Jackrabbit won't read through the
> passed stream immediately.

Indexing of nodes is buffered in Jackrabbit. This may mean that nodes are
not added to the index until a query is issued.

As far as I can see, you have to make sure that the PipedWriter is not
closed until the PipedReader is closed.

regards
marcel
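The piped approach described in the question can be sketched in plain Java as follows. This is a minimal sketch, not Jackrabbit code: the TextFilter interface and its doFilter() signature are not shown in this thread, so `PipedTextSource`, `open()` and the `TextProducer` callback are hypothetical names, and a real filter would pass a PDF text extractor as the producer. The worker thread keeps the PipedWriter open while it produces text and closes it only when done. A PipedReader reports end-of-stream only after its writer has been closed; if the writer thread simply exits without closing, reads fail with an IOException instead. Because the pipe buffer is small (1024 characters by default), the worker blocks until the consumer reads, so the full text is never held in memory even if indexing is deferred.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Reader;
import java.io.Writer;

public final class PipedTextSource {

    /** Hypothetical callback: writes extracted text (e.g. from a PDF) to the given Writer. */
    @FunctionalInterface
    public interface TextProducer {
        void writeTo(Writer out) throws IOException;
    }

    private PipedTextSource() {}

    /**
     * Returns a Reader whose contents are produced by a background thread.
     * The pipe buffer is bounded, so the producer blocks until the consumer reads.
     */
    public static Reader open(TextProducer producer) throws IOException {
        PipedReader reader = new PipedReader();
        PipedWriter writer = new PipedWriter(reader); // connect both ends before the thread starts

        Thread worker = new Thread(() -> {
            try {
                producer.writeTo(writer);
            } catch (IOException e) {
                // Producer failed or the consumer closed the Reader early;
                // in this sketch the consumer then just sees a truncated stream.
            } finally {
                try {
                    // Closing the writer lets the reader return -1 once the buffer is drained.
                    writer.close();
                } catch (IOException ignored) {
                    // nothing sensible to do here
                }
            }
        }, "text-extractor");
        worker.setDaemon(true); // an unconsumed Reader must not keep the JVM alive
        worker.start();
        return reader;
    }

    public static void main(String[] args) throws IOException {
        // Produce more text than the pipe buffer holds, so the worker must wait for us to read.
        Reader r = open(out -> {
            for (int i = 0; i < 5000; i++) {
                out.write("word ");
            }
        });
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[512];
        int n;
        while ((n = r.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        r.close();
        System.out.println(sb.length()); // 5000 * "word ".length() = 25000
    }
}
```

The worker stays blocked until the Reader is consumed or closed. If the consumer closes the Reader early, the worker's next write fails with an IOException, which the sketch swallows so the thread can finish. Making the thread a daemon is a design choice of this sketch so that a Reader that is never read does not prevent JVM shutdown.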