Date: Wed, 18 Feb 2009 10:33:13 +0100
Subject: Re: PDFBox - Read pdf file line by line using C#.Net
From: Jukka Zitting
To: pdfbox-users@incubator.apache.org

Hi,

On Mon, Feb 16, 2009 at 6:15 PM, Moshe Liaks wrote:
> I use the code below to read a PDF file.
> The code works fine. The problem is that I have to read the PDF
> line by line and not as "one big string".
> I need this because the text is complex, and I have to
> apply some filters while reading each line of the original.

You could subclass the PDFTextStripper class and do your filtering in the
writeLineSeparator() method, after buffering all the text on that line.

BR,

Jukka Zitting
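A minimal sketch of the buffering approach suggested above, written in Java because PDFBox itself is a Java library. To keep it self-contained and runnable without PDFBox on the classpath, `FragmentStripper` is a hypothetical stand-in for `PDFTextStripper`: it only mimics the two hooks involved, calling `writeString()` for each text fragment and `writeLineSeparator()` at the end of each line. `FilteringStripper` and its line predicate are likewise illustrative names, not part of PDFBox.

```java
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical stand-in for PDFBox's PDFTextStripper (not the real class).
 * It only mimics two hooks: writeString() receives each text fragment and
 * writeLineSeparator() marks the end of a line. The real stripper extracts
 * fragments from a PDF page; here they are passed in directly.
 */
abstract class FragmentStripper {
    protected final StringBuilder output = new StringBuilder();

    protected void writeString(String text) {
        output.append(text);
    }

    protected void writeLineSeparator() {
        output.append('\n');
    }

    /** Drives the hooks: each inner list holds the fragments of one line. */
    public String getText(List<List<String>> lines) {
        output.setLength(0);
        for (List<String> fragments : lines) {
            for (String fragment : fragments) {
                writeString(fragment);
            }
            writeLineSeparator();
        }
        return output.toString();
    }
}

/** Illustrative subclass: buffers one line, then filters it at the line break. */
class FilteringStripper extends FragmentStripper {
    private final StringBuilder currentLine = new StringBuilder();
    private final Predicate<String> keep; // hypothetical per-line filter

    FilteringStripper(Predicate<String> keep) {
        this.keep = keep;
    }

    @Override
    protected void writeString(String text) {
        // Buffer the fragment instead of writing it to the output right away.
        currentLine.append(text);
    }

    @Override
    protected void writeLineSeparator() {
        // The whole line is now buffered: apply the filter, then reset the buffer.
        String line = currentLine.toString();
        currentLine.setLength(0);
        if (keep.test(line)) {
            super.writeString(line);
            super.writeLineSeparator();
        }
    }
}
```

With a real `PDFTextStripper` subclass the same override pattern should apply, but check the hook signatures in your PDFBox version (for example, whether they throw `IOException`). Depending on the version, the last line of a page may not be followed by a `writeLineSeparator()` call, and word separators may be written straight to the output rather than through `writeString()`, so other hooks may also need overriding to flush or fill the line buffer.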