From: Andreas Lehmkuehler
Date: Sat, 22 Jun 2013 09:09:44 +0200
To: users@pdfbox.apache.org
Subject: Re: Extracting local language (Sinhala Unicode) from a pdf

Hi,

On 20.06.2013 12:26, Supun Nakandala wrote:
> Hi,
> I want to extract Sinhala (local language) from a pdf file. I am not
> familiar with pdfbox. I would like to know whether is this possible and how
> can I do it using pdfbox

It depends on the PDFs and on the kind of fonts they use. I suggest giving it a try. There are some easy-to-use command line tools such as ExtractText; see [1] for further details.

> Thank you.
> Regards Supun

BR
Andreas Lehmkühler

[1] http://pdfbox.apache.org/commandline/
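
Since the result depends on the fonts in the PDF, one rough way to judge a trial run is to check whether the extracted text contains any characters from the Unicode Sinhala block (U+0D80 to U+0DFF). The sketch below is only an illustration and not part of PDFBox: the class name SinhalaCheck and the method countSinhala are made up for this example, and it assumes the text was already written to a UTF-8 file by ExtractText (see [1] for the exact invocation).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of PDFBox: inspects already extracted text.
public class SinhalaCheck {

    /** Counts the code points that fall in the Unicode Sinhala block (U+0D80..U+0DFF). */
    public static long countSinhala(String text) {
        return text.codePoints()
                .filter(cp -> Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.SINHALA)
                .count();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SinhalaCheck <extracted-text-file>");
            return;
        }
        // Assumption: the extracted text file is encoded as UTF-8.
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        long sinhala = countSinhala(text);
        long total = text.codePoints().filter(cp -> !Character.isWhitespace(cp)).count();
        System.out.println(sinhala + " of " + total + " non-whitespace characters are Sinhala");
    }
}
```

A count of zero for a PDF that visibly contains Sinhala text would suggest that the fonts in that file do not map their glyphs to Unicode in a way the extraction can use.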