Subject: Re: Accessing check-boxes in a non-acro form PDF
To: users@pdfbox.apache.org
From: Tilman Hausherr
Date: Thu, 10 May 2018 17:30:18 +0200
Message-ID: <12484c3e-7805-3df9-8387-827bb8b1509a@t-online.de>

On 10.05.2018 at 16:26, Ankit Inkollu wrote:
> Hi All,
>
> *Scenario:*
> I need to verify whether the check-box for a certain field in a non-acro form
> PDF is ticked or not.
>
> *Options tried:*
> 1. I tried to search for any class in PDFBox which points to the check-box
> but could not find any.

There isn't one if the form is neither AcroForm nor XFA. A box is just a box, i.e. a shape somewhere on the page. (Unless the character for a checked box is used.)

> 2. Tried using the co-ordinates of the check-box to create an image and
> then compare it against an already stored image of a check-box, but this is
> quite cumbersome and fails for a few PDFs.
>
> Is there a way in PDFBox to implement the above-mentioned scenario?
> If this does not work out, is there an OCR API in Java which will help?

Tesseract has a Java interface, but PDFBox itself has no OCR. Tika has an OCR option, and it will use Tesseract.

Tilman

> Do let me know if any of you have faced such a situation.
>
> Thanks
> Ankit
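Since the box is only a shape on the page, one possible alternative to comparing against a stored reference image (option 2 above) is to count how much "ink" lies inside the box. The following is a minimal sketch of that idea and not something taken from PDFBox: the class name `CheckboxInk`, the methods `darkRatio` and `looksTicked`, the 3-pixel inset, and the 5% threshold are all assumptions chosen for illustration. It uses only the JDK; in practice the page image would presumably come from rendering the page (for example with PDFBox's `PDFRenderer.renderImageWithDPI`), and the rectangle would have to be converted from PDF units (origin bottom-left, 72 units per inch) to image pixels (origin top-left).

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of PDFBox: estimates whether a box region is ticked
// by measuring the fraction of dark pixels inside it.
final class CheckboxInk {

    /** Fraction (0..1) of dark pixels inside the box, ignoring an inset border. */
    static double darkRatio(BufferedImage page, Rectangle box, int inset) {
        // Shrink the region so the box outline itself is not counted as ink.
        int x0 = Math.max(0, box.x + inset);
        int y0 = Math.max(0, box.y + inset);
        int x1 = Math.min(page.getWidth(), box.x + box.width - inset);
        int y1 = Math.min(page.getHeight(), box.y + box.height - inset);
        if (x1 <= x0 || y1 <= y0) {
            return 0.0; // region is empty or lies outside the image
        }
        long dark = 0;
        for (int y = y0; y < y1; y++) {
            for (int x = x0; x < x1; x++) {
                int rgb = page.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer luminance approximation; below 128 is treated as ink.
                int luma = (r * 299 + g * 587 + b * 114) / 1000;
                if (luma < 128) {
                    dark++;
                }
            }
        }
        return (double) dark / ((long) (x1 - x0) * (y1 - y0));
    }

    /** Assumed heuristic: more than 5% dark pixels inside the box means "ticked". */
    static boolean looksTicked(BufferedImage page, Rectangle box) {
        return darkRatio(page, box, 3) > 0.05;
    }

    public static void main(String[] args) {
        // Synthetic white "page" standing in for a rendered PDF page.
        BufferedImage page = new BufferedImage(100, 50, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 50; y++) {
            for (int x = 0; x < 100; x++) {
                page.setRGB(x, y, 0xFFFFFF);
            }
        }
        Rectangle empty = new Rectangle(10, 10, 20, 20);
        Rectangle ticked = new Rectangle(60, 10, 20, 20);
        // Draw an "X" inside the second box only.
        for (int i = 0; i < 14; i++) {
            page.setRGB(63 + i, 13 + i, 0x000000);
            page.setRGB(76 - i, 13 + i, 0x000000);
        }
        System.out.println("empty box ticked?  " + looksTicked(page, empty));
        System.out.println("marked box ticked? " + looksTicked(page, ticked));
    }
}
```

The inset and the threshold would need tuning per document and render resolution, and a filled or shaded box would defeat the heuristic, so this is only a starting point. If the PDF uses a text character for the checked box instead of a drawn shape, looking for that character in the extracted text is likely simpler than any image-based test.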