From: Justin Lee
Date: Fri, 10 Jun 2016 01:20:07 +0000
Subject: Bypassing ExtractingRequestHandler
To: "solr-user@lucene.apache.org"

Has anybody had any experience bypassing ExtractingRequestHandler and simply managing Tika manually?

I want to make a small modification to Tika to get and save additional data from my PDFs, but I have been procrastinating, in no small part because of the unpleasant prospect of setting up a development environment where I could compile and debug modifications that might run through PDFBox, Tika, and ExtractingRequestHandler. It occurs to me that it would be much easier if Tika and Solr were separate, so I could have direct control over Tika and just submit the text to Solr after extraction. Am I going to regret this approach? I'm not sure what ExtractingRequestHandler really does for me that Tika doesn't already do.

Also, I was reading this Stack Overflow entry and someone offhandedly mentioned that ExtractingRequestHandler might be separated in the future anyway. Is there a public roadmap for the project, or does one have to follow the developers' mailing list and hunt through JIRA entries to keep up with the pulse of the project?

Thanks,
Justin
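
For concreteness, here is a rough sketch of the "submit the text to Solr after extraction" half of the idea, assuming the text has already been produced by a separate Tika run. It only builds a JSON request for Solr's /update handler and does not send it. The function name build_solr_update, the collection name "docs", and the field names content_txt and page_count_i are all made up for illustration and would need to match a real schema.

```python
import json
import urllib.request


def build_solr_update(solr_url, collection, doc_id, text, extra_fields=None):
    """Build (but do not send) a JSON update request for one extracted document.

    Hypothetical helper: the field names used here are assumptions,
    not anything defined by Solr or Tika.
    """
    doc = {"id": doc_id, "content_txt": text}
    # Any additional data captured during extraction rides along as ordinary fields.
    doc.update(extra_fields or {})
    # The /update handler accepts a JSON array of documents to add.
    body = json.dumps([doc]).encode("utf-8")
    url = f"{solr_url.rstrip('/')}/{collection}/update?commit=true"
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder text; in practice this would come from your own Tika run.
    req = build_solr_update(
        "http://localhost:8983/solr",
        "docs",
        "report-1",
        "text extracted from the PDF",
        {"page_count_i": 12},
    )
    print(req.full_url)
    # Sending requires a running Solr instance:
    # urllib.request.urlopen(req)
```

The point of the sketch is that once extraction happens outside Solr, any extra data pulled from the PDF becomes just another field on the document, so changes to the extraction side would not have to pass through ExtractingRequestHandler.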