From: Norman M
To: user@poi.apache.org
Date: Fri, 9 Nov 2012 14:47:31 -0800 (PST)
Subject: Is POI really using streaming to parse files?

I am using Apache Tika to extract text from PPT/PPTX files. Tika uses Apache POI to extract the text.

I tried comparing processing time and memory usage for POI against Aspose (www.aspose.com). The processing time and memory requirement for Tika (i.e. POI) are almost double those of Aspose.

Is POI really using streaming to parse files? Why does it take so much more memory than Aspose, which I thought reads the whole file into memory?

I found this thread:
http://lucene.472066.n3.nabble.com/Large-xls-files-always-loaded-into-memory-td646710.html
where the Tika founder claims that POI does not stream input files. That thread is quite old; is this still the case?

Any response will be appreciated.

Thanks,