From: "Uwe Schindler"
Subject: Apache Solr and Tika used to index Panama Papers
Date: Wed, 6 Apr 2016 10:45:19 +0200

Hi all,

I just wanted to repost the following by Chris Mattmann on the TIKA list:

If you have been following the news, you’ve seen the Panama Papers and how the world’s rich and elite have been storing all their money offshore to hide it. Two of the ASF’s key technologies were used in uncovering that story and showing the world what was going on: Apache Tika and Apache Solr.
Solr was used to make the terabytes of Panama Papers documents searchable for journalists. The documents were preprocessed for indexing with Tika (possibly through Solr's contrib/extraction module).

Here is the Forbes article about it:
http://www.forbes.com/sites/thomasbrewster/2016/04/05/panama-papers-amazon-encryption-epic-leak

Uwe

-----
Uwe Schindler
uschindler@apache.org
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany
http://lucene.apache.org/