pdfbox-users mailing list archives

From Andreas Lehmkuehler <andr...@lehmi.de>
Subject Fwd: The Apache® Software Foundation Recognizes Apache Innovations Integral to the Pulitzer Prize-winning Panama Papers Investigation
Date Mon, 17 Apr 2017 13:04:01 GMT

-------- Forwarded Message --------
Subject: The Apache® Software Foundation Recognizes Apache Innovations Integral 
to the Pulitzer Prize-winning Panama Papers Investigation
Date: Mon, 17 Apr 2017 06:07:55 -0400
From: Sally Khudairi <sk@apache.org>
To: Apache Announce List <announce@apache.org>

[this announcement is available online at https://s.apache.org/UkEw ]

Apache Open Source library, search, and document management tools used
in investigating the biggest leak in journalism history.

Forest Hill, MD --17 April 2017-- The Apache Software Foundation (ASF),
the all-volunteer developers, stewards, and incubators of more than 350
Open Source projects and initiatives, announced today the role played by
several Apache projects in the investigation of the Panama Papers.

At 2.6 terabytes of data, the Panama Papers is the largest leak of all
time, comprising 11.5M financial and legal records sent from an
anonymous source. The journalistic cooperation involved more than 400
journalists from 100 publications on six continents over the course of a
year. The discovery exposed a complex system of criminal and corrupt
activities secretly hidden by offshore concerns. The investigation
recently received a Pulitzer Prize in the Explanatory Reporting category.

"The Apache Software Foundation incorporated 18 years ago with the
mission to create software for the public good," said ASF President Sam
Ruby. "We are honored that Apache software played a critical role with
the Panama Papers, and congratulate the International Consortium of
Investigative Journalists and their media partners on this prestigious
award."

The discovery, exchange, and management of information that involved
214,488 entities was made possible by:

Tika --toolkit that detects and extracts metadata and structured text
content from various documents. Used for document processing.

Solr --enterprise search server, based on the Lucene Java search
library, with advanced highlighting, faceted search, caching, and
replication capabilities. Used for search and indexing.

PDF Box --Open Source Java library for working with PDF documents. Used
for capturing text from PDF documents.

POI --Open Source Java library and APIs for various file formats based
on Microsoft Office. Used to extract and manipulate Excel, Word, and
PowerPoint files.

Commons --40+ projects for reusable Open Source Java components. Used to
boost cross-platform development and productivity.

In addition to Apache software, a number of other Open Source projects
were also integral to the investigation. This includes Tesseract-ocr
(whose optical character recognition engine was used for capturing text
from images), Project Blacklight (used as a discovery interface), and
Jackcess (used for reading and writing MS Access databases): three
examples of the millions of software solutions distributed under the
Apache License v2.0, which allows for their free use, modification, and
distribution.

Apache Open Source Projects
Many of the ASF's 350+ projects serve as the backbone for some of the
world's most visible and widely used applications in Artificial
Intelligence and Deep Learning, Big Data, Build Management, Cloud
Computing, Content Management, DevOps, IoT and Edge Computing, Mobile,
Servers, and Web Frameworks, among other categories.

Programmers, solutions architects, individual users, educators,
researchers, corporations, governments, and enthusiasts worldwide depend
on Apache software for development tools, libraries, frameworks,
visualizers, end-user productivity solutions, and more.

75% of Apache's 150M lines of code have been developed over 65,000
person years, and are valued at US$7B. The ASF serves approximately 9M
source code downloads from Apache mirrors on a yearly basis, excluding
convenience binaries. Worldwide dependency on Apache software continues
to grow, with Web requests received from every Internet-connected
country on the planet.

The Apache Incubator is home to 63 projects undergoing development, with
emerging innovations in Big Data, communication protocols, connected
devices, cryptography, data science/machine learning/analytics,
development frameworks, microfinances, remote desktop access, serverless
computing, and more.

All Apache products are available to the public-at-large completely free
of charge. All software development and project leadership is done
entirely by volunteers. As a not-for-profit charitable organization, the
ASF is funded through tax-deductible contributions from corporations,
foundations, and private individuals. Approximately 75% of the ASF's
US$1.2M annual budget is dedicated to running critical infrastructure
support services that keep Apache services running 24x7x365 at near 100%
uptime on an annual budget of less than US$5,000 per project.

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350
leading Open Source projects, including Apache HTTP Server --the world's
most popular Web server software. Through the ASF's meritocratic process
known as "The Apache Way," more than 620 individual Members and 6,000
Committers successfully collaborate to develop freely available
enterprise-grade software, benefiting millions of users worldwide:
thousands of software solutions are distributed under the Apache
License; and the community actively participates in ASF mailing lists,
mentoring initiatives, and ApacheCon, the Foundation's official user
conference, trainings, and expo. The ASF is a US 501(c)(3) charitable
organization, funded by individual donations and corporate sponsors
including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct,
Capital One, Cerner, Cloudera, Comcast, Confluent, Facebook, Google,
Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb,
Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban,
Red Hat, Serenata Flowers, WANdisco, and Yahoo. For more information,
visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Apache Commons", "PDF Box",
"Apache PDF Box", "POI", "Apache POI", "Solr", "Apache Solr", "Tika",
"Apache Tika", and "ApacheCon" are registered trademarks or trademarks
of the Apache Software Foundation in the United States and/or other
countries. All other brands and trademarks are the property of their
respective owners.

# # #

NOTE: you are receiving this message because you are subscribed to the
announce@apache.org distribution list. To unsubscribe, send email from
the recipient account to announce-unsubscribe@apache.org with the word
"Unsubscribe" in the subject line.

