accumulo-user mailing list archives

From Patrick Lynch
Subject Wikisearch Performance Question
Date Tue, 21 May 2013 17:30:42 GMT


I was working with the Wikipedia Accumulo ingest example, and I was trying to parallelize the
ingest of a single archive file so that it would be as fast as ingesting multiple archives.
I increased the number of splits the job creates for the single archive so that all the servers
could work on ingesting it at the same time. What I noticed, however, was that having all the
servers work on the same file was still not nearly as fast as ingesting multiple files.
Could anyone offer some insight into the design of the Wikipedia ingest that would explain
this difference?

Thank you for your time,
Patrick Lynch
