accumulo-user mailing list archives

From Patrick Lynch <patricklync...@aim.com>
Subject Re: Wikisearch Performance Question
Date Tue, 21 May 2013 18:47:15 GMT

There were 7 datanodes and the block size was 128 MB, while the files tended to be around
500 MB to 1 GB. The single file that was split was one of the larger ones.
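For context on the numbers above: at a 128 MB block size, a 1 GB file is only about 8 blocks,
so how evenly those blocks (and their replicas) land across 7 datanodes can make a noticeable
difference. Below is a minimal sketch, not from the original thread, of how one might check
the spread using the standard Hadoop FileSystem API; the file path is just a command-line
argument here.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.util.HashMap;
import java.util.Map;

public class BlockSpread {
    public static void main(String[] args) throws Exception {
        Path file = new Path(args[0]);                    // e.g. the large wiki archive (placeholder)
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(file);

        // One BlockLocation per block, each listing the datanodes holding a replica.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());

        // Count how many block replicas each datanode holds.
        Map<String, Integer> perHost = new HashMap<>();
        for (BlockLocation block : blocks) {
            for (String host : block.getHosts()) {
                perHost.merge(host, 1, Integer::sum);
            }
        }

        System.out.println(blocks.length + " blocks");
        perHost.forEach((host, count) -> System.out.println(host + ": " + count + " replicas"));
    }
}

If most replicas of the large file turn out to sit on two or three nodes, those nodes carry
most of the map-side read load, which would fit the behavior described in the quoted messages
below.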



-----Original Message-----
From: Christopher <ctubbsii@apache.org>
To: user <user@accumulo.apache.org>
Sent: Tue, May 21, 2013 2:12 pm
Subject: Re: Wikisearch Performance Question


What size is the cluster, and what is the HDFS block size compared to the
file sizes? I'm wondering if the blocks for the large file were
disproportionately burdening a small number of datanodes, whereas the
small files were more evenly distributed.

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii


On Tue, May 21, 2013 at 1:30 PM, Patrick Lynch <patricklynch33@aim.com> wrote:
> user@accumulo,
>
> I was working with the Wikipedia Accumulo ingest examples, trying to get the
> ingest of a single archive file to be as fast as ingesting multiple archives
> in parallel. I increased the number of ways the job split the single archive
> so that all the servers could work on ingesting it at the same time (the
> usual Hadoop knob for this is sketched after this quoted message). What I
> noticed, however, was that having all the servers work on ingesting the same
> file was still not nearly as fast as using multiple ingest files. I was
> wondering if I could get some insight into the design of the Wikipedia ingest
> that would explain this behavior.
>
> Thank you for your time,
> Patrick Lynch
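
On the original question of splitting the single archive more finely: in plain Hadoop
MapReduce, the usual knob is the maximum input split size, which caps how much of a file each
mapper reads. Whether the Wikisearch ingest's input format honors this setting is an
assumption here, and the 64 MB figure is only an example; treat the following as a sketch
rather than the ingest job's actual configuration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitTuning {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "wikipedia-ingest"); // hypothetical job name

        // Cap each input split at 64 MB so a ~1 GB archive yields ~16 splits
        // (and so ~16 mappers) instead of roughly one mapper per 128 MB block.
        FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);

        // ... input paths, mapper, and output setup as in the Wikisearch ingest
        // example would go here ...
    }
}

Even with more splits, every mapper is still reading blocks of the same file, so the read
load concentrates on whichever datanodes hold that file's replicas; that would be consistent
with multiple input files ingesting faster than one finely split file.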

 

