accumulo-user mailing list archives

From "Slater, David M." <David.Sla...@jhuapl.edu>
Subject Batchscanner and Tablet Memory
Date Fri, 15 Mar 2013 19:08:55 GMT
Hi again,

I am curious about how Accumulo handles multiple threads in a BatchScanner, and what the ramifications are for memory use on a node.

Let's say I start a BatchScanner with 20 threads and scan across the entire range of rows in a table of 80 tablets spread across 4 nodes. Will the BatchScanner try to spin off all 20 threads if possible, or will it try to match the number of threads to the number of nodes? Should I try to match the number of threads to the number of cores that will be working on the data?

When a thread is spun off, my thinking is that the entire tablet the thread is assigned to will be moved into memory, and then the tablet will be iterated through. Is this how it typically happens, or can there be multiple threads on the same tablet? If so, do I have to worry about memory issues if, say, one of the nodes tries to move 10 tablets into memory but doesn't have the 20 GB of RAM left to store them?

Sorry for the vagueness of the questions, but I'm trying to understand how the general process works under the covers in order to diagnose some performance issues I have been having.

Thanks,
David
