From: "Dan Blum"
To: user@accumulo.apache.org
Subject: RE: Accumulo Seek performance
Date: Mon, 12 Sep 2016 11:03:21 -0400
I am not sure - my recollection is that the 1.6.x code capped the number of
threads requested at 1 per tablet (covered by the requested ranges), not 1
per tablet server.

-----Original Message-----
From: Josh Elser [mailto:josh.elser@gmail.com]
Sent: Monday, September 12, 2016 10:58 AM
To: user@accumulo.apache.org
Subject: Re: Accumulo Seek performance

Good call. I kind of forgot about BatchScanner threads and trying to
factor those in :). I guess doing one thread in the BatchScanners would
be more accurate.

Although, I only had one TServer, so I don't *think* there would be any
difference. I don't believe we have concurrent requests from one
BatchScanner to one TServer.

Dylan Hutchison wrote:
> Nice setup Josh. Thank you for putting together the tests. A few
> questions:
>
> The serial scanner implementation uses 6 threads: one for each thread in
> the thread pool.
> The batch scanner implementation uses 60 threads: 10 for each thread in
> the thread pool, since the BatchScanner was configured with 10 threads
> and there are 10 (9?) tablets.
>
> Isn't 60 threads of communication naturally inefficient? I wonder if we
> would see the same performance if we set each BatchScanner to use 1 or 2
> threads.
>
> Maybe this would motivate a /MultiTableBatchScanner/, which maintains a
> fixed number of threads across any number of concurrent scans, possibly
> to the same table.
>
>
> On Sat, Sep 10, 2016 at 3:01 PM, Josh Elser wrote:
>
>     Sven, et al:
>
>     So, it would appear that I have been able to reproduce this one
>     (better late than never, I guess...). tl;dr Serially using Scanners
>     to do point lookups instead of a BatchScanner is ~20x faster. This
>     sounds like a pretty serious performance issue to me.
>
>     Here's a general outline for what I did.
>
>     * Accumulo 1.8.0
>     * Created a table with 1M rows, each row with 10 columns using YCSB
>       (workloada)
>     * Split the table into 9 tablets
>     * Computed the set of all rows in the table
>
>     For a number of iterations:
>     * Shuffle this set of rows
>     * Choose the first N rows
>     * Construct an equivalent set of Ranges from the set of Rows,
>       choosing a random column (0-9)
>     * Partition the N rows into X collections
>     * Submit X tasks to query one partition of the N rows (to a thread
>       pool with X fixed threads)
>
>     I have two implementations of these tasks. One, where all ranges in
>     a partition are executed via one BatchScanner. A second where each
>     range is executed in serial using a Scanner. The numbers speak for
>     themselves.
>
>     ** BatchScanners **
>     2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled all rows
>     2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All ranges calculated: 3000 ranges found
>     2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries executed in 40178 ms
>     2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries executed in 42296 ms
>     2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries executed in 46094 ms
>     2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries executed in 47704 ms
>     2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries executed in 49221 ms
>
>     ** Scanners **
>     2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled all rows
>     2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All ranges calculated: 3000 ranges found
>     2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2833 ms
>     2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2536 ms
>     2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2150 ms
>     2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2061 ms
>     2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2140 ms
>
>     Query code is available:
>     https://github.com/joshelser/accumulo-range-binning
>
>     Sven Hodapp wrote:
>
>         Hi Keith,
>
>         I've tried it with 1, 2 or 10 threads. Unfortunately there were
>         no amazing differences.
>         Maybe it's a problem with the table structure? For example it
>         may happen that one row id (e.g. a sentence) has several
>         thousand column families. Can this affect the seek performance?
>
>         So for my initial example it has about 3000 row ids to seek,
>         which will return about 500k entries. If I filter for specific
>         column families (e.g. a document without annotations) it will
>         return about 5k entries, but the seek time will only be halved.
>         Are there too many column families to seek quickly?
>
>         Thanks!
>
>         Regards,
>         Sven
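For anyone wanting to reproduce the comparison, here is a minimal sketch of
the two task styles described in Josh's test outline above. It is an
approximation, not the code from the accumulo-range-binning repo: the
Connector, table name, and per-task range partition are assumed to already
exist, and the BatchScanner is given the 10 query threads mentioned in the
thread.

import java.util.List;
import java.util.Map.Entry;

import org.apache.accumulo.core.client.BatchScanner;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.TableNotFoundException;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

public class SeekTasks {

  // Style 1: hand the whole partition of ranges to a single BatchScanner
  // (10 query threads, as discussed in the thread).
  static void batchScannerTask(Connector conn, String table, List<Range> partition)
      throws TableNotFoundException {
    BatchScanner bs = conn.createBatchScanner(table, Authorizations.EMPTY, 10);
    try {
      bs.setRanges(partition);
      for (Entry<Key,Value> entry : bs) {
        // drain the results; the benchmark only measures elapsed time
      }
    } finally {
      bs.close();
    }
  }

  // Style 2: a single plain Scanner, re-seeked serially, one range at a time.
  static void serialScannerTask(Connector conn, String table, List<Range> partition)
      throws TableNotFoundException {
    Scanner scanner = conn.createScanner(table, Authorizations.EMPTY);
    for (Range range : partition) {
      scanner.setRange(range);
      for (Entry<Key,Value> entry : scanner) {
        // drain the results for this range before moving on to the next
      }
    }
  }
}

In the setup above, each of the 6 pool threads runs one of these tasks over
its share of the 3000 ranges; swapping one method for the other is the only
difference between the two sets of timings. Dylan's suggestion of 1 or 2
threads per BatchScanner would just change the last argument to
createBatchScanner.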
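On Sven's question about filtering column families: with the public client
API that is done with fetchColumnFamily on the Scanner or BatchScanner
before iterating. A minimal sketch follows; the table name, row id, and the
"document" family are placeholders, not anything from Sven's schema. Note
that this restricts what comes back to the client, but the tablet server may
still have to read past the row's other families, which would be consistent
with the seek time only halving.

import java.util.Map.Entry;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.TableNotFoundException;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.io.Text;

public class FilteredSeek {

  // Count the entries in one row, fetching only the "document" column family.
  static long countDocumentEntries(Connector conn, String table, String rowId)
      throws TableNotFoundException {
    Scanner scanner = conn.createScanner(table, Authorizations.EMPTY);
    // A range covering the entire row.
    scanner.setRange(new Range(rowId));
    // Server-side filter: only keys in this family are returned to the client.
    scanner.fetchColumnFamily(new Text("document"));
    long count = 0;
    for (Entry<Key,Value> entry : scanner) {
      count++;
    }
    return count;
  }
}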