phoenix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Samarth Jain (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-4021) Remove CachingHTableFactory
Date Fri, 14 Jul 2017 06:52:00 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086940#comment-16086940
] 

Samarth Jain commented on PHOENIX-4021:
---------------------------------------

[~jamestaylor] - the RegionCoprocessorEnvironment that is passed in to the setup method already
has the right config setup in there. 

One important thing to determine would be the kind of backing worker thread pool to use. The
current CachedHTableFactory shares this pool among the HTables:

{code}
this.pool = new ThreadPoolExecutor(1,
                env.getConfiguration().getInt(INDEX_WRITES_THREAD_MAX_PER_REGIONSERVER_KEY,
Integer.MAX_VALUE),
                env.getConfiguration().getInt(HTABLE_KEEP_ALIVE_KEY, 60), TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>(), Threads.newDaemonThreadFactory("CachedHtables"));
        pool.allowCoreThreadTimeOut(true);
{code}
Notice the kind of worker queue we are using - SynchronousQueue. Also, we are configuring
the max number of threads as Integer.MAX and core threads to 1. 

If you still want to use the above kind of pool, you might have to tweak IndexWriterUtils#CoprocessorHConnectionTableFactory
a little bit to make sure this pool is passed in to all the HTable constructors.

For ex:

{code}
@Override
        public HTableInterface getTable(ImmutableBytesPtr tablename) throws IOException {
            return getConnection(conf).getTable(tablename.copyBytesIfNecessary(), sharedPool);
        }
{code}

If you don't specify the thread pool to use, then by default  we use the use pool returned
by HConnectionImplementation#getBatchPool(). This pool uses a LinkedBlockingQueue. And configures
number of core and max threads differently.

{code}
int maxThreads = conf.getInt("hbase.hconnection.threads.max", 256);
            int coreThreads = conf.getInt("hbase.hconnection.threads.core", 256);
            if (maxThreads == 0) {
              maxThreads = Runtime.getRuntime().availableProcessors() * 8;
            }
            if (coreThreads == 0) {
              coreThreads = Runtime.getRuntime().availableProcessors() * 8;
            }
            long keepAliveTime = conf.getLong("hbase.hconnection.threads.keepalivetime", 60);
            LinkedBlockingQueue<Runnable> workQueue =
              new LinkedBlockingQueue<Runnable>(maxThreads *
                conf.getInt(HConstants.HBASE_CLIENT_MAX_TOTAL_TASKS,
                  HConstants.DEFAULT_HBASE_CLIENT_MAX_TOTAL_TASKS));
            ThreadPoolExecutor tpe = new ThreadPoolExecutor(
                coreThreads,
                maxThreads,
                keepAliveTime,
                TimeUnit.SECONDS,
                workQueue,
                Threads.newDaemonThreadFactory(toString() + "-shared-"));
            tpe.allowCoreThreadTimeOut(true);
            this.batchPool = tpe;
{code}

We should definitely perf test which pool to use here. My hunch is that the way getBatchPool()
is configured will perform better since it configures core pool size to a more reasonable
value. Also, the executor in CachedHTableFactory ends up creating a new thread (up to the
limit of INTEGER.MAX), every time there is more work than the number of threads seems a bit
wasteful. Either way, we should perf test it.

> Remove CachingHTableFactory
> ---------------------------
>
>                 Key: PHOENIX-4021
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4021
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.11.0
>            Reporter: Geoffrey Jacoby
>            Assignee: Geoffrey Jacoby
>              Labels: globalMutableSecondaryIndex
>             Fix For: 4.12.0
>
>         Attachments: PHOENIX-4021.patch
>
>
> CachingHTableFactory is used as a performance optimization when writing to global indexes
so that HTable instances are cached and later automatically cleaned up, rather than instantiated
each time we write to an index.
> This should be removed for two reasons:
> 1. It opens us up to race conditions, because HTables aren't threadsafe, but CachingHTableFactory
doesn't guard against two threads both grabbing the same HTable and using it simultaneously.
Since all ops going through a region share the same IndexWriter and ParallelWriterIndexCommitter,
and hence the same CachingHTableFactory, that means separate operations can both be holding
the same HTable. 
> 2. According to discussion on PHOENIX-3159, and offline discussions I've had with [~apurtell],
HBase 1.x and above make creating throwaway HTable instances cheap so the caching is no longer
needed.
> For 4.x-HBase-1.x and master, we should remove CachingHTableFactory, and for 4.x-HBase-0.98,
we should either get rid of it (if it's not too much of a perf hit) or at least make it threadsafe.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message