phoenix-dev mailing list archives

From "chenzhiming (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-2209) Building Local Index Asynchronously via IndexTool fails to populate index table
Date Tue, 27 Dec 2016 09:03:58 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15779984#comment-15779984 ]

chenzhiming commented on PHOENIX-2209:
--------------------------------------

If I build the local index multiple times, the result of 'select count(*) from index_table'
increases every time.
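
One way to check for this kind of growth is to compare the row count of the data table with that of the index table after each rebuild. The sketch below is illustrative only and is not part of the original comment: the helper names `count_rows` and `index_matches_data` are hypothetical, it assumes a healthy index holds one row per data row, and it uses an in-memory sqlite3 database as a stand-in so that it runs anywhere. Against Phoenix, the same functions would need a DB-API cursor from a Phoenix-capable driver, and the `NO_INDEX` hint is there to keep the data-table count from being answered out of an index.

```python
import sqlite3  # used only for the stand-in demo below


def count_rows(cursor, table, hint=""):
    """Return COUNT(*) for a trusted table name via any DB-API cursor."""
    # The table name is interpolated directly, so pass trusted identifiers only.
    cursor.execute(f"SELECT {hint} COUNT(*) FROM {table}")
    return cursor.fetchone()[0]


def index_matches_data(cursor, data_table, index_table):
    """Return (data_rows, index_rows, counts_equal) for the two tables."""
    # NO_INDEX asks Phoenix to scan the data table itself rather than an index.
    data_rows = count_rows(cursor, data_table, "/*+ NO_INDEX */")
    index_rows = count_rows(cursor, index_table)
    return data_rows, index_rows, data_rows == index_rows


# Stand-in demo: sqlite3 treats the hint as an ordinary comment.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE data_t (id INTEGER PRIMARY KEY, c4 TEXT)")
cur.execute("CREATE TABLE index_t (c4 TEXT, id INTEGER)")
cur.executemany("INSERT INTO data_t VALUES (?, ?)", [(1, "a"), (2, "b")])
# Simulate an index built twice: every data row appears two times.
cur.executemany("INSERT INTO index_t VALUES (?, ?)", [("a", 1), ("b", 2)] * 2)
print(index_matches_data(cur, "data_t", "index_t"))
```

If the index row count exceeds the data row count and keeps growing with each run, that would be consistent with rebuilds appending rows instead of replacing them.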

> Building Local Index Asynchronously via IndexTool fails to populate index table
> -------------------------------------------------------------------------------
>
>                 Key: PHOENIX-2209
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2209
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.5.0
>         Environment: CDH: 5.4.4
> HBase: 1.0.0
> Phoenix: 4.5.0 (https://github.com/SiftScience/phoenix/tree/4.5-HBase-1.0) with hacks for CDH compatibility.
>            Reporter: Keren Gu
>            Assignee: Rajeshbabu Chintaguntla
>              Labels: IndexTool, LocalIndex, index
>             Fix For: 4.8.0
>
>         Attachments: PHOENIX-2209.patch, PHOENIX-2209_v2.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I used the asynchronous index population tool to create a local index (on 1 column) on four tables with 10 columns each and 65M, 250M, 340M, and 1.3B rows respectively.
> Table Schema as follows (with generic column names): 
> {quote}
> CREATE TABLE PH_SOJU_SHORT (
> id INT PRIMARY KEY,
> c2 VARCHAR NULL,
> c3 VARCHAR NULL,
> c4 VARCHAR NULL,
> c5 VARCHAR NULL,
> c6 VARCHAR NULL,
> c7 DOUBLE NULL,
> c8 VARCHAR NULL,
> c9 VARCHAR NULL,
> c10 BIGINT NULL
> )
> {quote}
> Example command used (for 65M row table): 
> {quote}
> 0: jdbc:phoenix:localhost> create local index LC_INDEX_SOJU_EVAL_FN on PH_SOJU_SHORT(C4) async;
> {quote}
> And MR job started with command: 
> {quote}
> $ hbase org.apache.phoenix.mapreduce.index.IndexTool --data-table PH_SOJU_SHORT --index-table LC_INDEX_SOJU_EVAL_FN --output-path LC_INDEX_SOJU_EVAL_FN_HFILE
> {quote}
> The IndexTool MR jobs finished in 18min, 77min, 77min, and 2hr 34min respectively, but all index tables were empty.
> For the table with 65M rows, IndexTool had 12 mappers and reducers. The MR counters show Map input and output records = 65M and Reduce input and output records = 65M. The PhoenixJobCounters input and output records are all 65M.
> IndexTool Reducer Log tail: 
> {quote}
> ...
> 2015-08-25 00:26:44,687 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 32 segments left of total size: 22805636866 bytes
> 2015-08-25 00:26:44,693 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
> 2015-08-25 00:26:44,765 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
> 2015-08-25 00:26:44,908 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
> 2015-08-25 00:26:45,060 INFO [main] org.apache.hadoop.hbase.io.hfile.CacheConfig: CacheConfig:disabled
> 2015-08-25 00:36:43,880 INFO [main] org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2: Writer=hdfs://nameservice/user/ubuntu/LC_INDEX_SOJU_EVAL_FN/_LOCAL_IDX_PH_SOJU_EVAL/_temporary/1/_temporary/attempt_1440094483400_5974_r_000000_0/0/496b926ad624438fa08626ac213d0f92, wrote=10737418236
> 2015-08-25 00:36:45,967 INFO [main] org.apache.hadoop.hbase.io.hfile.CacheConfig: CacheConfig:disabled
> 2015-08-25 00:38:43,095 INFO [main] org.apache.hadoop.mapred.Task: Task:attempt_1440094483400_5974_r_000000_0 is done. And is in the process of committing
> 2015-08-25 00:38:43,123 INFO [main] org.apache.hadoop.mapred.Task: Task attempt_1440094483400_5974_r_000000_0 is allowed to commit now
> 2015-08-25 00:38:43,132 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of task 'attempt_1440094483400_5974_r_000000_0' to hdfs://nameservice/user/ubuntu/LC_INDEX_SOJU_EVAL_FN/_LOCAL_IDX_PH_SOJU_EVAL/_temporary/1/task_1440094483400_5974_r_000000
> 2015-08-25 00:38:43,158 INFO [main] org.apache.hadoop.mapred.Task: Task 'attempt_1440094483400_5974_r_000000_0' done.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
