hadoop-common-dev mailing list archives

From "Clint Morgan (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-2161) getRow() is orders of magnitudes slower than get(), even on rows with one column
Date Wed, 07 Nov 2007 21:22:52 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Clint Morgan updated HADOOP-2161:
---------------------------------

    Attachment: HADOOP-2161-2.patch

Well, 2 sec read times are not acceptable for us. But looking into
this further, there appears to be a bug causing the excessive time 
(rather than the need to look everywhere for the row).

Looking at HStore.getFull (line 1103), the break from the while loop
should occur when key.compareTo(readKey) is LESS than zero. This
cuts the read times back down to 5-20 ms.
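
To illustrate why the direction of that comparison matters, here is a
minimal sketch. It is not the HStore code: the class EarlyBreakScan, the
method scan(), and the use of a TreeMap in place of a MapFile are all
assumptions made for illustration. tailMap() stands in for getClosest(),
and the loop compares the search key against each key read in sorted order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a sorted scan; NOT the real HStore.getFull().
public class EarlyBreakScan {

    /** Values matched for the key, plus how many entries the scan read. */
    static final class ScanResult {
        final List<String> values = new ArrayList<>();
        int entriesRead = 0;
    }

    /**
     * Scans sorted entries starting at the first key >= the search key.
     * fixedBreak = true  breaks when key.compareTo(readKey) < 0 (proposed fix).
     * fixedBreak = false breaks when key.compareTo(readKey) > 0 (assumed buggy form).
     */
    static ScanResult scan(TreeMap<String, String> sortedEntries, String key, boolean fixedBreak) {
        ScanResult result = new ScanResult();
        // tailMap(key, true) stands in for getClosest(): first entry >= key.
        for (Map.Entry<String, String> e : sortedEntries.tailMap(key, true).entrySet()) {
            String readKey = e.getKey();
            result.entriesRead++;
            int cmp = key.compareTo(readKey);
            if (cmp == 0) {
                result.values.add(e.getValue());
            } else if (fixedBreak ? cmp < 0 : cmp > 0) {
                // With the fix, this fires as soon as readKey has passed key.
                break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<String, String> entries = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            entries.put(String.format("row%04d", i), "value" + i);
        }
        ScanResult buggy = scan(entries, "row0010", false);
        ScanResult fixed = scan(entries, "row0010", true);
        System.out.println("entries read without fix: " + buggy.entriesRead);
        System.out.println("entries read with fix:    " + fixed.entriesRead);
    }
}
```

Because the scan starts at or after the search key, every readKey is
greater than or equal to key, so a "greater than zero" test never fires and
the loop runs to the end of the map. The "less than zero" test stops at the
first key past the requested one.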

However, there still seems to be a problem:

When there are two MapFiles in the HStore (again in getFull()), after
calling map.getClosest(), matching the key, and storing the results,
the call to map.next() produces a key that is much less than the key
returned by getClosest(). So time is wasted again iterating through all
these keys to get back to the closest key and out of the loop.

This problem can be observed by applying my patch to HStore, running
my patched performance test, and, when the times start to climb, tracing
through getFull().

I had a quick look at what was going on with getClosest() and next(), but
did not understand all that was going on.

> getRow() is orders of magnitudes slower than get(), even on rows with one column
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-2161
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2161
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>         Environment: latest from trunk
>            Reporter: Clint Morgan
>         Attachments: HADOOP-2161-2.patch, PerformanceEvaluation-patch.txt
>
>
> HTable.getRow(Text) is several orders of magnitude slower than
> HTable.get(Text, Text), even on rows with a single column.
> This problem can be observed by the attached patch of
> PerformanceEvaluation.java which changes SequentialRead to use getRow,
> and prints out the time for each read. 
> The test can then be run with:
> bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialRead 1
> On my laptop, the original test (using get()) produces reads on the order of 5-20
> milliseconds. Using getRow(), the reads take 50-2000 ms. 
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

