hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject pread vs. using seek()+read()
Date Sun, 11 Mar 2012 20:49:26 GMT
From the header of HFile, I see:

 * <li>The current implementation does not offer true multi-threading for
 * reading. The implementation uses FSDataInputStream seek()+read(), which is
 * shown to be much faster than positioned-read call in single thread mode.
 * However, it also means that if multiple threads attempt to access the same
 * HFile (using multiple scanners) simultaneously, the actual I/O is carried out
 * sequentially even if they access different DFS blocks (Reexamine! pread seems
 * to be 10% faster than seek+read in my testing -- stack).
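
For anyone less familiar with the two read modes the comment contrasts, here
is a minimal sketch. It uses plain java.io / java.nio on a local temp file as
a stand-in for HDFS, which is purely an assumption for illustration; the class
and method names (ReadModes, seekThenRead, positionalRead) are hypothetical
and none of this is HBase or DFSClient code. In Hadoop the corresponding calls
on FSDataInputStream are seek(long) followed by read(byte[], int, int), versus
the positioned read(long, byte[], int, int).

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ReadModes {
    // seek()+read(): both calls use the handle's shared file pointer, so
    // threads sharing one handle have to take turns (hence the lock).
    static byte[] seekThenRead(RandomAccessFile file, long pos, int len) throws IOException {
        byte[] buf = new byte[len];
        synchronized (file) {
            file.seek(pos);
            file.readFully(buf);
        }
        return buf;
    }

    // Positional read ("pread"): the offset travels with each call and the
    // channel's own position is left untouched, so no shared cursor is involved.
    static byte[] positionalRead(FileChannel ch, long pos, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos + buf.position());
            if (n < 0) {
                throw new EOFException("hit end of file before reading " + len + " bytes");
            }
        }
        return buf.array();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("readmodes", ".bin");
        try {
            Files.write(tmp, "0123456789abcdef".getBytes(StandardCharsets.US_ASCII));
            try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "r");
                 FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                String a = new String(seekThenRead(raf, 10, 4), StandardCharsets.US_ASCII);
                String b = new String(positionalRead(ch, 10, 4), StandardCharsets.US_ASCII);
                // Prints: abcd abcd pointer=14 channelPos=0
                System.out.println(a + " " + b + " pointer=" + raf.getFilePointer()
                        + " channelPos=" + ch.position());
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The sketch only shows why seek()+read() serializes readers on a shared handle
while a positional read does not; it says nothing about which is faster on
HDFS, which is the open question here.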

From the region server log, I saw:

2012-03-09 11:49:48,296 INFO org.apache.hadoop.hdfs.DFSClient: DFSInputStream.seek: name=/hbase/item_active_1200/05e92f1bd4d3b1fc414a39b6b6269035/MAIN/6c55d610cd4845188de799445913806f pos 3772475042
2012-03-09 11:49:48,296 INFO org.apache.hadoop.hdfs.DFSClient: DFSInputStream.read: name=/hbase/item_active_1200/05e92f1bd4d3b1fc414a39b6b6269035/MAIN/6c55d610cd4845188de799445913806f position current,  length 212
2012-03-09 11:49:48,302 INFO org.apache.hadoop.hdfs.DFSClient: DFSInputStream.seek: name=/hbase/item_active_1200/05e92f1bd4d3b1fc414a39b6b6269035/MAIN/6c55d610cd4845188de799445913806f pos 3772470307
2012-03-09 11:49:48,302 INFO org.apache.hadoop.hdfs.DFSClient: DFSInputStream.read: name=/hbase/item_active_1200/05e92f1bd4d3b1fc414a39b6b6269035/MAIN/6c55d610cd4845188de799445913806f position current,  length 24

I wonder whether there has been a recent performance comparison for scans
between pread and seek()+read().

Thanks
