lucene-java-user mailing list archives

From <wangzhijiang...@aliyun.com>
Subject Re: RE: why lucene not use DirectByteBuffer in NIOFSDirectory
Date Wed, 07 Aug 2013 09:35:47 GMT

Hi Uwe,

Thank you for your detailed explanation; I learned a lot from your message.
First, the direct buffer and the FS cache do not share the same memory areas.
Second, accessing direct memory from Java code is slower than accessing heap memory.

In addition, I tested the different ways of using a ByteBuffer in Java NIO and watched
the memory usage. The results are below.
 
I recorded the initial memory status of the Linux server, and I cleared the system
cache after each test.
 

  
    RandomAccessFile raf = new RandomAccessFile(file, "r");  // file size is about 2G
    FileChannel channel = raf.getChannel();
    int len = (int) channel.size();
 
1.  ByteBuffer bb = ByteBuffer.wrap(new byte[len]);  // bb is a HeapByteBuffer
    channel.read(bb);
            
    
It consumes 4G of physical memory, not counting the FS cache: 2G is heap memory
consumed by the new byte array, and the other 2G is direct memory outside the heap.
This confused me: why would it use a DirectByteBuffer internally, or did I make a
mistake? If it does use a DirectByteBuffer, is the data copied twice, first from the
FS cache to the direct buffer and then from the direct buffer to the byte array in
heap memory?
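
One way to check this from inside the JVM, rather than from top, is to read the "direct" buffer pool through the standard BufferPoolMXBean before and after a heap-buffer read. The sketch below is only a measuring aid: the class and method names (DirectPoolProbe, directBytesUsed, readIntoHeapBuffer) are hypothetical, and the pool name "direct" is what HotSpot reports, so other JVMs may differ. If the JDK copies through a temporary direct buffer (an assumption about the implementation, not something confirmed in this thread), the value would grow by roughly the size of the read.

```java
import java.io.IOException;
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, not part of Lucene or the JDK.
class DirectPoolProbe {

    /** Bytes currently held by the JVM's "direct" buffer pool, or -1 if no such pool is reported. */
    static long directBytesUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {  // pool name as reported by HotSpot (assumption)
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    /** Reads the whole file into a heap buffer, as in tests 1 and 2, and returns the bytes read. */
    static int readIntoHeapBuffer(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The cast limits this sketch to files smaller than 2 GiB.
            ByteBuffer bb = ByteBuffer.allocate((int) channel.size());  // HeapByteBuffer
            while (bb.hasRemaining() && channel.read(bb) >= 0) {
                // keep reading until the buffer is full or end of file is reached
            }
            return bb.position();
        }
    }
}
```

Calling directBytesUsed() immediately before and after readIntoHeapBuffer(file) and printing the difference gives a number that can be compared with what top shows.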
 
2.  ByteBuffer bb = ByteBuffer.allocate(len);  // bb is a HeapByteBuffer
    channel.read(bb);

The memory usage is the same as in test 1.
 
3.  ByteBuffer bb = ByteBuffer.allocateDirect(len);  // bb is a DirectByteBuffer
    channel.read(bb);

It consumes 2G of physical memory, not counting the FS cache, and all of it is direct
memory outside the heap. Heap usage is less than 2M.
 
4.  MappedByteBuffer bb = channel.map(FileChannel.MapMode.READ_ONLY, 0, len);
    bb.get(new byte[len]);
       
    
It consumes 4G of memory according to RES in top, of which 2G is heap memory consumed
by the new byte array. I am confused about what the other 2G is. According to the
"used" figure on the buffers/cache line, the code should consume 2326 of memory; why is
that not the same as RES in top?
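
For comparison, test 4 could be varied so that the mapping is read in place instead of being copied into a new byte[]. This is a minimal sketch, not Lucene's code: the class and method names (MappedSum, sumBytes) are hypothetical, and summing the bytes is only a stand-in for real work on the data.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class for illustration only.
class MappedSum {

    /** Sums all bytes of the file by reading the mapping in place, with no heap byte[] copy. */
    static long sumBytes(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // A single mapping is limited to Integer.MAX_VALUE bytes;
            // larger files would need several mappings.
            MappedByteBuffer bb = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long sum = 0;
            while (bb.hasRemaining()) {
                sum += bb.get() & 0xFF;  // read one byte directly from the mapped pages
            }
            return sum;
        }
    }
}
```

With this variant the 2G heap array from test 4 is never allocated, so heap usage should stay small while the pages of the mapping are touched.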
 
None of the above tests compares the performance of the different approaches. In
Lucene's NIO code the buffer size is 1024, not the whole file size as in my tests. And
with mmap, Lucene uses ByteBuffer.get(), getInt() and so on to fetch the data, so it does
not need to copy the data into a new byte array in heap memory as my test did.
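
To make the difference from my tests concrete, here is a minimal sketch of reading a file through a small, reused heap buffer instead of one buffer as large as the file. It is not Lucene's implementation: the names ChunkedRead and sumBytes are hypothetical, and the chunk size is a parameter so that 1024 can be passed to match the buffer size mentioned above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class for illustration only.
class ChunkedRead {

    /** Sums all bytes of the file, reading it through one reused heap buffer of chunkSize bytes. */
    static long sumBytes(Path file, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(chunkSize);  // small HeapByteBuffer, reused
            byte[] heap = buf.array();                        // backing array of the heap buffer
            long sum = 0;
            while (true) {
                buf.clear();                  // reset position and limit for the next chunk
                int n = channel.read(buf);    // copies at most chunkSize bytes into the heap
                if (n < 0) {
                    break;                    // end of file
                }
                for (int i = 0; i < n; i++) {
                    sum += heap[i] & 0xFF;    // plain heap array access
                }
            }
            return sum;
        }
    }
}
```

A call such as ChunkedRead.sumBytes(path, 1024) keeps the per-read buffer at 1024 bytes regardless of the file size, so the memory figures would not be comparable with tests 1 to 4 above.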
 
I hope somebody can explain the two points in the above tests that confuse me.
 
     Thank you again!


------------------------------------------------------------------
From: Uwe Schindler <uwe@thetaphi.de>
Sent: July 31, 2013 18:18
To: java-user@lucene.apache.org; wangzhijiang999@yahoo.com.cn
Subject: RE: why lucene not use DirectByteBuffer in NIOFSDirectory

Hi,

There is a misunderstanding: Just by allocating a direct buffer, there is still no
difference to a heap buffer in the workflow! NIO will read the data from file, copy it
to FS cache and then the positional read() system call (used by NIO) copies the FS cache
contents to the direct buffer, no real difference to a heap buffer. So it is still the
same: data needs to be copied. Please note: Direct buffers have nothing to do with file
system cache, they don't share the same memory areas! Hotspot allocates direct buffers
using malloc() outside of Java heap, so not really useful here.

The downside of using a non-heap buffer as target for the copy operation is the fact
that direct buffers are approx. 2 times slower when accessed from Java code (because
they are outside java heap, the VM has to do extra work to prevent illegal accesses):
so you have the same time for copy but slower access from Java. The buffers allocated
by NIO are small so it does not matter for performance where they are. So heap is
better. MappedByteBuffers are also direct buffers (they have the same base class), so
there is still the overhead when accessing them from Java code, but to get rid of the
additional copy, use MMapDirectory.

To conclude:

- MMapDirectory: No copying of the data from FS cache to heap or direct buffers is
  needed; that copy is what wastes most of the time. Access to a MappedByteBuffer from
  Java code is slower, but the spared data copy makes it much better for large files
  as used by Lucene.
- NIOFSDirectory with direct buffers: Needs to copy data from FS cache to direct buffer
  memory (outside heap). Access times slower to direct buffers than to heap buffers
  -> 2 times bad
- NIOFSDirectory with heap buffers: Needs to copy data from FS cache to heap. Access
  time from java code is very good!

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: wangzhijiang999@yahoo.com.cn [mailto:wangzhijiang999@yahoo.com.cn]
> Sent: Wednesday, July 31, 2013 11:59 AM
> To: java-user@lucene.apache.org
> Subject: why lucene not use DirectByteBuffer in NIOFSDirectory
>
> I read this article "Use Lucene's MMapDirectory on 64bit platforms, please!"
> and it said the MMapDirectory is better than other Directory implementations
> because it avoids copying data between the file system cache and the java heap.
>
> I checked the source code of NIOFSDirectory, and in the newBuffer method it
> called "ByteBuffer.wrap(newBuffer)"; the generated ByteBuffer is a
> HeapByteBuffer. And it will indeed copy data between the file system cache and
> the java heap. Why not use ByteBuffer.allocateDirect to generate a
> DirectByteBuffer, so that it stores data directly in the file system cache, not
> in the java heap? In that case, what is the difference in performance between
> NIO and MMap? Or is allocating direct memory still slower than MMap?
>
> Maybe I have misunderstood the lucene code; thank you in advance for any
> suggestions.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org