hadoop-common-user mailing list archives

From Dennis Kubes <nutch-...@dragonflymc.com>
Subject Out of Memory during Sorts
Date Sun, 11 Jun 2006 15:07:05 GMT
Can someone point me in the right direction on configuring settings 
for large sorting operations (> 1M rows)?  I keep getting out-of-memory 
exceptions during the sort phase.  Here are my current settings.  I have 
2G of heap space on each box.

Dennis

<property>
  <name>io.sort.factor</name>
  <value>20</value>
  <description>
  The number of streams to merge at once while sorting
  files.  This determines the number of open file handles.
  </description>
</property>

<property>
  <name>io.sort.mb</name>
  <value>200</value>
  <description>
  The total amount of buffer memory to use while sorting
  files, in megabytes.  By default, gives each merge stream 1MB, which
  should minimize seeks.
  </description>
</property>

<property>
  <name>io.file.buffer.size</name>
  <value>8192</value>
  <description>
  The size of buffer for use in sequence files.
  The size of this buffer should probably be a multiple of hardware
  page size (4096 on Intel x86), and it determines how much data is
  buffered during read and write operations.
  </description>
</property>

<property>
  <name>io.bytes.per.checksum</name>
  <value>4096</value>
  <description>
  The number of bytes per checksum.  Must not be larger than
  io.file.buffer.size.
  </description>
</property>
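One thing worth checking, as a hedged sketch rather than a confirmed fix: io.sort.mb is allocated inside each map/reduce task's child JVM rather than in the 2G heap of the daemon, and the stock hadoop-default.xml ships the child JVM with only -Xmx200m, which leaves no headroom for a 200 MB sort buffer. If the Hadoop version in use supports mapred.child.java.opts, raising it along the lines below may help; the property name and the 512m value are assumptions to verify against your hadoop-default.xml.

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
  <description>
  Java options passed to the task child JVMs.  The sort buffer set by
  io.sort.mb must fit inside this heap, so it should be set comfortably
  larger than io.sort.mb (512m here is an assumed value; adjust to the
  memory available on your nodes).
  </description>
</property>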

