accumulo-user mailing list archives

From Eric Newton <eric.new...@gmail.com>
Subject Re: java.lang.OutOfMemoryError: GC overhead limit exceeded
Date Wed, 16 Dec 2015 15:37:42 GMT
I would need more details to break down this question:

> why is CentOS caching 21 GB?


What leads you to believe this?

> Is it expected to fill all available memory?


The OS is expected to use all memory: Linux keeps recently read and written
disk blocks in the page cache, and it reclaims that cache whenever processes
need the memory.

We have found that the OS's aggressive use of disk caching can swipe memory
from large processes like the tserver, and we have seen tservers using more
memory than they were allotted. Some of these issues are ameliorated with JVM
or OS settings; others need deeper inspection. It is a complex system with
competing resource needs.
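
If you want to verify where that memory is going on a given box, look at the
page cache directly (column names vary with your procps version):

$ free -m    # a large "cached"/"buff/cache" figure is reclaimable page cache, not a leak

The kernel gives that memory back as soon as a process asks for it.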

> And how does ACCUMULO_OTHER_OPTS help in ingestion when I am using native
> memory maps?


Let's say you have one client and two servers. The client queues data going
to the two servers. Even if the two servers have infinite capacity, the
client still needs to determine which server gets which data.

Larger buffers in the client to these servers produce faster ingest rates.
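
As a concrete sketch (Accumulo 1.7 Java API; the connector, table name, and
sizes below are placeholders), those client-side buffers are controlled
through BatchWriterConfig:

import java.util.concurrent.TimeUnit;
import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;

BatchWriterConfig cfg = new BatchWriterConfig()
    .setMaxMemory(64 * 1024 * 1024)      // bytes buffered in the client before sending
    .setMaxLatency(2, TimeUnit.MINUTES)  // flush buffered mutations at least this often
    .setMaxWriteThreads(4);              // parallel sends to tablet servers

BatchWriter writer = connector.createBatchWriter("ingest_table", cfg); // connector assumed
Mutation m = new Mutation("row1");
m.put("cf", "cq", new Value("value".getBytes()));
writer.addMutation(m); // queues into the client-side buffer
writer.close();        // flushes whatever is still buffered

Note that setMaxMemory is heap in the client JVM: raising it speeds ingest,
but the client then needs a larger -Xmx to hold it.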

If your client queues too much data to the servers, it runs out of space. In
a Java client, the process runs a GC to find more space. If nearly all of its
processing is finding new space (98% of the time, by the HotSpot default
-XX:GCTimeLimit), it gives up (well, depending on your configuration; the
check can be disabled outright with -XX:-UseGCOverheadLimit).
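
If you want to see how close a client is to that limit, the standard JMX
beans expose cumulative GC time; a minimal sketch, nothing Accumulo-specific:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

long gcMillis = 0;
for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
  gcMillis += gc.getCollectionTime(); // cumulative ms spent in this collector
}
long upMillis = ManagementFactory.getRuntimeMXBean().getUptime();
System.out.printf("GC: %d ms of %d ms uptime (%.1f%%)%n",
    gcMillis, upMillis, 100.0 * gcMillis / upMillis);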

I was making the huge assumption that your client runs with the accumulo
scripts and is not one of the known accumulo start points (master, tserver,
gc, monitor, and so on): in that case, it is given the JVM parameters of
ACCUMULO_OTHER_OPTS.
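
For example, in accumulo-env.sh (the heap sizes here are illustrative; size
them to your ingest workload):

ACCUMULO_OTHER_OPTS="${POLICY} -Xmx4g -Xms1g"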

Regardless of the capacity of the tservers, your client is running into
limitations. I suggested that this limitation may be due to a single
environment variable, based on the users and configurations I am familiar
with. I am often wrong.

The key point is that your client is failing due to local memory resources,
not the tserver. I do not know if this is from configuration, or a bug. But
it is within the process that uses the accumulo API, and not the accumulo
server processes.

You need to figure out what is going on in your client, and this may find
bugs in the accumulo API.  However, we exercise the accumulo API ... um, a
lot.* So, double and triple-check your ingesters' configuration.

Accumulo is not bug free, so anomalies should be run down. Feel free to PM
me if you have additional information and cannot post publicly.

-Eric

*"A lot", is a technical term measured in scientific notation and days.


On Wed, Dec 16, 2015 at 4:10 AM, mohit.kaushik <mohit.kaushik@orkash.com>
wrote:

> Thanks Eric, but one doubt is still unclear: when all the processes have
> their own memory limits, why is CentOS caching 21 GB? Is it expected to fill
> all available memory? And how does ACCUMULO_OTHER_OPTS help in ingestion
> when I am using native memory maps?
>
>
> On 12/15/2015 09:21 PM, Eric Newton wrote:
>
> This is actually a client issue, and not related to the server or its
> performance.
>
> The code sending updates to the server is spending so much time in Java GC
> that it has decided to kill itself.
>
> You may want to increase the size of the JVM used for ingest, probably by
> using a larger value in ACCUMULO_OTHER_OPTS.
>
> "No Such SessionID" errors are typical of a paused client: update sessions
> time out and are forgotten. Your client ran low on memory, paused to GC,
> and the server forgot about its session.
>
> -Eric
>
> On Tue, Dec 15, 2015 at 7:45 AM, mohit.kaushik <mohit.kaushik@orkash.com>
> wrote:
>
>> Dear All,
>>
>> I am getting the exception below on the client side while inserting
>> data.
>>
>> Exception in thread "Thrift Connection Pool Checker"
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>> ERROR - TabletServerBatchWriter.updateUnknownErrors(520) -  Failed to
>> send tablet server orkash1:9997 its batch : GC overhead limit exceeded
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>> ERROR - ClientCnxn$1.uncaughtException(414) -  from
>> main-SendThread(orkash2:2181)
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>
>> This exception appears a few days after ingestion starts. I have already
>> assigned appropriate memory to all components. I have a 3-node cluster with
>> Accumulo 1.7.0 and Hadoop 2.7.0 (32 GB RAM each). The Accumulo master and
>> Hadoop namenode run on different servers.
>>
>> accumulo-env.sh
>> ACCUMULO_TSERVER_OPTS="${POLICY} -Xmx8g -Xms3g  -XX:NewSize=500m
>> -XX:MaxNewSize=500m "
>> ACCUMULO_MASTER_OPTS="${POLICY} -Xmx1g -Xms1g"
>> ACCUMULO_MONITOR_OPTS="${POLICY} -Xmx1g -Xms256m"
>> ACCUMULO_GC_OPTS="-Xmx512m -Xms256m"
>> ACCUMULO_GENERAL_OPTS="-XX:+UseConcMarkSweepGC -XX:SurvivorRatio=3
>> -XX:CMSInitiatingOccupancyFraction=75 -Djava.net.preferIPv4Stack=true"
>>
>> accumulo-site.xml
>>   <property>
>>     <name>tserver.memory.maps.max</name>
>>     <value>2G</value>
>>   </property>
>>   <property>
>>     <name>tserver.memory.maps.native.enabled</name>
>>     <value>true</value>
>>   </property>
>>   <property>
>>     <name>tserver.cache.data.size</name>
>>     <value>2G</value>
>>   </property>
>>   <property>
>>     <name>tserver.cache.index.size</name>
>>     <value>1G</value>
>>   </property>
>>   <property>
>>     <name>tserver.sort.buffer.size</name>
>>     <value>500M</value>
>>   </property>
>>   <property>
>>     <name>tserver.walog.max.size</name>
>>     <value>1G</value>
>>   </property>
>>
>>
>> I found that even after setting individual memory limits, the servers are
>> using almost all of their memory (up to 21 GB cached). I am not running any
>> other application on these servers; only Accumulo and Hadoop are deployed.
>> Why are the servers caching so much data (21 GB)?
>>
>> When I scanned the logs, I found another exception in the Accumulo tserver
>> logs:
>>
>> org.apache.thrift.TException: No Such SessionID
>>         at org.apache.accumulo.server.rpc.RpcWrapper$1.invoke(RpcWrapper.java:51)
>>         at com.sun.proxy.$Proxy20.applyUpdates(Unknown Source)
>>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$applyUpdates.getResult(TabletClientService.java:2425)
>>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$applyUpdates.getResult(TabletClientService.java:2411)
>>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>>         at org.apache.accumulo.server.rpc.TimedProcessor.process(TimedProcessor.java:63)
>>         at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:516)
>>         at org.apache.accumulo.server.rpc.CustomNonBlockingServer$1.run(CustomNonBlockingServer.java:78)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>>         at java.lang.Thread.run(Thread.java:745)
>>
>>
>>
>> Thanks & Regards
>> Mohit Kaushik
>>
>>
