Thursday, April 25, 2013

Spring Tomcat Server - Benchmark - Best Practices - Elastic Memory (EM4J) - Big Data Caching

Performance: Benchmarking and Best Practices for Spring tc Server
and Elastic Memory for Java (EM4J)

These are my own opinions and not those of my company or employer.

A user has deployed an application to a tc Server instance. The application works great during testing and QA. However, when the user moves the application into production, the load increases and Tomcat stops handling requests. At first this happens occasionally, and for only 5 or 10 seconds per occurrence. It's such a small issue that the user might not even notice or, if noticed, may choose to just ignore the problem. After all, it's only 5 or 10 seconds and it's not happening very often. Unfortunately for the user, as the application continues to run, the problem keeps recurring with greater frequency, possibly until the Tomcat server stops responding to requests altogether.
There is a good chance that at some point you, or someone you know, has faced this issue. While there are multiple possible causes, such as blocked threads, too much load on the server, or application-specific problems, the one cause I see over and over is excessive garbage collection.
As an application runs it creates objects. As it continues to run, many of these objects are no longer needed. In Java, the unused objects remain in memory until a garbage collection occurs and frees up the memory they use. In most cases, these garbage collections run very quickly, but occasionally the garbage collector needs to run a “full” collection. When a full collection runs, not only does it take a considerable amount of time, but the entire JVM has to be paused while the collector runs. It is this “stop-the-world” behavior that causes tc Server to fail to respond to requests.
Fortunately, there are some strategies which can be employed to mitigate the effects of garbage collections; but first, a quick discussion about performance tuning.
Performance Tuning Basics:
The first rule is: Measure, adjust and measure again. Measure the performance of the tc Server instance before you make a change, make one change, and then measure the performance again after the change. If you follow this pattern, you will always know exactly how each change affects the performance of your tc Server instance.
The second rule is: Run a load-generating tool such as JMeter or Selenium against the web applications deployed to the Tomcat instance. Many garbage collection problems occur only under load, so this helps you replicate and test them more accurately.
The third rule is: Don’t blindly apply configuration settings to a tc Server instance; apply rule number one first. Many web applications may be installed on one tc Server instance, and each has its own memory usage pattern, so a setting that suits one application can cripple the performance of the others.
Measuring Performance:
Before making any changes, how do we measure performance? It depends on what is important to the application. For some applications, individual response time may be what matters, while others will value throughput (i.e. how many requests Tomcat can process over some interval). Let’s look at something more specific to the JVM: garbage collection performance.
Garbage collection performance is a good metric to use, both because it can heavily impact response time and throughput, and because it's easy to measure, even in a production system. To measure garbage collection performance, we simply enable garbage collection logging.
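For the log itself, HotSpot's -verbose:gc, -XX:+PrintGCDetails, -XX:+PrintGCTimeStamps and -Xloggc:<file> options are the usual starting point. If you also want the totals from inside the JVM, the standard java.lang.management API exposes each collector's count and accumulated time. The sketch below sums those times and expresses them as a percentage of uptime; GcTimeReport and its methods are illustrative names, not part of tc Server.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Minimal sketch: how much of the JVM's uptime has gone to garbage collection.
class GcTimeReport {

    // Percentage of uptime spent in GC; 0 when uptime is not positive.
    static double gcTimePercent(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return 100.0 * gcMillis / uptimeMillis;
    }

    // Sum the accumulated collection time of every collector in this JVM.
    // getCollectionTime() returns -1 when a collector does not report it.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%-20s collections=%d time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC time: %.2f%% of %d ms uptime%n",
                gcTimePercent(totalGcMillis(), uptime), uptime);
    }
}
```

Sampling this before and after a change is one way to apply the measure, adjust, measure again rule.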
General Options
-Xms and -Xmx
These settings are used to define the size of the heap used by the JVM. -Xms defines the initial size of the heap and -Xmx defines the maximum size of the heap. Specific values for these options will depend on the number of applications and the requirements of each application deployed to a Tomcat instance.
With regard to tc Server, it is recommended that the initial and maximum heap sizes be set to the same value. This is often referred to as a fully committed heap: it instructs the JVM to create a heap that is initially at its maximum size, which prevents several full garbage collections from occurring as the heap expands to its maximum size.
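One way to confirm that the heap really is fully committed is to read back the options the JVM was started with. The following is a minimal sketch, assuming the sizes are passed as -Xms/-Xmx (it ignores the -XX:InitialHeapSize/-XX:MaxHeapSize spellings); HeapCommitCheck and its methods are hypothetical names.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Locale;

// Minimal sketch: is -Xms equal to -Xmx for this JVM?
class HeapCommitCheck {

    // Parse a HotSpot size such as "512m", "2g" or "1048576" into bytes.
    static long parseSize(String s) {
        String v = s.trim().toLowerCase(Locale.ROOT);
        char last = v.charAt(v.length() - 1);
        long mult;
        switch (last) {
            case 'k': mult = 1L << 10; break;
            case 'm': mult = 1L << 20; break;
            case 'g': mult = 1L << 30; break;
            default:  mult = 1;
        }
        String digits = mult == 1 ? v : v.substring(0, v.length() - 1);
        return Long.parseLong(digits) * mult;
    }

    // Value of the last occurrence of a flag such as "-Xms" (HotSpot generally
    // honours the last one), or -1 if the flag is absent.
    static long lastSize(List<String> jvmArgs, String prefix) {
        long size = -1;
        for (String a : jvmArgs) {
            if (a.startsWith(prefix)) {
                size = parseSize(a.substring(prefix.length()));
            }
        }
        return size;
    }

    static boolean isFullyCommitted(List<String> jvmArgs) {
        long xms = lastSize(jvmArgs, "-Xms");
        return xms > 0 && xms == lastSize(jvmArgs, "-Xmx");
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println(isFullyCommitted(jvmArgs)
                ? "Fully committed heap: -Xms equals -Xmx"
                : "Heap can grow at runtime: consider setting -Xms equal to -Xmx");
    }
}
```

Comparing parsed byte counts rather than strings means -Xms1024m and -Xmx1g are treated as equal.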
-XX:PermSize and -XX:MaxPermSize
These settings are used to define the size of the permanent generation space. -XX:PermSize defines the initial value and -XX:MaxPermSize defines the maximum value.
With regard to Tomcat, it is recommended that the initial and maximum values for the size of the permanent generation be set to the same value. This will instruct the JVM to create the permanent generation so that it is initially at its maximum size and prevent possible full garbage collections from occurring as the permanent generation expands to its maximum size.
At this point, you might be thinking that this seems awfully similar to the -Xms and -Xmx options. While the concept is the same, “PermGen”, or the permanent generation, refers to the area of memory where the JVM stores the classes that have been loaded. This is different and distinct from the heap (specified by -Xms and -Xmx), which is where the JVM stores the object instances used by an application.
One final note, if the PermGen space becomes full (regardless of the availability of memory in the heap) then the JVM will attempt a full garbage collection to reclaim space. This can often be a source of problems for applications which dynamically create or load a large number of classes. Proper sizing of -XX:PermSize and -XX:MaxPermSize for your applications will allow you to work around this issue.
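To see how close the permanent generation is to its limit while the application runs, you can read the JVM's memory pool statistics. This sketch assumes HotSpot pool names: "Perm Gen", "PS Perm Gen" or "CMS Perm Gen" on Java 6/7. PermGen was replaced by "Metaspace" in Java 8, which has no maximum unless -XX:MaxMetaspaceSize is set. PermGenWatch is an illustrative name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Minimal sketch: report usage of the pool that holds loaded classes.
class PermGenWatch {

    // True for the class-metadata pool on HotSpot (assumed pool names).
    static boolean isClassMetadataPool(String poolName) {
        return poolName.endsWith("Perm Gen") || poolName.equals("Metaspace");
    }

    // Percentage of the pool's maximum in use, or -1 if the pool has no maximum.
    static double percentUsed(long used, long max) {
        return max > 0 ? 100.0 * used / max : -1.0;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!isClassMetadataPool(pool.getName())) {
                continue;
            }
            MemoryUsage u = pool.getUsage();
            double pct = percentUsed(u.getUsed(), u.getMax());
            System.out.printf("%s: used=%dKB max=%dKB (%s)%n",
                    pool.getName(), u.getUsed() >> 10, u.getMax() >> 10,
                    pct < 0 ? "no limit" : String.format("%.1f%%", pct));
        }
    }
}
```

If usage keeps climbing toward the maximum, -XX:MaxPermSize is probably too small for the number of classes your applications load.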
-Xss
This setting determines the size of the stack for each thread in the JVM. The specific value you should use will vary depending on the requirements of the applications deployed to Tomcat; however, in most cases the default value used by the JVM is too large.
For a typical installation, this value can be lowered, saving memory and increasing the number of threads that can run on a system. The easiest way to determine a value for your system is to start with a very low value, for example 128k. Then run Tomcat and look for a StackOverflowError in the logs. If you see the error, gradually increase the value and restart Tomcat. When the errors disappear, you have found the minimal value that works for your deployment.
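The effect of the per-thread stack size is easy to observe: a smaller stack means fewer nested calls fit before a StackOverflowError. This sketch uses the stackSize argument of the Thread constructor, which the Javadoc describes as a hint that some platforms ignore, so treat the depths as illustrative. As rough arithmetic for the memory side, 500 threads at 1 MB each reserve about 500 MB of stack space, while at 128k they reserve about 62.5 MB. StackDepthProbe is an illustrative name.

```java
// Minimal sketch: how deep can recursion go for a given thread stack size?
class StackDepthProbe {

    // Counts recursive calls until the thread's stack is exhausted.
    private static final class Probe implements Runnable {
        int depth;

        private void recurse() {
            depth++;
            recurse();
        }

        @Override
        public void run() {
            try {
                recurse();
            } catch (StackOverflowError expected) {
                // depth now holds the number of frames that fit on this stack
            }
        }
    }

    // Run the probe on a new thread with the requested stack size (a hint).
    static int maxDepth(long stackSizeBytes) {
        Probe probe = new Probe();
        Thread t = new Thread(null, probe, "stack-probe", stackSizeBytes);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return probe.depth;
    }

    public static void main(String[] args) {
        for (long kb : new long[] {128, 256, 512, 1024}) {
            System.out.printf("stack %4dk -> ~%d frames%n", kb, maxDepth(kb * 1024));
        }
    }
}
```

Real request-handling frames are much larger than this probe's frames, which is why the trial-and-error approach described above is still needed.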
-server
This setting selects the Java HotSpot Server VM. It tells the VM that it is running in a server environment, and the default configuration is adjusted accordingly.
Note that this option is really only needed when running 32-bit Windows, as 32-bit Solaris and 32-bit Linux installations with two or more CPUs and 2GB or more of RAM enable it by default. In addition, all 64-bit operating systems have this option enabled by default, as there is no 64-bit client VM.

For a comprehensive list of JVM options, please see the article Java HotSpot VM Options.
Selecting a Garbage Collector
For many users, tuning the basic options I mentioned in the previous section will be sufficient. However, for larger applications, or applications that simply require larger heaps, these options may not be enough. If your tc Server installation fits this profile, you'll want to take one further step and tune the collector.
To begin tuning the collector, you need to pick the right collector for your application. The JVM ships with three commonly used collectors: the serial collector, the parallel collector and the concurrent collector. In most cases when running tc Server, you'll be using either the parallel collector or the concurrent collector. The difference is that the parallel collector typically offers better throughput, while the concurrent collector often offers lower pause times.
The parallel collector can be enabled by adding -XX:+UseParallelGC to JVM_OPTS or the concurrent collector can be enabled by adding -XX:+UseConcMarkSweepGC to JVM_OPTS (you would never want to have both options enabled). As to which of the collectors you should be using, it is difficult to give a blanket recommendation. I would suggest that you give both a try, measure the results and use that to make your decision.
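Before measuring, it is worth confirming which collector the JVM actually picked up. This sketch assumes the HotSpot MXBean names of this era. ParNew is the young-generation collector normally paired with CMS, so the mapping is approximate. CollectorCheck is an illustrative name.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Minimal sketch: map HotSpot collector bean names to the collector in use.
class CollectorCheck {

    static String family(String beanName) {
        switch (beanName) {
            case "PS Scavenge":
            case "PS MarkSweep":
                return "parallel (-XX:+UseParallelGC)";
            case "ParNew":
            case "ConcurrentMarkSweep":
                return "concurrent (-XX:+UseConcMarkSweepGC)";
            case "Copy":
            case "MarkSweepCompact":
                return "serial (-XX:+UseSerialGC)";
            default:
                return "other";
        }
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + " -> " + family(gc.getName()));
        }
    }
}
```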
Once you have selected a collector, it is possible to take one further step and apply some configuration settings that are specific to that collector. That being said, most of the time the JVM will detect and set excellent values for these options. You should not attempt to configure them manually unless you have a good understanding of how the specific garbage collector works, you are applying rule number one from above, and you really know what you are doing. With that caveat, I'll cover two options: one for the parallel collector and one for the concurrent collector.
When you specify the option to run the parallel collector, on the Java 6 JVMs current at the time of writing it runs only on the young generation. This means that multiple threads process the young generation, but the old generation continues to be processed by a single thread. To enable parallel compaction of the old generation, add the option -XX:+UseParallelOldGC. Note that this option helps the most on systems with many processors.
When you specify the option to run the concurrent collector, it is important to realize that garbage collection will happen concurrently with the application. This means that garbage collection will consume some of the processor resources that would otherwise have been available to the application. On systems with a large number of processors, this is typically not a problem. However, if your system has only one or two processors, you will likely want to enable the -XX:+CMSIncrementalMode option. This option enables incremental mode for the collector, which instructs the collector to periodically yield the processor back to the application and essentially prevents the collector from running for too long.
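The two rules of thumb above can be captured in a small helper that builds collector-specific flags as a starting point for measurement. This is a hypothetical sketch; CollectorFlags is not a tc Server API. Also note that -XX:+CMSIncrementalMode was deprecated in Java 8 and removed in Java 9, so it only applies to the older JVMs discussed here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper encoding the rules of thumb above. Measure the result;
// do not treat it as a recommendation.
class CollectorFlags {

    enum Collector { PARALLEL, CONCURRENT }

    static List<String> flagsFor(Collector collector, int cpus) {
        List<String> flags = new ArrayList<>();
        if (collector == Collector.PARALLEL) {
            flags.add("-XX:+UseParallelGC");
            // Parallel old-generation compaction; helps most with many CPUs
            flags.add("-XX:+UseParallelOldGC");
        } else {
            flags.add("-XX:+UseConcMarkSweepGC");
            // On one or two CPUs, let CMS yield the processor periodically
            if (cpus <= 2) {
                flags.add("-XX:+CMSIncrementalMode");
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        for (Collector c : Collector.values()) {
            System.out.println(c + " on " + cpus + " CPUs: JVM_OPTS=\""
                    + String.join(" ", flagsFor(c, cpus)) + "\"");
        }
    }
}
```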

The tcsadmin list-jvm-options command lists all the Java virtual machine (JVM) options that are currently set for a single tc Runtime instance.
The command gets the currently set JVM options from the following location:
• Unix: The JVM_OPTS variable set in the bin/ file.
The command returns each JVM option on a single line. For example:
prompt$ ./tcsadmin list-jvm-options --servername="example_server"

The following example gets the JVM options that are currently set for the tc Runtime instance with ID 10045:
prompt$ ./tcsadmin list-jvm-options --serverid=10045
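list-jvm-options reports what is configured in the start-up file. To check what a running JVM actually received, the standard RuntimeMXBean returns the JVM's own arguments. Here is a minimal sketch that prints the -X and -XX options one per line. ListJvmOptions is an illustrative name, and it would need to run inside the tc Server JVM, for example from a diagnostic page.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: list the heap/stack/GC tuning options of the running JVM.
class ListJvmOptions {

    // Keep the arguments that start with "-X" (covers -Xms, -Xss, -XX:...).
    static List<String> tuningOptions(List<String> jvmArgs) {
        List<String> out = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-X")) {
                out.add(arg);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // getInputArguments() returns the arguments passed to the JVM itself,
        // not the arguments passed to main().
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String option : tuningOptions(jvmArgs)) {
            System.out.println(option);
        }
    }
}
```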

The tcsadmin set-jvm-options command modifies the JVM options for a tc Runtime instance or a group of tc Runtime instances.
The command sets the JVM options by updating the following file on the target tc Runtime instance or instances:
• Unix: The JVM_OPTS variable in the bin/ file.
Warning: The set-jvm-options command overwrites any existing JVM options; it does not add to them. For example, suppose you have previously set the -Xmx512m and -Xss192k JVM options for the tc Runtime instance, and then you execute the following set-jvm-options command:

prompt$ ./tcsadmin set-jvm-options --options=-Xms384m --serverid=10045

Only the -Xms384m JVM option will be set; the -Xmx512m and -Xss192k options are no longer set.
The following example sets the initial Java heap size (using -Xms) and the maximum Java heap size (using -Xmx) for each tc Runtime instance in the group called Group1:
prompt$ ./tcsadmin set-jvm-options --groupname=Group1 --options=-Xms512m,-Xmx1024m
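Because set-jvm-options replaces the whole option list, the safe habit is to pass the complete list every time. Below is a hypothetical helper, not part of tcsadmin, that merges changes into the current options (as printed one per line by list-jvm-options) and builds the comma-separated value for --options. It assumes no option contains a comma, and it treats an option with the same flag name (for example -Xmx, or -XX:+/-UseParallelGC) as a replacement.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: build a complete --options value for set-jvm-options.
class JvmOptionMerge {

    // De-duplication key: "-Xms"/"-Xmx"/"-Xss" for sizes, the flag name for
    // -XX options (ignoring +/- and any "=value"), else the text before '='.
    static String key(String option) {
        if (option.startsWith("-XX:")) {
            String flag = option.substring(4);
            if (flag.startsWith("+") || flag.startsWith("-")) {
                flag = flag.substring(1);
            }
            int eq = flag.indexOf('=');
            return "-XX:" + (eq >= 0 ? flag.substring(0, eq) : flag);
        }
        if (option.startsWith("-Xms") || option.startsWith("-Xmx") || option.startsWith("-Xss")) {
            return option.substring(0, 4);
        }
        int eq = option.indexOf('=');
        return eq >= 0 ? option.substring(0, eq) : option;
    }

    // A change replaces the current option with the same key in place;
    // new options are appended at the end.
    static String merge(List<String> current, List<String> changes) {
        Map<String, String> byKey = new LinkedHashMap<>();
        for (String o : current) {
            byKey.put(key(o), o);
        }
        for (String o : changes) {
            byKey.put(key(o), o);
        }
        return String.join(",", byKey.values());
    }

    public static void main(String[] args) {
        List<String> current = Arrays.asList("-Xmx512m", "-Xss192k");
        String value = merge(current, Arrays.asList("-Xms384m"));
        System.out.println("./tcsadmin set-jvm-options --options=" + value + " --serverid=10045");
    }
}
```

With the options from the warning above, the merged value keeps -Xmx512m and -Xss192k instead of silently dropping them.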

How to Create Thread Dumps and Heap Dumps for a tc Server Instance in vFabric tc Server:
Thread dumps and heap dumps are essential for troubleshooting issues in a tc Server instance. This section covers the tools and steps needed to create them.
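For thread dumps, the usual tools are jstack <pid>, or kill -3 <pid> on Unix, which makes the JVM print a dump to standard output (catalina.out for Tomcat). If you would rather capture one from code, for example from a diagnostic page, here is a minimal sketch using the standard ThreadMXBean (Java 6 or later). ThreadDumper is an illustrative name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Minimal sketch: produce a text thread dump of the current JVM.
class ThreadDumper {

    // Includes locked monitors and ownable synchronizers when the JVM
    // supports reporting them.
    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(
                threads.isObjectMonitorUsageSupported(),
                threads.isSynchronizerUsageSupported())) {
            // ThreadInfo.toString() truncates very deep stacks; use
            // getStackTrace() if every frame is needed.
            sb.append(info);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```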
To obtain a heap dump, open the Tomcat start-up script (located under the bin folder) and edit SUN_JVM_OPTS.

There are already pre-defined options, commented out, that you can use. Uncomment -XX:HeapDumpPath and -XX:+HeapDumpOnOutOfMemoryError, which are responsible for creating heap dumps.
# JVM Sun specific settings
# For a complete list
#SUN_JVM_OPTS="-XX:MaxPermSize=192m \
# -XX:MaxGCPauseMillis=500 \
# -XX:+HeapDumpOnOutOfMemoryError"

#SUN_JVM_OPTS="-XX:MaxPermSize=192m \
# -XX:NewSize=128m \
# -XX:MaxNewSize=256m \
# -XX:MaxGCPauseMillis=500 \
# -XX:+HeapDumpOnOutOfMemoryError \
# -XX:+PrintGCApplicationStoppedTime \
# -XX:+PrintGCTimeStamps \
# -XX:+PrintGCDetails \
# -XX:+PrintHeapAtGC \
# -Xloggc:gc.log"

These are the heap dump parameters you can set:
  • Path to directory or filename for heap dump: -XX:HeapDumpPath=<path>
  • Dump heap to file when an OutOfMemoryError is thrown: -XX:+HeapDumpOnOutOfMemoryError
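A heap dump can also be taken on demand, which is useful when memory is growing but no OutOfMemoryError has occurred yet. From the command line, jmap -dump:live,format=b,file=heap.hprof <pid> does this. From inside the JVM, HotSpot's diagnostic MXBean offers the same. The sketch below assumes a HotSpot JVM and Java 8 or later. HeapDumper and the default file name are illustrative.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;

// Minimal sketch: check the OOM dump flag and take a heap dump on demand.
class HeapDumper {

    // HotSpot-specific diagnostic bean (getPlatformMXBean needs Java 7+).
    private static HotSpotDiagnosticMXBean diagnostics() {
        return ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    }

    // Reports whether the JVM was started with -XX:+HeapDumpOnOutOfMemoryError.
    static boolean dumpOnOomeEnabled() {
        return Boolean.parseBoolean(
                diagnostics().getVMOption("HeapDumpOnOutOfMemoryError").getValue());
    }

    // Writes an .hprof file; the target must not already exist. live=true dumps
    // only reachable objects, which typically triggers a full GC first.
    static void dumpHeap(String file, boolean live) {
        try {
            diagnostics().dumpHeap(file, live);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError: " + dumpOnOomeEnabled());
        String file = args.length > 0 ? args[0] : "tcserver-heap.hprof";
        dumpHeap(file, true);
        System.out.println("Heap dump written to " + file);
    }
}
```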

Enhanced Diagnostics:
SpringSource tc Server includes a full set of diagnostic features that make it easy to troubleshoot any problems that might occur with tc Server or the applications you deploy to it. These diagnostic features include:
  • Deadlock detection: SpringSource tc Server automatically detects if a thread deadlock occurs in tc Server or an application deployed to tc Server.
  • Server dumps: In the event that a tc Server instance fails, the server automatically generates a snapshot of its state and dumps it to a file so that support engineers can recreate the exact state when diagnosing the problem.
  • Thread diagnostics: When you deploy and start a Web application on tc Server, and then clients begin connecting and using the application, you might find that the clients occasionally run into problems such as slow or failed requests. Although by default, tc Server logs these errors in the log files, it is often difficult to pinpoint where exactly the error came from and how to go about fixing it. By enabling thread diagnostics, tc Server provides additional information to help you troubleshoot the problem.
  • Time in Garbage Collection: AMS has a new metric that represents the percentage of process uptime (0-100) that tc Server has spent in garbage collection.
  • Tomcat JDBC DataSource monitoring: AMS includes a new service that represents the high-concurrency Tomcat JDBC datasources you have configured for your tc Server instance. This service monitors the health of the datasource, such as whether its connection to the database has failed or was abandoned, and whether the JDBC queries that clients execute are taking too long.
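tc Server's deadlock detection is built in, but the same check is available to any Java code through the JDK's ThreadMXBean. Here is a sketch of that plain-JDK equivalent. It is not tc Server's implementation, and DeadlockCheck is an illustrative name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Minimal sketch: describe any threads that are currently deadlocked.
class DeadlockCheck {

    // Returns one line per deadlocked thread, or an empty array if none.
    static String[] findDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when there is no deadlock
        if (ids == null) {
            return new String[0];
        }
        ThreadInfo[] infos = threads.getThreadInfo(ids);
        String[] out = new String[infos.length];
        for (int i = 0; i < infos.length; i++) {
            out[i] = infos[i].getThreadName() + " waiting on " + infos[i].getLockName()
                    + " held by " + infos[i].getLockOwnerName();
        }
        return out;
    }

    public static void main(String[] args) {
        String[] deadlocks = findDeadlocks();
        System.out.println(deadlocks.length == 0
                ? "No deadlock detected"
                : String.join("\n", deadlocks));
    }
}
```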
Enable Thread Diagnostics Valve in Hyperic server:
ThreadDiagnosticsValve collects diagnostic information from tc Runtime request threads. If the thread has JDBC activity on a DataSource, the collected diagnostics can include the JDBC query, depending on how you configure ThreadDiagnosticsValve. The collected information is exposed through JMX MBeans. Hyperic Server, via the tc Server plug-in, uses ThreadDiagnosticsValve to enable and access thread diagnostics.
The diagnostics collected for a thread include the following:
  • The URI of the request
  • The query portion of the request string
  • Time the request began
  • Time the request completed
  • Total duration of the request
  • The number of garbage collections that occurred during the request
  • The time spent in garbage collection
  • Number of successful connection requests
  • Number of failed connection requests
  • Time spent waiting for connections
  • Text of each query executed
  • Execution time for each query
  • Status of each query
  • Execution time for all queries
  • Stack traces for failed queries
Setting up Thread Diagnostics Valve:
Set up ThreadDiagnosticsValve by adding a Valve child element to the Engine or Host element in conf/server.xml and configuring a DataSource, if you want JDBC diagnostics.
If you include the diagnostics template in the tcruntime-instance create command, the configuration is done for you, including creating a DataSource whose activity will be included in the diagnostics. For example:
$ ./ create -t diagnostics myinstance
When you create a tc Runtime instance using the diagnostics template, the following Valve element is inserted as a child of the Engine element in the conf/server.xml file of the new instance:

<Valve className="com.springsource.tcserver.serviceability.request.ThreadDiagnosticsValve"
       loggingInterval="10000"
       notificationInterval="60000"
       threshold="10000" />

Elastic Memory for Java (EM4J) Ballooning:
Elastic Memory for Java (EM4J) is a memory management technology that improves memory utilization when executing Java workloads on VMware ESXi virtual machines.
EM4J manages a memory balloon that sits directly in the Java heap and works with new memory reclamation capabilities introduced in ESXi 5.0. EM4J works with the hypervisor to communicate system-wide memory pressure directly into the Java heap, forcing Java to clean up proactively and return memory at the most appropriate times: when it is least active. As a result, you no longer have to give Java 100% of the memory that it needs.
EM4J is an add-on to vFabric tc Server. With EM4J and tc Server, you can run more Java applications on your ESXi servers with predictable performance, squeezing the most value out of your hardware investment.
ESXi ensures the efficient use of physical memory by employing shared memory pages, ballooning, memory compression and, as a last resort, disk swapping.
Balloon driver: A far more efficient mechanism for reclaiming memory from a virtual machine is the VMware Tools balloon driver, which runs as a process within the virtual machine, allocates and pins unused memory, and communicates those pages back to the hypervisor. The memory owned by the balloon driver can then be temporarily decoupled from the virtual machine and used elsewhere.
Under the control of the ESXi hypervisor, balloons in each host virtual machine expand or shrink depending on the shifting requirements of the virtual machines. A less active virtual machine gets a higher balloon target and the reclaimed memory moves to the more active virtual machines.
How EM4J Affects Memory and Application Performance:
EM4J is tuned to work with long-running web applications, where the application serves client requests and response times are the critical performance metric. If EM4J is enabled and the host is not over-committed, there is no cost to running EM4J. As host memory pressure increases, response times may increase gradually due to an increase in GC frequency. Balloons inflate first in the least active virtual machines, where the increased GC is less likely to be disruptive.
When you enable EM4J and begin to over-commit memory, client response times should be similar to running your application with fully reserved memory, with a difference imperceptible to your users.
EM4J helps the system behave gracefully and predictably when memory becomes scarce. It helps you to more easily determine the over-commit ratio that provides acceptable performance at peak loads.

Create and Start an EM4J-Enabled tc Runtime Instance:

You create an EM4J-enabled tc Runtime instance with the tcruntime-instance command, specifying the elastic-memory template. This is no different from using any other tc Server template, but note these caveats:

• The elastic-memory template is only useful in an ESXi virtual machine with a supported guest OS and JVM, where it is part of the memory management strategy for the virtualization environment.
• The path to the tc Runtime instance (CATALINA_BASE) must not contain spaces.

• Enable EM4J in the VM.
• Install supported guest operating systems and a JDK or JVM on the VMs. See Platform Support.
• Install VMware Tools on the guest operating systems. See Installing and Upgrading VMware Tools.
• Install tc Server Standard Edition 2.6. This tc Server release includes the elastic-memory template, which you use to create EM4J-enabled tc Server instances.


1. Change to the tc Server installation directory and create an EM4J-enabled tc Runtime instance.

For example:

prompt$ ./ create instanceName -t elastic-memory

Replace instanceName with the name of the runtime instance. You can use additional templates (-t option) or use any of the other features of the tcruntime-instance command described in "

2. Start the instance using the command.

For example:
prompt$ ./ instanceName start

3. To verify that EM4J started successfully, look in CATALINA_BASE/logs/catalina.out for the message EM4J 1.0.x agent initialized.
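If you script instance start-up, the verification step can be automated. Here is a minimal sketch that scans catalina.out for the start-up message. It matches on "EM4J" and "agent initialized" rather than a specific version, and the default path (relative to the instance directory) is an assumption. Em4jStartupCheck is an illustrative name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Minimal sketch: did the EM4J agent report that it initialized?
class Em4jStartupCheck {

    // True if any line mentions both "EM4J" and "agent initialized",
    // e.g. "EM4J 1.0.2 agent initialized" (version left open on purpose).
    static boolean em4jInitialized(Path catalinaOut) {
        // ISO-8859-1 maps every byte to a char, so odd bytes in the log
        // cannot cause a decoding error; the marker itself is plain ASCII.
        try (Stream<String> lines = Files.lines(catalinaOut, StandardCharsets.ISO_8859_1)) {
            return lines.anyMatch(l -> l.contains("EM4J") && l.contains("agent initialized"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path log = Paths.get(args.length > 0 ? args[0] : "logs/catalina.out");
        System.out.println(em4jInitialized(log)
                ? "EM4J agent initialized"
                : "EM4J start-up message not found");
    }
}
```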

The Benefits

This really creates an elastic memory environment for Java, letting JVMs grow on demand for the best performance while companies achieve maximum consolidation in their data centers, saving memory, space, energy and, of course, money.

With this capability, companies can run more JVMs per host, increasing their consolidation ratio for applications running on a Java Virtual Machine as well. Note also that the more virtualized the environment is, the more scalable it becomes, because more VMs can balloon at the same time and quickly free up extra room for peak usage.

Another benefit is that it lets us scale JVM heaps beyond what we could achieve before, while the hypervisor prioritizes garbage collection on whichever JVMs are least loaded at any given moment.

Tests with up to 40% memory over-commit on the JVMs have shown great success.
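To make the 40% figure concrete: over-commit is usually expressed as how far the memory configured across all VMs on a host exceeds the host's physical memory. Here is a small sketch of that arithmetic. The names and the 96 GB host are hypothetical.

```java
// Sketch of over-commit arithmetic (assumes over-commit = configured VM
// memory beyond physical host memory, as a percentage of physical memory).
class OvercommitMath {

    static double overcommitPercent(long configuredMb, long physicalMb) {
        return 100.0 * (configuredMb - physicalMb) / physicalMb;
    }

    // Total VM memory that fits a target over-commit percentage (rounded down).
    static long configurableMb(long physicalMb, int overcommitPercent) {
        return physicalMb * (100 + overcommitPercent) / 100;
    }

    public static void main(String[] args) {
        long physical = 96 * 1024; // hypothetical 96 GB host
        long configured = configurableMb(physical, 40);
        System.out.printf("40%% over-commit on %d MB physical allows %d MB configured (%.1f%%)%n",
                physical, configured, overcommitPercent(configured, physical));
    }
}
```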
Ballooning shows up as regular ballooning at the hypervisor level and can be monitored through the vSphere performance charts, as shown below:


EM4J is currently available for vSphere 5 and is compatible with HotSpot 1.6 JVMs running tc Server. It is distributed as part of VMware's vFabric bundle, which includes tc Server, ERS, Hyperic Enterprise, GemFire Application Cache Node, RabbitMQ and SQLFire (the last two are available only in the vFabric Advanced bundle).
