Thursday, April 25, 2013

Spring Tomcat Server - Benchmark - Best Practices - Memory Elastic-EM4J- Big Data Caching

Performance – Benchmark and Best Practices for Spring tc Server
& Elastic Memory for Java (EM4J).

These are my own opinions, not those of my company or employer.

A user deploys an application to a tc Server instance. The application works great during testing and QA. However, when the user moves the application into production, the load increases and Tomcat stops handling requests. At first this happens occasionally and for only 5 or 10 seconds per occurrence. It's such a small issue that the user might not even notice or, if noticed, may choose to ignore the problem. After all, it's only 5 or 10 seconds and it's not happening very often. Unfortunately, as the application continues to run the problem continues to occur, and with greater frequency, possibly until the Tomcat server stops responding to requests altogether.
There is a good chance that at some point, you or someone you know has faced this issue. While there are multiple possible causes for this problem, such as blocked threads, too much load on the server, or application-specific problems, the one cause that I see over and over is excessive garbage collection.
As an application runs it creates objects. As it continues to run, many of these objects are no longer needed. In Java, the unused objects remain in memory until a garbage collection occurs and frees up the memory used by the objects. In most cases, these garbage collections run very quickly, but occasionally the garbage collector will need to run a “full” collection. When a full collection is run, not only does it take a considerable amount of time, but the entire JVM has to be paused while the collector runs. It is this “stop-the-world” behavior that causes TC server to fail to respond to a request.
Fortunately, there are some strategies which can be employed to mitigate the effects of garbage collections; but first, a quick discussion about performance tuning.
Performance Tuning Basics:
The first rule is: measure, adjust, and measure again. Measure the performance of the Tomcat instance before you make a change, make one change, and then measure the performance again afterwards. If you follow this pattern, you will always know exactly how each change affects the performance of your tc Server instance.
The second rule is: run a load-generating tool such as JMeter or Selenium against the web applications deployed to the Tomcat instance. Many garbage collections occur only under load, so this helps you replicate and test garbage collection issues more accurately.
The third rule is: don't blindly apply configuration settings to the tc Server instance; apply rule number one first. Because several web applications may be installed on a single tc Server instance, each with its own memory usage pattern, a setting that helps one application can cripple the performance of another.
Measuring Performance:
Prior to making any changes, how do we measure performance? That depends on what is important to the application. For some applications individual response time is most important, while others value throughput (i.e. how many requests Tomcat can process over some interval). Let's look at something more specific to the JVM: garbage collection performance.
Garbage collection performance is a good metric to use, both because it heavily impacts things like response time and throughput, and because it's easy to measure, even in a production system. To measure garbage collection performance, we simply enable garbage collection logging.
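On a HotSpot JVM of this vintage, enabling GC logging amounts to a few flags in the start-up script's JVM_OPTS; the log path below is illustrative:

```shell
# enable GC logging: timestamps, per-collection detail, and pause times
JVM_OPTS="$JVM_OPTS -verbose:gc \
  -XX:+PrintGCDetails \
  -XX:+PrintGCTimeStamps \
  -XX:+PrintGCApplicationStoppedTime \
  -Xloggc:logs/gc.log"
```

The resulting gc.log shows each collection's duration and how long the application was stopped, which is exactly the "stop-the-world" time we care about.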
General Options
-Xms and -Xmx
These settings are used to define the size of the heap used by the JVM. -Xms defines the initial size of the heap and -Xmx defines the maximum size of the heap. Specific values for these options will depend on the number of applications and the requirements of each application deployed to a Tomcat instance.
With regard to TC Server, it is recommended that the initial and maximum values for heap size be set to the same value. This is often referred to as a fully committed heap and this will instruct the JVM to create a heap that is initially at its maximum size and prevent several full garbage collections from occurring as the heap expands to its maximum size.
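For example, a fully committed 1 GB heap (the size is illustrative; yours depends on your applications):

```shell
# initial and maximum heap set to the same value:
# no full collections triggered by heap expansion
JVM_OPTS="$JVM_OPTS -Xms1024m -Xmx1024m"
```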
-XX:PermSize and -XX:MaxPermSize
These settings are used to define the size of the permanent generation space. -XX:PermSize defines the initial value and -XX:MaxPermSize defines the maximum value.
With regard to Tomcat, it is recommended that the initial and maximum values for the size of the permanent generation be set to the same value. This will instruct the JVM to create the permanent generation so that it is initially at its maximum size and prevent possible full garbage collections from occurring as the permanent generation expands to its maximum size.
At this point, you might be thinking that this seems awfully similar to the -Xms and -Xmx options, and while the concept is the same, “PermGen”, or permanent generation, refers to the location in memory where the JVM stores the class files that have been loaded into memory. This is different and distinct from the heap (specified by -Xms and -Xmx), which is where the JVM stores the object instances used by an application.
One final note, if the PermGen space becomes full (regardless of the availability of memory in the heap) then the JVM will attempt a full garbage collection to reclaim space. This can often be a source of problems for applications which dynamically create or load a large number of classes. Proper sizing of -XX:PermSize and -XX:MaxPermSize for your applications will allow you to work around this issue.
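As with the heap, a fully committed permanent generation is a single pair of flags; the 256m value is illustrative:

```shell
# initial and maximum PermGen set to the same value
JVM_OPTS="$JVM_OPTS -XX:PermSize=256m -XX:MaxPermSize=256m"
```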
-Xss
This setting determines the size of the stack for each thread in the JVM. The specific value that you should use will vary depending on the requirements of the applications deployed to Tomcat; however, in most cases the default value used by the JVM is too large.
For a typical installation, this value can be lowered, saving memory and increasing the number of threads that can be run on a system. The easiest way to determine a value for your system is to start out with a very low value, for example 128k. Then run Tomcat and look for a StackOverflowError in the logs. If you see the error, gradually increase the value and restart Tomcat. When the errors disappear, you have found the minimal value which works for your deployment.
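A sketch of the starting point for that procedure (128k is the deliberately low first guess, not a recommendation):

```shell
# start low and raise this only if StackOverflowError appears in the logs
JVM_OPTS="$JVM_OPTS -Xss128k"
```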
-server
This setting selects the Java HotSpot Server VM. It instructs the VM that it is running in a server environment, and the default configurations are changed accordingly.
Note that this option is really only needed when running 32-bit Windows, as 32-bit Solaris and 32-bit Linux installations with two or more CPUs and 2GB or more of RAM enable this option by default. In addition, all 64-bit OSes have this option enabled by default, as there is no 64-bit client VM.

For a comprehensive list of JVM options, please see the article Java HotSpot VM Options.
Selecting a Garbage Collector
For many users, tuning the basic options mentioned in the previous section will be sufficient for their applications. However, for larger applications, or applications which simply require larger heap sizes, these options may not be sufficient. If your tc Server installation fits this profile, you'll want to take one further step and tune the collector.
To begin tuning the collector, you need to pick the right collector for your application. The JVM ships with three commonly used collectors: the serial collector, the parallel collector, and the concurrent collector. In most cases when running tc Server, you'll be using either the parallel collector or the concurrent collector. The difference between the two is that the parallel collector typically offers better throughput, while the concurrent collector often offers lower pause times.
The parallel collector can be enabled by adding -XX:+UseParallelGC to JVM_OPTS or the concurrent collector can be enabled by adding -XX:+UseConcMarkSweepGC to JVM_OPTS (you would never want to have both options enabled). As to which of the collectors you should be using, it is difficult to give a blanket recommendation. I would suggest that you give both a try, measure the results and use that to make your decision.
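In the start-up script this is a single flag either way; enable exactly one of the two (shown here as a sketch):

```shell
# throughput-oriented: the parallel collector
JVM_OPTS="$JVM_OPTS -XX:+UseParallelGC"

# low-pause alternative (mutually exclusive with the line above):
# JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
```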
Once you have selected a collector, it is possible to take one further step and apply some configuration settings which are specific to the collector. That being said, most of the time the JVM will detect and set excellent values for these options. You should not attempt to manually configure these unless you have a good understanding of how the specific garbage collector is working, you are applying rule number one from above and you really know what you are doing. That said, I'm going to talk about two options, one for the parallel collector and one for the concurrent collector.
When you specify the option to run the parallel collector, it will only run on the young generation. This means that multiple threads will be used to process the young generation, but the old generation will continue to be processed by a single thread. To enable parallel compaction of the old generation space you can enable the option -XX:+UseParallelOldGC. Note that this option will help the most when enabled on a system with many processors.
When you specify the option to run the concurrent collector, it is important to realize that garbage collection will happen concurrently with the application. This means that garbage collection will consume some of the processor resources that would have otherwise been available to the application. On systems with a large number of processors, this is typically not a problem. However, if your system has only one or two processors then you will likely want to enable the -XX:+CMSIncrementalMode option. This option enables incremental mode for the collector, which instructs the collector to periodically yield the processor back to the application and essentially prevents the collector from running for too long.

List all the Java virtual machine (JVM) options that are currently set for a single tc Runtime instance.
The command gets the currently set JVM options from the following locations:
• Unix: The JVM_OPTS variable set in the bin/ file.
The command returns each JVM option on its own line, for example:
prompt> ./tcsadmin list-jvm-options --servername="example_server"

The example gets the JVM options that are currently set for a tc Runtime instance with ID 10045:
prompt$ ./tcsadmin list-jvm-options --serverid=10045

Modify the JVM options for a tc Runtime instance or a group of tc Runtime instances.
The command sets JVM options by updating the following files on the target tc Runtime instance or instances:
• Unix: The JVM_OPTS variable in the bin/ file.
Warning: The set-jvm-options command overwrites any existing JVM options; it does
not add to existing options. For example, if you have previously set the -Xmx512m and
-Xss192k JVM options for the tc Runtime instance, and then you execute the following
set-jvm-options command:

prompt$ ./tcsadmin set-jvm-options --options=-Xms384m --serverid=10045

Only the -Xms384m JVM option will be set; the -Xmx512m and -Xss192k options are no longer set.
The example sets the initial Java heap size (using -Xms) and the maximum Java heap size (using -Xmx) for each tc Runtime instance in the group called Group1:
prompt$ ./tcsadmin set-jvm-options --groupname=Group1 --options=-Xms512m,-Xmx1024m
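Given that overwrite behavior, the safe pattern is to re-pass the complete option list on every call; a sketch with illustrative values:

```
prompt$ ./tcsadmin set-jvm-options --serverid=10045 \
          --options=-Xms384m,-Xmx512m,-Xss192k
```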

How to Create Thread Dumps and Heap Dumps for a tc Server Instance in vFabric tc Server:
Thread dumps and heap dumps are necessary for troubleshooting issues in a tc Server instance. What tools and steps are needed to create a thread dump and a heap dump for a tc Server instance?
To obtain a heap dump, you must open the Tomcat start-up script (located under the bin folder) and edit SUN_JVM_OPTS.

There are already pre-defined options that have been commented out that you may use. You can uncomment -XX:HeapDumpPath and -XX:+HeapDumpOnOutOfMemoryError, which are responsible for creating heap dumps.
# JVM Sun specific settings
# For a complete list
#SUN_JVM_OPTS="-XX:MaxPermSize=192m \
# -XX:MaxGCPauseMillis=500 \

# -XX:+HeapDumpOnOutOfMemoryError"

#SUN_JVM_OPTS="-XX:MaxPermSize=192m \
# -XX:NewSize=128m \
# -XX:MaxNewSize=256m \
# -XX:MaxGCPauseMillis=500 \
# -XX:+HeapDumpOnOutOfMemoryError \
# -XX:+PrintGCApplicationStoppedTime \
# -XX:+PrintGCTimeStamps \
# -XX:+PrintGCDetails \
# -XX:+PrintHeapAtGC \
# -Xloggc:gc.log"

These are examples of parameters that you can also set for heap dumps:

  • Path to a directory or filename for the heap dump: -XX:HeapDumpPath=<path>

  • Dump the heap to a file when an OutOfMemoryError is thrown: -XX:+HeapDumpOnOutOfMemoryError
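A sketch of both pieces, with illustrative paths: the heap-dump flags go in SUN_JVM_OPTS, while a thread dump can be taken on demand from the command line:

```shell
# heap dumps: write a dump to /var/dumps when an OutOfMemoryError occurs
SUN_JVM_OPTS="$SUN_JVM_OPTS \
  -XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/var/dumps"

# thread dumps: use the JDK's jstack, or send SIGQUIT so the dump
# lands in catalina.out (replace <pid> with the Tomcat process id)
#   jstack <pid> > thread-dump.txt
#   kill -3 <pid>
```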

Enhanced Diagnostics:
SpringSource tc Server includes a full set of diagnostic features that makes it easy for you to troubleshoot any problems that might occur with tc Server or the applications that you deploy to tc Server. These diagnostic features include:
  • Deadlock detection: SpringSource tc Server automatically detects if a thread deadlock occurs in tc Server or an application deployed to tc Server.
  • Server dumps: In the event that a tc Server instance fails, the server automatically generates a snapshot of its state and dumps it to a file so that support engineers can recreate the exact state when diagnosing the problem.
  • Thread diagnostics: When you deploy and start a Web application on tc Server, and then clients begin connecting and using the application, you might find that the clients occasionally run into problems such as slow or failed requests. Although by default, tc Server logs these errors in the log files, it is often difficult to pinpoint where exactly the error came from and how to go about fixing it. By enabling thread diagnostics, tc Server provides additional information to help you troubleshoot the problem.
  • Time in Garbage Collection: AMS has a new metric that represents the percentage of process up time (0-100) that tc Server has spent in garbage collection.
  • Tomcat JDBC DataSource monitoring: AMS includes a new service that represents the high-concurrency Tomcat JDBC datasources you have configured for your tc Server instance. This service monitors the health of the datasource, such as whether its connection to the database has failed or was abandoned, and whether the JDBC queries that clients execute are taking too long.
Enable Thread Diagnostics Valve in Hyperic server:
ThreadDiagnosticsValve collects diagnostic information from tc Runtime request threads. If the thread has JDBC activity on a DataSource, the collected diagnostics can include the JDBC query, depending on how you configure ThreadDiagnosticsValve. The collected information is exposed through JMX MBeans. Hyperic Server, via the tc Server plug-in, uses ThreadDiagnosticsValve to enable and access thread diagnostics.
The diagnostics collected for a thread include the following:
  • The URI of the request
  • The query portion of the request string
  • Time the request began
  • Time the request completed
  • Total duration of the request
  • The number of garbage collections that occurred during the request
  • The time spent in garbage collection
  • Number of successful connection requests
  • Number of failed connection requests
  • Time spent waiting for connections
  • Text of each query executed
  • Execution time for each query
  • Status of each query
  • Execution time for all queries
  • Stack traces for failed queries
Setting up Thread Diagnostics Valve:
Set up ThreadDiagnosticsValve by adding a Valve child element to the Engine or Host element in conf/server.xml and configuring a DataSource, if you want JDBC diagnostics.
If you include the diagnostics template in the tcruntime-instance create command, the configuration is done for you, including creating a DataSource whose activity will be included in the diagnostics. For example:
$ ./ create -t diagnostics myinstance
When you create a tc Runtime instance using the diagnostics template, the following Valve element is inserted as a child of the Engine element in the conf/server.xml file of the new instance.

<Valve className="com.springsource.tcserver.serviceability.request.ThreadDiagnosticsValve"
       loggingInterval="10000"
       notificationInterval="60000"
       threshold="10000" />

Elastic Java Memory Ballooning (EM4J):
This section describes how to set up, monitor, and manage Elastic Memory for Java (EM4J), the memory management technology that improves memory utilization when running Java workloads on VMware ESXi virtual machines.
Elastic Memory for Java (EM4J) manages a memory balloon that sits directly in the Java heap and works with new memory reclamation capabilities introduced in ESXi 5.0. EM4J works with the Hypervisor to communicate system-wide memory pressure directly into the Java heap, forcing Java to clean up proactively and return memory at the most appropriate times—when it is least active. And you no longer have to give Java 100% of the memory that it needs.
EM4J is an Add-on to vFabric tc Server. With EM4J and tc Server, you can run more Java applications on your ESXi servers with predictable performance, squeezing the most value out of your hardware investment.
ESXi ensures the efficient use of physical memory by employing shared memory pages, ballooning, memory compression and, as a last resort, disk swapping.
Balloon Driver: A far more efficient mechanism for reclaiming memory from a virtual machine is the VMware Tools balloon driver, which runs as a process within the virtual machine, allocates and pins unused memory, and communicates the pages back to the hypervisor. The memory owned by the balloon driver can then be temporarily de-coupled from the virtual machine and used elsewhere.
Under the control of the ESXi hypervisor, balloons in each host virtual machine expand or shrink depending on the shifting requirements of the virtual machines. A less active virtual machine gets a higher balloon target and the reclaimed memory moves to the more active virtual machines.
How EM4J Affects Memory and Application Performance:-
EM4J is tuned to work with long-running Web applications, where the application serves client requests and response times are the critical performance metric. If EM4J is enabled and the host is not over-committed, there is no cost to running EM4J. As host memory pressure increases, response times may increase gradually due to an increase in GC frequency. Balloons inflate first in the least active virtual machines, where the increased GC is less likely to be disruptive.
When you enable EM4J and begin to over-commit memory, client response times should be similar to running your application with fully reserved memory, the difference imperceptible to your users.
EM4J helps the system behave gracefully and predictably when memory becomes scarce. It helps you to more easily determine the over-commit ratio that provides acceptable performance at peak loads.

Create and Start an EM4J-Enabled tc Runtime Instance:

You create an EM4J-enabled tc Runtime instance with the tcruntime-instance command, specifying the elastic-memory template. This is no different from using any other tc Server template, but note these caveats:

• The elastic-memory template is only useful in an ESXi virtual machine with a supported guest OS and JVM, where it is part of the memory management strategy for the virtualization environment.
• The path to the tc Runtime instance (CATALINA_BASE) must not contain spaces.

• Enable EM4J in the VM.
• Install supported guest operating systems and a JDK or JVM on the VMs. See Platform Support.
• Install VMware Tools on the guest operating systems. See Installing and Upgrading VMware Tools.
• Install tc Server Standard Edition 2.6. This tc Server release includes the elastic-memory template, which you use to create EM4J-enabled tc Server instances.


1. Change to the tc Server installation directory and create an EM4J-enabled tc Runtime instance.

For example:

prompt$ ./ create instanceName -t elastic-memory

Replace instanceName with the name of the runtime instance. You can use additional templates (-t option) or any of the other features of the tcruntime-instance command.

2. Start the instance using the command.

For example:
prompt$ ./ instanceName start

3. To verify that EM4J started successfully, look in CATALINA_HOME/logs/catalina.out for the message EM4J 1.0.x agent initialized.

The Benefits

This really creates an elastic memory environment for Java, enabling JVMs to grow on demand for better performance and enabling companies to achieve maximum consolidation in their data centers, saving memory, space, energy and, of course, money.

Based on this, companies can allocate more JVMs per host, increasing their consolidation ratio even for applications running on a Java Virtual Machine. Note also that the more virtualized the environment is, the more scalability it can achieve, as more VMs can balloon at the same time and quickly provide extra room for peak usage.

As another benefit, this also lets us scale the JVM's heap to points we couldn't reach before, and lets the hypervisor prioritize GC on the JVMs that are least loaded at any given moment.

Tests with up to 40% memory over-commit on JVMs have shown great success.
Ballooning appears as regular ballooning at the hypervisor level and can be monitored through the vSphere performance monitoring charts.


EM4J is currently available for vSphere 5 and compatible with HotSpot 1.6 JVMs running tc Server. It is distributed as part of VMware's vFabric bundle, which includes tc Server, ERS, Hyperic Enterprise, GemFire Application Cache Node, RabbitMQ and SQLFire (the last two available only in the vFabric Advanced bundle).

Wednesday, April 24, 2013

Liferay Portal Best Practices - Improve Performance

Liferay Portal Best Practices:
As an infrastructure portal, Liferay Portal can support over 3300 concurrent users on a single server, with mean login times under a second and maximum throughput of 79+ logins per second.
In collaboration and social networking scenarios, each physical server supports over 1300 concurrent users at average transaction times of under 800ms.
Liferay Portal’s WCM scales to beyond 150,000 concurrent users on a single Liferay Portal server with average transaction times under 50ms and 35% CPU utilization.
Given sufficient database resources and efficient load balancing, Liferay Portal can scale linearly as one adds additional servers to a cluster.

1. Adjust the server's thread pool and JDBC connection pool.
Ø  Unfortunately, there's no magic number for this. It must be tuned based on usage.
Ø  By default, Liferay is configured for a maximum of 100 database connections.
Ø  For Tomcat, a good number is between 200 and 400 threads in the thread pool.
Ø  YMMV: use a profiler and tune to the right number.
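As a starting point to profile from, the Tomcat request thread pool is sized on the connector in conf/server.xml; the maxThreads value of 300 here is only illustrative, taken from the 200-400 range above:

```xml
<!-- conf/server.xml: raise the request thread pool on the HTTP connector -->
<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="300"
           connectionTimeout="20000"
           redirectPort="8443" />
```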

2. Turn off unused servlet filters.
Ø  Servlet filters were introduced in the servlet specification 2.3.
Ø  They dynamically intercept requests and transform them in some way.
Ø  Liferay contains 17 servlet filters.
Ø  Chances are, you don't need them all, so turn off the ones you aren't using!

Servlet Filters to Turn Off
SSO CAS Filter: Are you using CAS for Single Sign-On? If not, you don't need this filter running.
SSO NTLM Filter: Are your users authenticating via NTLM (NT LAN Manager)? If not, you don't need this filter running.
SSO OpenSSO Filter: Are you using OpenSSO for Single Sign-On? If not, you don't need this filter running.
Virtual Host Filter: Are you mapping domain names to communities or organizations? If not, turn this filter off.
Sharepoint Filter: Are you using Liferay's Sharepoint functionality for saving documents directly to the portal? If not, this filter is not for you. Turn it off.

How to Turn off a servlet filter?
Easy! Comment it out of the web.xml file:
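A sketch for the Virtual Host Filter (the filter-name and class shown are as they typically appear in a Liferay 6 web.xml, but verify against your own file before editing):

```xml
<!-- web.xml: disable an unused filter by commenting out both its
     <filter> and <filter-mapping> entries -->
<!--
<filter>
    <filter-name>Virtual Host Filter</filter-name>
    <filter-class>com.liferay.portal.servlet.filters.virtualhost.VirtualHostFilter</filter-class>
</filter>
<filter-mapping>
    <filter-name>Virtual Host Filter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
-->
```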
3. Tune your JVM parameters.

Again, there is nothing set in stone for this: you will have to go through the cycle of tune and profile, tune and profile until you get the parameters right.

When garbage collection occurs, the collector finds objects that are no longer reachable and reclaims their memory. When all of this is done, the space is compacted, so the memory is contiguous.
Ø  By default, the JDK uses a serial garbage collector.
Ø  When it runs, the garbage collector stops all application execution in order to do its job.
Ø  This works really well for desktop-based, client applications which are running on one processor.
Ø  For server-based, multi-processor systems, you will perhaps want to switch to a concurrent collector such as the Concurrent Mark-Sweep collector (CMS).
Ø  This collector makes one short pause in application execution to mark objects directly reachable from the application code.
Ø  Then it allows the application to run while it marks all objects which are reachable from the set it marked.
Ø  Finally, it adds another phase called the remark phase which finalizes marking by revisiting any objects modified while the application was running.
Ø  It then sweeps through and garbage collects.

JVM Options

NewSize, MaxNewSize: The initial size and the maximum size of the New or Young Generation.
+UseParNewGC: Causes garbage collection to happen in parallel, using multiple CPUs. This decreases garbage collection overhead and increases application throughput.
+UseConcMarkSweepGC: Use the Concurrent Mark-Sweep Garbage Collector. This uses shorter garbage collection pauses, and is good for applications that have a relatively large set of long-lived data, and that run on machines with two or more processors, such as web servers.
+CMSParallelRemarkEnabled: For the CMS GC, enables the garbage collector to use multiple threads during the CMS remark phase. This decreases the pauses during this phase.
SurvivorRatio: Controls the size of the two survivor spaces. It's a ratio between the survivor space size and Eden. The default is 25. There's not much bang for the buck here, but it may need to be adjusted.
ParallelGCThreads: The number of threads to use for parallel garbage collection.
Should be equal to the number of CPU cores in your server.
Example Java Options String

-Xms2048m -Xmx2048m -XX:MaxNewSize=700m -XX:MaxPermSize=128m
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=20 -XX:ParallelGCThreads=8

4. Tune ehcache.

Ø  Liferay uses ehcache, which is a cluster-aware, tunable cache.
Ø  Caching greatly speeds up performance by reducing the number of times the application has to go grab something from the database.
Ø  Liferay's cache comes tuned to default settings, but you may want to modify it to suit your web site.
Ø  If you have a heavily trafficked message board, you may want to consider adjusting the cache for the message board.
Caching the Message Board

Ø  MaxElementsInMemory: Monitor the cache using a JMX Console, as you cannot guess at the right amount here. You can adjust the setting if you find the cache is full.
Ø  TimeToIdleSeconds: This sets the time to idle for an element before it expires from the cache.
Ø  Eternal: If true, timeouts are ignored and the element never expires from the cache.
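A hypothetical ehcache entry for a message board cache might look like this; the cache name and every value here are illustrative, not Liferay's actual defaults:

```xml
<!-- illustrative: cap the in-memory entries and expire idle ones -->
<cache name="com.liferay.portlet.messageboards.model.MBMessage"
       maxElementsInMemory="10000"
       eternal="false"
       timeToIdleSeconds="600"
       overflowToDisk="false" />
```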
Other Cache Settings
Ø  There are many, many other settings which can be used to tune the cache.
Ø  You can, as an example, change the cache algorithm if it seems to be caching the wrong things.
Ø  If we were to go over them all, we'd never get through the rest of the list.
5. Lucene Index Writer Interval
Ø  Whenever Liferay calls Lucene to index some content, it may create any number of files to do so.
Ø  Depending on the content, these files can be large files or lots of small files.
Ø  Every now and then, Liferay optimizes the index for reading by combining smaller files into larger files.
Ø  You can change this behavior based on your use case.
Ø  The property is lucene.optimize.interval.
Ø  If you are doing a lot of publishing and loading of data, make the number very high, like 1000.
Ø  If you are doing mostly reads, make it low, like the default value of 100.
Ø  Of course, the best thing is to move search out to a separate environment, such as Solr.
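For a publish-heavy site, the property goes in your portal properties override (assuming the standard portal-ext.properties file; the value is illustrative):

```properties
# portal-ext.properties -- optimize the Lucene index less often
# when doing heavy publishing
lucene.optimize.interval=1000
```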
6. Replace Lucene altogether with Solr
Apache says:
Ø  Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching,replication, a web administration interface and many more features. It runs in a Java servlet container such as Tomcat.
Ø  Solr allows you to abstract out of your Liferay installation everything that has to do with search, and run search from a completely separate environment.
Installing Solr in 7 Steps
Step 1: Install the Solr web application on a separate environment from your Liferay environment.
Step 2: Grab the Solr plugin from Liferay and extract it to your file system.
Step 3: Edit the file docroot/WEB-INF/src/META-INF/solr-spring.xml.
Step 4: Change the URL in the following Spring bean configuration to point to your newly installed Solr box and then save the file:

Step 5: Copy the conf/schema.xml file to the $SOLR_HOME/conf folder on your  newly installed Solr box.
Step 6: Zip the plugin back up into a .war file. Start your Solr server. Deploy the plugin to Liferay.
Step 7: Reindex your content.
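For step 4, the bean whose URL you change might look roughly like this; the bean id, class, host and port are illustrative and vary by plugin version, so check your own solr-spring.xml:

```xml
<!-- solr-spring.xml (illustrative): point the client at your Solr box -->
<bean id="solrServer"
      class="org.apache.solr.client.solrj.impl.CommonsHttpSolrServer">
    <constructor-arg type="java.lang.String"
                     value="http://your-solr-host:8080/solr" />
</bean>
```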
7. Optimize Counter Increment
Ø  One of the ways Liferay is able to support so many databases is that it does not use any single database's method of determining sequences for primary keys.
Ø  Instead, Liferay includes its own counter utility which can be optimized.
Ø  The default value: counter.increment=100 will cause Liferay to go to the database to update the counter only once for every 100 primary keys it needs to create.
Ø  Each time the counter increments itself, it keeps track of the current set of available keys in an in-memory, cluster aware object.
Ø  You could set this to a higher number to reduce the number of database calls for primary keys within Liferay.
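For example, in the portal properties override (assuming the standard portal-ext.properties file; the value is illustrative):

```properties
# portal-ext.properties -- reserve 2000 primary keys per database call
# instead of the default 100
counter.increment=2000
```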
8. Use a Content Delivery Network
Ø  A Content Delivery Network serves up static content from a location that is geographically close to the end user.
Ø  This goes one step better than simply using a web server to serve up your static content, and is very simple to set up.
Ø  [your CDN host here]
Ø  The value should include the full protocol and port number if the CDN does not use the standard HTTP and HTTPS ports.
Ø is configured this way.
9. Use a web server to serve your static resources.
Ø  Sometimes a CDN is overkill. You can gain similar benefits by configuring your  system so that all static content (images, CSS, JavaScript, etc.) is served by your web server instead of by your application server.
Ø  It is well known that a highly optimized web server can serve static resources a lot faster than an application server can.
Ø  You can use your proxy configuration and two Liferay properties to let your faster web server send these to the browser, leaving Liferay and your application server to only have to worry about the dynamic pieces of the request.
Liferay Configuration
Ø  Step 1: Set the following property in your file:
Ø  theme.virtual.path=/var/www/themes
Ø  Step 2: Set the following property in your theme:
Ø  /themes/beautiful-day-theme
Ø  Step 3: Set your server proxy to exclude the path to the theme from the proxy. This will vary from web server to web server. For Apache and mod_proxy, you would add this to your configuration file:
Ø  ProxyPass /themes !
Ø  With this configuration, Liferay will deploy your theme to the specified path, and it will be served up by your web server.
10. CSS/JS Sprites
You heard it here: programmers are not lazy.
When anybody is under a tight deadline, it's faster to get the project done if you implement it using experience already under your belt.
If, however, you take the time to learn to use some of Liferay's built-in tag libraries, the performance benefits will pay off.
Instead of standard tags, use the Liferay tag library equivalents.
What does this do?
What's faster, transferring 100KB over 1 HTTP connection or opening up 10 connections for 10KB each? This is the reason developers have moved to CSS sprites for graphics.
If you use the Liferay tag libraries, we will do all the packing and imaging for you.
Upon deployment, Liferay, using the StripFilter and MinifierFilter, will automatically create a .sprite.png and .sprite.gif (for any IE 6 users out there), and generate code in the pages that looks like this:
style="background-image: url('/html/themes/classic/images/portlet/.sprite.png');
background-position: 50% -131px;
background-repeat: no-repeat;
height: 16px;
width: 16px;" />
Less work, same performance benefit
We don't force you to cut up images.
If you have 50 icons on one page, we consolidate that into one file automatically.
The filters understand CSS too.
11. Stupid Database tricks
Trick 1: Read-writer Database
This allows you to direct write operations and read operations to separate data sources.
You must configure your database for replication in order to do this. All major databases support this.
Make sure the spring config is included in your file
Read Writer Database
Ø  You will now have a dedicated data source where write requests will go.
Ø  With replication enabled, updates to all nodes can be done much faster by your database software.
Ø  You can have one configuration of your database optimized for reads.
Ø  You can have one configuration of your database optimized for writes.

Trick 2: Database Sharding
Sharding is splitting up your database by various types of data that may be in it.
It is a technique used for high scalability scenarios.
One algorithm might be to split up your users:
– A-D: Database 1
– E-H: Database 2
– (etc)
When users log in, they are directed to the instance of the app that has their data in it.

Liferay Sharding
Ø  Liferay supports sharding through portal instances.
Ø  You can create separate portal instances with your application in them, enable sharding, and Liferay will use its round robin shard selector to determine where users should go.
Ø  To enable sharding, use your file:
Ø  shard.selector=com.liferay.portal.dao.shard.RoundRobinShardSelector
12. HTML Positioning of Elements
Here's a code snippet. Anybody notice anything strange?

Any Comments / Suggestions welcome...!