Help me stabilize this JRun configuration (CF9 / Win2k3 / IIS6)

Not sure if this is a better fit for ServerFault, but since I am not an admin but a developer, I figured I would try SO.

We have been trying to keep the configuration of several servers stable for a while. At the end of last month we were running CF 7.0.2 on two servers (one instance each). At that point we had gotten things to where each instance would run for about a week before restarting on its own. Since the beginning of the month we have upgraded to CF 9, and we are back to square one with multiple restarts per day.

Our current configuration is 2 Win2k3 servers running a cluster of 4 instances, 2 instances per server. At this point, we are pretty sure that this is due to incorrect JVM settings.

We have played with the settings, and while some were more stable than others, we never got it right.

From default:

java.args=-server -Xmx512m -Dsun.io.useCanonCaches=false -XX:MaxPermSize=192m -XX:+UseParallelGC -Dcoldfusion.rootDir={application.home}/

      

Currently:

java.args=-server -Xmx896m -Dsun.io.useCanonCaches=false -XX:MaxPermSize=512m -XX:SurvivorRatio=8 -XX:TargetSurvivorRatio=90 -XX:+UseParallelGC -Dcoldfusion.rootDir={application.home}/ -verbose:gc -Xloggc:c:/Jrun4/logs/gc/gcInstance1b.log

      

We determined that we needed more than the standard 512MB simply by monitoring in FusionReactor: our memory consumption hovers around 300MB on average and can climb as high as 700MB under heavy load.

Most crashes leave a log at jrun4/bin/hs_err_pid*.log, and the reason given is always "out of swap space".

I have attached links to yesterday's hs_err and garbage-collector log files at the bottom of the post.

Relevant part (I think):

Heap
 PSYoungGen      total 89856K, used 19025K [0x55490000, 0x5b6f0000, 0x5b810000)
  eden space 79232K, 16% used [0x55490000,0x561a64c0,0x5a1f0000)
  from space 10624K, 52% used [0x5ac90000,0x5b20e2f8,0x5b6f0000)
  to   space 10752K, 0% used [0x5a1f0000,0x5a1f0000,0x5ac70000)
 PSOldGen        total 460416K, used 308422K [0x23810000, 0x3f9b0000, 0x55490000)
  object space 460416K, 66% used [0x23810000,0x36541bb8,0x3f9b0000)
 PSPermGen       total 107520K, used 106079K [0x03810000, 0x0a110000, 0x23810000)
  object space 107520K, 98% used [0x03810000,0x09fa7e40,0x0a110000)

      

From this I understand that the PSPermGen is full (most logs show the same right before a crash), so we increased MaxPermSize to 512m, but the total is still showing as only 107520K!??!

None of us is a JRun expert, so any help, or even ideas on what to try next, would be greatly appreciated!

Log files: Sorry, I know sendspace is not the friendliest of places. If you have another suggestion for hosting the log files, let me know and I'll update the post (SO doesn't like them inline; it blows up the post formatting).



2 answers


Small update. I tried different GCs, and although some stabilized the system for a while, it still crashed, just less frequently. So I kept digging and eventually found out that the JVM throws "Out of swap space" when the OS itself refuses to allocate the memory the JVM requests.

This usually happens when the maximum memory is already assigned to the JVM process: the JRun overhead, the JVM itself, all libraries, heap, and thread stacks. Since each request is served by its own thread, and each thread gets its own stack, lots of concurrent requests mean more and more memory going to stacks. The size of each thread stack is OS- and JVM-version-dependent, but it can be controlled with the -Xss argument. I scaled ours down to 64k, so our java.args now looks like this:

java.args=-server -Xmx768m -Xss64k -Dsun.io.useCanonCaches=false -XX:MaxPermSize=512m -XX:+UseParallelGC -Dcoldfusion.rootDir={application.home}/ -verbose:gc -Xloggc:c:/Jrun4/logs/gc/gcInstance2a.log

      



So far everything has been stable, with no noticeable slowdown, for 6 days, which is by far the longest I have ever seen the app stay up. (If you reduce the stack size too much, you will start seeing StackOverflowErrors in the log instead of OOM errors.)
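The reasoning above can be sketched as a back-of-envelope budget of the 32-bit process address space. Everything below except -Xmx896m, -XX:MaxPermSize=512m, and -Xss64k (which come from my java.args) is an assumption, not a measured value:

```java
// Rough sketch: how many thread stacks fit in what is left of a 32-bit
// Windows process address space after heap, PermGen and native overhead.
// The addressSpace and nativeOverhead figures are illustrative guesses.
public class StackBudget {
    public static void main(String[] args) {
        long addressSpaceMb   = 2048; // usable by a 32-bit process on Win2k3 (no /3GB switch) -- assumption
        long heapMb           = 896;  // -Xmx896m
        long permGenMb        = 512;  // -XX:MaxPermSize=512m
        long nativeOverheadMb = 200;  // JVM code, JIT caches, DLLs -- a guess

        long leftForStacksMb = addressSpaceMb - heapMb - permGenMb - nativeOverheadMb;
        long stackKb = 64;            // -Xss64k

        System.out.println("MB left for thread stacks: " + leftForStacksMb);
        System.out.println("Approx. max threads: " + (leftForStacksMb * 1024 / stackKb));
    }
}
```

With these numbers there is roughly 440MB left over, i.e. around 7000 threads at 64k stacks, versus well under 1000 at a typical 512k default. The exact figures will differ on your box, but the direction of the effect is the point.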

The next step is to tweak the MaxPermSize, but that's good for now!



This is a symptom that can have many causes: anything from how your application is built (unconventional use of the application or server scope? bad database drivers or connection management? parsing giant XML files? use of CFHTTP or other external resources? problems with session replication?), to your coding practices (var-scoping problems all over the place?), to the type of processors in your servers. You probably won't find a magic-bullet JVM tweak without a lot of analysis (and maybe not even then). But first: why do you have such an unusually large PermGen? That smells like a leak of some sort, though of course I know nothing about your application.

That said, you have little to lose by trying a different garbage collector. If your JVM version supports them, try:

-XX:+UseConcMarkSweepGC -XX:+UseParNewGC 

      



and add:

-XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled

      

which can help manage your oversized PermGen. Don't forget to take out -XX:+UseParallelGC if you try them.
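For example, your java.args line might end up looking something like this. This is a sketch built from the settings in your question with the collector swapped, not a line I have tested on your setup:

java.args=-server -Xmx896m -Dsun.io.useCanonCaches=false -XX:MaxPermSize=512m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled -Dcoldfusion.rootDir={application.home}/ -verbose:gc -Xloggc:c:/Jrun4/logs/gc/gcInstance1b.log

Keep the -verbose:gc logging on while you experiment so you can compare collector behavior before and after.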
