Commonspot Lock Timeouts

Posted At : August 3, 2004 11:22 AM | Posted By : Cameron
Related Categories: CommonSpot,ColdFusion

Fortunately and/or unfortunately, Commonspot is backwards compatible to older versions of ColdFusion, including versions which required very careful use of CFLOCK to ensure shared memory doesn't become corrupted. This means that even if you are running CFMX, you're going to have alot of locks being set and cleared on your server. Occasionally, Commonspot users run into situations where these locks get deadlocked and start timing out, bogging the server down.

Usually this is due to a unique situation involving load patterns or user behavior that Paperthin QA didn't catch, but telling you that that doesn't make you feel any better does it?

Today I got an email from someone having this very problem and I thought I would share my advice to him in case it comes in handy for someone else. Here's his question and my answer:

We are noticing an extreme number of lock errors in the CommonSpot code. "Error","jrpp-2","07/27/04","11:29:11","UDS_commonspot-users","A timeout occurred while attempting to lock appcommonspot-users. The specific sequence of files included or processed is: path/to/some/template.cfm This is happening very frequently and eventually causes our jrun process to either hang or go away. Random occurrences of each.

About your problem specifically, this is what I'd do:

  1. Run CFStat on the server and look to see how many threads are running. You'll start to notice a pattern where locked threads pile up and the thread count increases. Once the lock breaks the thread count will drop back down to normal.
  2. Once you think you can identify the pattern in cfstat, watch for a lock and run a series of stack traces during several locking incidents. MM has a KB article about how to do this if you've never done it before.
  3. Once you have a few stack traces, look them up and down for commonalities. Eventually you'll find some common, um, spot where everything's locking up. You'll be able to tell form the trace what line of what file it is getting stuck on. This is great ammo for Paperthin support and may even give you a hint as to a setting you can tweak to make the problem go away.

Comments
Damon Gentry's Gravatar Thanks Cameron for posting this. Unfortunately, we are using the J2EE deployment of CFMX (on JRun), so CFSTAT information is not available to us. However, we do have metrics logging enabled so we are able to obtain logs of server load.

I really wish that the same CFSTAT information was available in the JRun metrics. The new information is slightly different and some information is useful, but I'd really like to see how the DB connections are performing. Short of tracing the DB connection performance with in-line code, we really can't tell.

I'll continue to do more stack trackes. Hopefully I'll find something.
# Posted By Damon Gentry | 8/3/04 7:59 PM
Damon Gentry's Gravatar Starting to make a little progress here.

I have found some 'commonalities' in the 'cfusioni/logs/exception.log' and the stack trace information which was piped to 'jrun4/logs/cfusion-out.log'.

exception log entry (key item: jrpp-8)
Error   jrpp-8   7/21/2004   13:44:15   UDS_commonspot-users   A timeout occurred while attempting to lock appcommonspot-users. The specific sequence of files included or processed is: /w2/mentor/www/_testing/ronf/pagetemplate.cfm coldfusion.tagext.lang.LockTag$TimeoutException: A timeout occurred while attempting to lock appcommonspot-users.   at coldfusion.tagext.lang.LockTag.doStartTag(LockTag.java:179)

cfusion-out.log (stack trace: search for jrpp-8)
"jrpp-8" prio=5 tid=0x007120d8 nid=0xbc in Object.wait() [a527f000..a52819c0]
   at java.lang.Object.wait(Native Method)
   - waiting on <0xbf1d5978> (a coldfusion.runtime.RWLock)
   at coldfusion.runtime.RWLock.waitForLock(RWLock.java:132)
   at coldfusion.runtime.RWLock.requestWriteLock(RWLock.java:106)
   - locked <0xbf1d5978> (a coldfusion.runtime.RWLock)
   at coldfusion.runtime.RWLock.requestLock(RWLock.java:28)
   at coldfusion.runtime.LockManager.requestNamedLock(LockManager.java:65)
   at coldfusion.tagext.lang.LockTag.doStartTag(LockTag.java:146)
   at cfcore2dapplication2ecfm1591287353._factor19(/w2/mentor/commonspot/core-application.cfm:16)
   at cfcore2dapplication2ecfm1591287353.runPage(/w2/mentor/commonspot/core-application.cfm:1)
   at coldfusion.runtime.CfJspPage.invoke(CfJspPage.java:147)
   at coldfusion.tagext.lang.IncludeTag.doStartTag(IncludeTag.java:357)
   at coldfusion.runtime.CfJspPage._emptyTag(CfJspPage.java:1871)
   at cfsite2dapplication2ecfm765082001._factor25(/w2/mentor/commonspot/site-application.cfm:15)
   at cfsite2dapplication2ecfm765082001.runPage(/w2/mentor/commonspot/site-application.cfm:1)
...

So now that I've found quite a number of these matching entries, I can see that the timeouts occur when trying to lock the Application scope. This code executes from the 'commonspot/core-application.cfm' at line 16.

Still, it doesn't appear that there is much I can do here except to submit a feature request to Paperthin to reduce the frequency of LOCKS. I'm not holding out too much hope here as the use of CFLOCK is still considered a recommended practice although it is not as crucial in CFMX.
# Posted By Damon Gentry | 8/4/04 9:15 AM

Recent Entries

Archives By Subject

Tech Blogs

(Mostly) Not Tech Blogs