Usually this is due to a unique situation involving load patterns or user behavior that Paperthin QA didn't catch, but telling you that doesn't make you feel any better, does it?
Today I got an email from someone having this very problem and I thought I would share my advice to him in case it comes in handy for someone else. Here's his question and my answer:
We are noticing an extreme number of lock errors in the CommonSpot code:
"Error","jrpp-2","07/27/04","11:29:11","UDS_commonspot-users","A timeout occurred while attempting to lock appcommonspot-users. The specific sequence of files included or processed is: path/to/some/template.cfm
This is happening very frequently and eventually causes our jrun process to either hang or go away. Random occurrences of each.
About your problem specifically, this is what I'd do:
- Run cfstat on the server and watch how many threads are running. You'll start to notice a pattern where locked threads pile up and the thread count increases. Once the lock breaks, the thread count will drop back down to normal.
- Once you think you can identify the pattern in cfstat, watch for a lock and take a series of stack traces during several locking incidents. Macromedia has a KB article about how to do this if you've never done it before.
- Once you have a few stack traces, look them up and down for commonalities. Eventually you'll find some common, um, spot where everything's locking up. You'll be able to tell from the trace what line of what file it is getting stuck on. This is great ammo for Paperthin support and may even give you a hint as to a setting you can tweak to make the problem go away.
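If eyeballing the traces gets tedious, the "look for commonalities" step can be roughed out in a few lines of script. This is only a sketch, not anything Paperthin or Macromedia ships: it assumes each trace is saved as plain text and that frames look like ordinary JVM stack frames, e.g. `at someClass.runPage(/path/to/template.cfm:42)`. The function names `count_frames` and `top_suspects` are made up for illustration.

```python
import re
from collections import Counter

# Assumed frame format (standard JVM stack trace line), for example:
#   at someClass.runPage(/path/to/template.cfm:42)
# Frames without a file and line number, such as "(Native Method)", are skipped.
FRAME_RE = re.compile(r"^\s*at\s+(\S+)\((.+?):(\d+)\)\s*$")


def count_frames(dumps, suffix=".cfm"):
    """Count how often each (file, line) pair appears across all dumps.

    `dumps` is a list of strings, one per saved stack trace.
    Only frames whose file name ends with `suffix` are counted.
    """
    counts = Counter()
    for dump in dumps:
        for raw in dump.splitlines():
            match = FRAME_RE.match(raw)
            if match and match.group(2).lower().endswith(suffix):
                counts[(match.group(2), int(match.group(3)))] += 1
    return counts


def top_suspects(dumps, n=5):
    """Return the n most frequent (file, line) pairs with their counts."""
    return count_frames(dumps).most_common(n)
```

Read each saved trace into a string, pass the list to `top_suspects`, and the file and line that keep showing up across locking incidents should float to the top. Treat the result as a hint about where to look, not a diagnosis.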