I have had to deal with this quite often. Most of the time, I got it right and was able to identify the problem quickly. In a few cases, it took some time, and it was usually very stressful because it mostly occurs in Production. (It occurs in DEV and PRE-PROD all the time; it's just that people usually don't care, so it goes unnoticed.)
Today I had to deal with it again, and it took me some time. The cause was something I had dealt with before: a colleague had told me how to fix it (the easy way), but I forgot. This time around, in panic mode, I restarted a few JVMs before I remembered that I should ask around, and my colleague reminded me again that it could be fixed with much less damage. I told myself I should write it down for next time, so here is the sum of what I learned: