Troubleshoot integration issues when Maximo stops publishing data to an external system

I have had to deal with this quite often. Most of the time, I got it right and was able to identify the problem quickly. In a few cases it took some time, which is usually very stressful because it mostly occurs in Production. (It occurs in DEV and PRE-PROD all the time; it’s just that people usually don’t care, so it goes unnoticed.)

Today I had to deal with it again, and it took me some time. The cause was something I had dealt with before. A colleague had told me how to fix it (the easy way), but I forgot. This time around, in panic mode, I restarted a few JVMs before I remembered I should ask around, and my colleague reminded me again that it could be fixed with much less damage. I told myself I should write it down for next time, so here is the sum of what I learned:


Setting up alarms for integration

When writing a piece of software, we are in total control of the quality of the product. With integration, many elements are not under our control. The network and firewall are usually managed by IT. With external systems, we usually don’t know how they work and, many times, are not given access. Yet any change to these elements can cause our interfaces to fail.

For synchronous interfaces, the user receives instant feedback after each action (e.g. Maximo - GIS integration), so we don’t usually need to set up alarms. Asynchronous interfaces usually run in the background and don’t give instant feedback, so when a failure occurs, it usually goes unnoticed. In many cases, we only find out about a failure after it has caused major damage.

A good interface must provide an adequate mechanism to handle failures, and in the case of async integration, proper alarms and reports should be set up so that failures are captured and handled proactively by IT and application administrators.
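
To make the idea of an alarm for an async interface a bit more concrete, here is a minimal sketch in Python. It is not how Maximo does it, and it does not read any real Maximo table: QueueSnapshot, check_alarms and the thresholds are all hypothetical names and values I made up for illustration. It assumes some scheduled job can already tell us how many outbound messages are waiting, how many have failed, and when the oldest waiting message was created.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class QueueSnapshot:
    """Hypothetical summary of an outbound queue at one point in time."""
    pending_count: int                  # messages still waiting to be delivered
    error_count: int                    # messages that failed delivery
    oldest_pending: Optional[datetime]  # creation time of the oldest waiting message


def check_alarms(
    snapshot: QueueSnapshot,
    now: datetime,
    max_errors: int = 0,                            # assumed threshold
    max_age: timedelta = timedelta(minutes=30),     # assumed threshold
) -> List[str]:
    """Return a list of alarm messages; an empty list means no alarm."""
    alarms = []

    # Any failed message above the tolerated count is worth a notification.
    if snapshot.error_count > max_errors:
        alarms.append(f"{snapshot.error_count} message(s) in error")

    # A message that has waited too long suggests publishing has stalled,
    # even if nothing has been flagged as an error yet.
    if snapshot.oldest_pending is not None:
        age = now - snapshot.oldest_pending
        if age > max_age:
            alarms.append(
                f"oldest pending message is {age} old "
                f"({snapshot.pending_count} waiting)"
            )

    return alarms
```

The second check is the one that matters for the "stops publishing" case: nothing has technically failed, the messages just sit there, so an alarm based only on error counts would stay silent. How the alarm is delivered (email, ticket, dashboard) is a separate choice.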