Fig: The search bar in a stuck state.
To indicate the progress of a search, a ‘search bar’ is displayed in the browser. Because of an unknown bug, this progress bar got stuck somewhere in the middle of the transaction, and no results were rendered. It happened rarely, but it did happen. Search is a foundational element: if it doesn’t work, bookings will not happen. Besides the drop in revenue, it would also leave a bad taste with users, who might not return.
The browser’s developer console tools didn’t give helpful error messages or hints, only a blanket statement: “SCRIPT7002: XMLHttpRequest: Network Error 0x2ef3, Could not complete the operation due to error 00002ef3.”
This problem started to happen all of a sudden, out of the blue. The team started to look into every possibility that could trigger it:
1. Were any server-side code changes introduced recently?
2. Were any front-end (JavaScript) code changes introduced recently?
3. Were any load balancer settings changed recently?
4. Were any application server settings changed recently?
5. Were any network or firewall settings changed recently?
To diagnose the problem, I installed “Wireshark” and captured TCP/IP packets on the client side. When analyzing the conversation between the client machine and the server, I noticed that the server was prematurely issuing FIN packets in the middle of the conversation. Please see the highlighted portion of the TCP/IP conversation below:
Fig: Wireshark TCP/IP packet capture
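The same check can also be scripted instead of read off the capture by eye. Below is a minimal sketch in Python, assuming the capture has been exported to rows of relative time, source IP, destination IP and TCP flags (for example with tshark's `frame.time_relative`, `ip.src`, `ip.dst` and `tcp.flags` fields). The IP addresses, the sample rows and the function name `server_fins` are hypothetical and only there for illustration; they are not taken from the real capture.

```python
# TCP flag bit for FIN, as defined in the TCP header.
FIN = 0x01


def server_fins(packets, server_ip):
    """Return the packets in which the server set the FIN flag."""
    hits = []
    for pkt in packets:
        flags = int(pkt["flags"], 16)  # e.g. "0x011" -> FIN + ACK
        if pkt["src"] == server_ip and flags & FIN:
            hits.append(pkt)
    return hits


# Hypothetical rows: client 10.0.0.5 talks to server 192.0.2.10.
packets = [
    {"time": 0.000, "src": "10.0.0.5", "dst": "192.0.2.10", "flags": "0x002"},  # SYN
    {"time": 0.020, "src": "192.0.2.10", "dst": "10.0.0.5", "flags": "0x012"},  # SYN, ACK
    {"time": 0.021, "src": "10.0.0.5", "dst": "192.0.2.10", "flags": "0x010"},  # ACK
    {"time": 0.022, "src": "10.0.0.5", "dst": "192.0.2.10", "flags": "0x018"},  # PSH, ACK (request)
    {"time": 0.040, "src": "192.0.2.10", "dst": "10.0.0.5", "flags": "0x010"},  # ACK
    {"time": 1.045, "src": "192.0.2.10", "dst": "10.0.0.5", "flags": "0x011"},  # FIN, ACK
]

for pkt in server_fins(packets, "192.0.2.10"):
    print("server sent FIN at t=%.3fs" % pkt["time"])
```

A FIN coming from the server before any response data has been sent, as in the last sample row, is the pattern to look for.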
The premature FIN packets narrowed the problem down to the server side, where the cause could be:
a. Network/Firewall settings
b. Load Balancer settings
c. Application server settings
On further analysis, we noticed that the “KeepAliveTimeout” property in the Apache load balancer had accidentally been changed to 1 second. Thus, whenever the application server took more than 1 second to respond, the Apache load balancer’s “KeepAliveTimeout” kicked in and issued a ‘FIN’ packet, and the connection was torn down.
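An accidental change like this is easy to miss in a long configuration file, so it may be worth checking for it automatically. The following is a rough sketch in Python, assuming the Apache configuration is available as plain text. The sample configuration, the function names and the 5-second threshold (Apache's documented default for this directive) are assumptions chosen for illustration, not values from the actual system.

```python
import re

# Matches an uncommented KeepAliveTimeout directive; Apache directive
# names are case-insensitive, and the value may carry an "ms" suffix.
DIRECTIVE = re.compile(r"^[ \t]*KeepAliveTimeout[ \t]+(\S+)",
                       re.IGNORECASE | re.MULTILINE)

MIN_SECONDS = 5.0  # hypothetical threshold for this sketch


def keepalive_timeouts(conf_text):
    """Return every KeepAliveTimeout value in the text, in seconds."""
    values = []
    for raw in DIRECTIVE.findall(conf_text):
        if raw.lower().endswith("ms"):
            values.append(int(raw[:-2]) / 1000.0)
        else:
            values.append(float(raw))
    return values


def too_short(conf_text, minimum=MIN_SECONDS):
    """Return the timeouts that fall below the chosen minimum."""
    return [v for v in keepalive_timeouts(conf_text) if v < minimum]


# Hypothetical configuration fragment with the accidental change.
sample_conf = """\
KeepAlive On
MaxKeepAliveRequests 100
# KeepAliveTimeout 5
KeepAliveTimeout 1
"""

for value in too_short(sample_conf):
    print("KeepAliveTimeout of %.1fs is below %.1fs" % (value, MIN_SECONDS))
```

A check of this kind could run whenever the load balancer configuration changes, so that an unexpectedly low timeout is flagged before it reaches production.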