Tuesday, December 8, 2015

Phantom Conflicts While WebSphere Application Server's NodeAgent Is Starting: a Conflicting IP Address and Port

Some time ago I got a strange error, 'A conflicting IP address and port', while trying to start my NodeAgent. The situation looked very odd because I had checked all the ports used by my servers, the NodeAgents, and the DMGR, and there was no conflict.

The exception was as follows:

Trace: 2015/07/01 10:01:46.318 02 t=8E5E88 c=UNK key=S2 tag= (13007004)
SourceId: com.ibm.ws.hamanager.coordinator.dcs.CoreStackMembershipManager
ExtendedMessage: BBOO0220E: HMGR0031E: A conflicting IP address and port has been detected for the DCS_UNICAST_ADDRESS end point.
czcell\cznoded\nodeagent, czcell\cznodeb\nodeagent members are configured to use the IP address and port combination of IP::PORT.

After that, the administrator can see the following records:

Trace: 2015/07/01 13:26:56.778 02 t=8E5E88 c=UNK key=S2 tag= (13007004)
SourceId: com.ibm.ws.hamanager.coordinator.impl.CoordinatorImpl
ExtendedMessage: BBOO0222I: HMGR0228I: The Coordinator is not an Active Coordinator for core group DefaultCoreGroup. The active coordinator set is

... trace skipped

Trace: 2015/07/01 13:26:56.973 02 t=8E5E88 c=UNK key=S2 tag= (13007004)
SourceId: com.ibm.ws.runtime.WsServerImpl
ExtendedMessage: BBOO0220E: WSVR0009E: Error occurred during startup
com.ibm.ws.exception.RuntimeError: Unable to start the CoordinatorComponentImpl

... trace skipped

Trace: 2015/07/01 10:01:46.524 02 t=8E5E88 c=UNK key=S2 tag= (13007004)
SourceId: com.ibm.wsspi.runtime.component.WsComponentImpl
ExtendedMessage: BBOO0222I: WSVR0401W: Unable to deregister the MBean

WebSphere Application Server is kind enough to give the administrator the following FFDC records:

[7/1/15 10:01:46:348 GMT] FFDC Exception:com.ibm.wsspi.hamanager.HAException SourceId:com.ibm.ws.hamanager.coordinator.impl.DCSPluginImpl ProbeId:255
com.ibm.wsspi.hamanager.HAException: This process has conflicting ip:port configuration with another member. See previous messages for more info.
at com.ibm.ws.hamanager.coordinator.dcs.CoreStackMembershipManager.<init>(CoreStackMembershipManager.java:147)
at com.ibm.ws.hamanager.coordinator.impl.DCSPluginImpl.<init>(DCSPluginImpl.java:231)
at com.ibm.ws.hamanager.coordinator.impl.CoordinatorImpl.<init>(CoordinatorImpl.java:349)


[7/1/15 10:01:46:430 GMT] FFDC Exception:com.ibm.wsspi.hamanager.datastack.DataStackException SourceId:
com.ibm.ws.hamanager.coordinator.impl.CoordinatorImpl ProbeId:288 Reporter:com.ibm.ws.hamanager.coordinator.impl.CoordinatorImpl@f036c22f
com.ibm.wsspi.hamanager.datastack.DataStackException: Failure creating core stack
at com.ibm.ws.hamanager.coordinator.impl.DCSPluginImpl.<init>(DCSPluginImpl.java:263)
at com.ibm.ws.hamanager.coordinator.impl.CoordinatorImpl.<init>(CoordinatorImpl.java:349)


[7/1/15 10:01:46:445 GMT] FFDC Exception:com.ibm.wsspi.hamanager.HAInternalStateException SourceId:
com.ibm.ws.hamanager.runtime.CoordinatorComponentImpl ProbeId:234 Reporter:com.ibm.ws.hamanager.runtime.CoordinatorComponentImpl@331d9876
com.ibm.wsspi.hamanager.HAInternalStateException: failure creating the Coordinator
at com.ibm.ws.hamanager.coordinator.impl.CoordinatorImpl.<init>(CoordinatorImpl.java:356)
at com.ibm.ws.hamanager.coordinator.corestack.CoreStackFactoryImpl.createDefaultCoreStack(CoreStackFactoryImpl.java:88)

So the NodeAgent can't start because another NodeAgent is supposedly configured with the same port, but in this case that was plainly wrong. True, at one point two NodeAgents had been configured with a common port number, but that blunder had already been fixed. So why does the component still fail?

The reason is in the PROFILEDIR_A/config/cells/CELL/nodes/NODE_B/serverindex.xml file. Because the NodeAgent fails to start, the configuration can't be synchronized, so the file can't be rewritten with the correct port numbers. The NodeAgent reads the stale copy and concludes that there is still a misconfiguration, namely that one ip:port record is the same as another. It looks like a phantom pain. Please note that PROFILEDIR_A is the profile folder of the node being started, while NODE_B is the folder holding the configuration of the conflicting node. That folder should contain two files: node-metadata.properties and serverindex.xml. You have to replace the serverindex.xml file, either by restoring it from a backup copy or by copying it from the DMGR: .../DeploymentManager/profiles/default/config/cells/CELL/nodes/NODE_B/serverindex.xml.
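To find which local copy is the stale one, something like the following Python sketch may help. It is only an illustration, not an IBM tool. It assumes that serverindex.xml keeps its ports in serverEntries / specialEndpoints / endPoint elements, with endPointName, host and port attributes and a hostName attribute on the root element, and that a host of "*" should be read as the node's hostName. Please check these assumptions against your own files. The function names find_conflicts and restore_from_dmgr are mine.

```python
import shutil
import sys
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path


def dcs_endpoints(serverindex_path):
    """Yield (server_name, host, port) for each DCS_UNICAST_ADDRESS endpoint."""
    root = ET.parse(serverindex_path).getroot()
    node_host = root.get("hostName", "")
    for entry in root.iter("serverEntries"):
        for special in entry.iter("specialEndpoints"):
            if special.get("endPointName") != "DCS_UNICAST_ADDRESS":
                continue
            for ep in special.iter("endPoint"):
                host = ep.get("host", "")
                # Assumption: "*" (or an empty host) means the node's own host name.
                if host in ("", "*"):
                    host = node_host
                yield entry.get("serverName"), host, ep.get("port")


def find_conflicts(nodes_dir):
    """Return {(host, port): ['node/server', ...]} for pairs used more than once."""
    seen = defaultdict(list)
    for path in sorted(Path(nodes_dir).glob("*/serverindex.xml")):
        node = path.parent.name
        for server, host, port in dcs_endpoints(path):
            seen[(host, port)].append(f"{node}/{server}")
    return {key: members for key, members in seen.items() if len(members) > 1}


def restore_from_dmgr(dmgr_copy, stale_copy):
    """Keep a .bak of the stale file, then overwrite it with the DMGR's copy."""
    stale = Path(stale_copy)
    backup = stale.with_name(stale.name + ".bak")
    shutil.copy2(stale, backup)
    shutil.copy2(dmgr_copy, stale)
    return backup


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example argument: PROFILEDIR_A/config/cells/CELL/nodes
    for (host, port), members in find_conflicts(sys.argv[1]).items():
        print(f"{host}:{port} is shared by {', '.join(members)}")
```

Run it against the nodes folder of the profile whose NodeAgent fails to start, not against the DMGR's folder, because the stale copy lives in the local profile. If it reports a pair that the DMGR's configuration no longer contains, that node's serverindex.xml is the file to replace.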

P.S. Let me ask you: are my notes about WebSphere Application Server interesting to you? Which other IBM WebSphere brand components interest you: Process Server, Business Process Manager, MQ, Message Broker/Integration Bus, Portal, anything else? Please share your own opinion in the comments. Thank you very much!
