Monday, October 21, 2024

How to Switch Over Between Active and Standby VRM Nodes in FusionSphere

FusionSphere employs a high availability (HA) architecture for its Virtual Resource Manager (VRM) nodes. This ensures uninterrupted operation even if one node experiences an issue. This guide details the procedure for manually performing a switchover between the active and standby VRM nodes.

Prerequisites

Before attempting the switchover, ensure the following conditions are met:

  • No Heartbeat Communication Issues: The FusionCompute system should not be reporting the ALM-15.1002000 Heartbeat Communication Between Active and Standby VRM Nodes Interrupted alarm.

  • Database Synchronization Success: The ALM-15.1007027 Database Data Synchronization Between Active and Standby VRM Nodes Failed alarm should not be present in the FusionCompute system.

Procedure

  1. Log in to a VRM Node:

    • Utilize PuTTY or a similar SSH client to log in to any VRM node within the FusionSphere environment.

    • Ensure you use the correct management IP address and the username "gandalf" for authentication.

    • If your setup uses private-public key pair authentication, refer to the FusionSphere documentation for guidance on using PuTTY in key pair mode.

  2. Switch to Root User:

    • Execute the following command and provide the root user's password when prompted:

            su - root
          

  3. Check VRM Node Status:

    • Run the following command to display the current status of the active and standby VRM nodes:

            service had query
          

    • The output will resemble the following format:

            NODE             ROLE           PHASE           RESS            VER             START
            ha1(VRM01)       active         Actived         normal          V100R001C01     2016-01-21 18:19:02
            ha2(VRM02)       standby        Deactived       normal          V100R001C01     2016-01-28 10:36:41
          

  4. Verify RESS Values:

    • Carefully examine the RESS values in the command output.

    • If all values are "normal", proceed to the next step.

    • If any RESS value is not "normal", VRM switchover is not allowed. Contact technical support.
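
The RESS check in this step can also be scripted. The following is a minimal shell sketch, assuming the output of service had query keeps the column layout shown in step 3 (ROLE in the second column, RESS in the fourth). The function name ress_all_normal is hypothetical and is not part of FusionSphere.

```shell
# Hypothetical helper (not a FusionSphere tool): reads "service had query"
# output on stdin and succeeds only if every node row reports RESS "normal".
# Assumption: ROLE is field 2 and RESS is field 4, as in the step 3 sample.
ress_all_normal() {
    awk '$2 == "active" || $2 == "standby" {
             rows++
             if ($4 != "normal") bad++
         }
         END { exit ((rows > 0 && bad == 0) ? 0 : 1) }'
}

# Usage on a VRM node, as root:
#   service had query | ress_all_normal && echo "RESS OK" || echo "Do not switch over"
```

The function treats empty or unrecognized output as a failure, so a missing node row blocks the switchover rather than passing silently.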

  5. Log in to the Active Node:

    • Using PuTTY, log in to the VRM node whose PHASE value is "Actived" (as shown in the step 3 output).

    • Use the correct management IP address and the username "gandalf".

    • If your setup uses key pair authentication, refer to the FusionSphere documentation.

    • Once logged in, switch to the root user using the command from step 2.
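
To pick out the active node without reading the table by eye, a small filter along these lines may help. It is only a sketch, assuming NODE is the first column and PHASE is the third, as in the step 3 sample. The function name active_node is hypothetical.

```shell
# Hypothetical helper (not a FusionSphere tool): reads "service had query"
# output on stdin and prints the NODE column of the first row whose PHASE
# column is "Actived". Returns non-zero if no such row exists.
# Assumption: NODE is field 1 and PHASE is field 3, as in the step 3 sample.
active_node() {
    awk '$3 == "Actived" { print $1; found = 1; exit }
         END { exit (found ? 0 : 1) }'
}

# Usage on a VRM node, as root:
#   service had query | active_node
```

The printed name identifies which node to log in to. Mapping that name to a management IP address still depends on your own site records.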

  6. Initiate Switchover:

    • Execute the following command to trigger the active/standby switchover:

            sh /opt/galax/gms/common/ha/switchOverHA.sh
          

  7. Verify Switchover Success:

    • Check Node Roles: Run the service had query command again to verify the roles have changed:

      • The original standby node should now be active.

      • The original active node should now be standby.

    • Verify RESS Values: Ensure all RESS values are "normal" after the switchover.

    • Test Floating IP: Attempt to log in to FusionCompute using the floating IP address. This should now connect to the newly active VRM node.
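
The role check in step 7 can be sketched as a comparison of two saved copies of the service had query output, one taken before step 6 and one after. This is a minimal sketch, assuming ROLE is the second column as in the step 3 sample. The function names and the file paths in the usage comment are hypothetical.

```shell
# Hypothetical helper: prints the NODE whose ROLE is "active" in a saved
# copy of the "service had query" output (file path given as $1).
active_of() {
    awk '$2 == "active" { print $1; exit }' "$1"
}

# Succeeds only if both files name an active node and the two names differ,
# which is what a completed switchover should produce.
active_changed() {
    old_active=$(active_of "$1")
    new_active=$(active_of "$2")
    [ -n "$old_active" ] && [ -n "$new_active" ] && [ "$old_active" != "$new_active" ]
}

# Usage on a VRM node, as root (paths are examples only):
#   service had query > /tmp/had_before.txt    # before step 6
#   service had query > /tmp/had_after.txt     # after step 6
#   active_changed /tmp/had_before.txt /tmp/had_after.txt && echo "switchover confirmed"
```

This only confirms that the active role moved. The RESS values and the floating IP login should still be checked as described above.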

Troubleshooting

  • If the switchover fails: Contact technical support for assistance.

  • If you encounter issues with accessing the nodes: Double-check your credentials, network connectivity, and firewall configurations.

Important Note: Perform a manual switchover only in exceptional circumstances, such as planned maintenance or troubleshooting. If you are unsure about the procedure or have concerns, consult the FusionSphere documentation or contact technical support.

Source: Huawei Forum
