Build Load Balancing Service in VMC on AWS with Avi Load Balancer – Part1

When we design a highly available (HA) infrastructure for a mission-critical application, local load balancing and global load balancing are always the essential components of the solution. This series of blogs will demonstrate how to build an enterprise-level local load balancing and global load balancing service in VMC on AWS SDDC with Avi Networks load balancer.

This series of blogs will cover the following topics:

  1. How to deploy Avi load balancer in a VMC SDDC;
  2. How to set up local load balancing service to achieve HA within a VMC SDDC;
  3. How to set up global load balancing service to achieve HA across different SDDCs which are in different AWS Availability Zones;
  4. How to set up global load balancing site affinity;
  5. How to automate Avi LB with Ansible.

By the end of this series, we will have completed the HA infrastructure build shown in the following diagram. This design leverages the local and global load balancing services to provide a 99.99%+ SLA to a web-based mission-critical application.

The Avi load balancer platform is built on software-defined architectural principles which separate the data plane and control plane. The product components include:

  • Avi Controller (control plane): The Avi Controller stores and manages all policies related to services and management. HA of the Avi Controller requires 3 separate Controller instances, configured as a 3-node cluster.
  • Avi Service Engines (data plane): Each Avi Service Engine (SE) runs on its own virtual machine. The Avi SEs provide the application delivery services to end-user traffic, and also collect real-time end-to-end metrics for traffic between end-users and applications.

In Part 1, we will cover the deployment of Avi load balancer. The diagram below shows the controller and service engine (SE) network connectivity and IP address allocation.

Depending on the level of vCenter access provided, Avi load balancer supports 3 modes of deployment. In VMC on AWS, only the “no-access” mode is supported. Please refer to the Avi documentation for more information about Avi load balancer deployment modes in VMware Cloud.

Section 1: Controller Cluster

Let’s start by deploying the Avi controllers and setting up the controller cluster. First, download the OVA package for the controller appliance. In this demo, the version of the Avi load balancer controller is v18.2.5. After the download, deploy the controller virtual appliance via the “Deploy OVF Template” wizard in the VMC SDDC vCenter. In the “Customize template” window, input the parameters as below:

  • Management interface IP:
  • Management interface Subnet mask:
  • Default gateway:
  • Sysadmin login authentication key: Password
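Before typing these values into the wizard, it can help to check that they are consistent with each other. The following is a minimal sketch using Python's standard `ipaddress` module; the function name and the addresses are made-up placeholders, not the values used in this demo.

```python
import ipaddress


def check_mgmt_settings(ip, netmask, gateway):
    """Return a list of problems found; an empty list means the values are consistent."""
    problems = []
    # Derive the subnet the management IP lives in, e.g. 10.1.1.0/24.
    network = ipaddress.ip_interface(f"{ip}/{netmask}").network
    ip_addr = ipaddress.ip_address(ip)
    gw_addr = ipaddress.ip_address(gateway)

    if gw_addr not in network:
        problems.append(f"gateway {gateway} is outside {network}")
    if gw_addr == ip_addr:
        problems.append("gateway and management IP are identical")
    if ip_addr in (network.network_address, network.broadcast_address):
        problems.append(f"{ip} is the network or broadcast address of {network}")
    return problems


# Hypothetical example values (not the addresses from this demo).
print(check_mgmt_settings("10.1.1.170", "255.255.255.0", "10.1.1.1"))
```

The same check can be reused for the Service Engine management interface later in this post, since the wizard asks for the same three values there.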

After this 1st controller appliance is deployed and powered on, we are ready to start the controller initial configuration. Go to the controller management GUI and complete the following steps:

(1) Username/Password

(2) DNS and NTP

(3) SMTP

(4) Multiple-Tenants? Select No here for simplification.

The initial configuration for the 1st controller is completed. As the first controller of the cluster, it will receive the “Leader” role. The second and third controllers will work as “Follower”. Once logged in to the GUI of this first controller, go to Administration—>Controller, as shown below.

Similarly, deploy and perform the initial configuration for the 2nd and 3rd controllers.

In the management GUI of the 1st controller, go to Administration—>Controller and click “Edit”. In “Edit Controller Configuration” window, add the second node and third node into the cluster as below.

After a few minutes, the cluster is set up successfully.

Section 2: Service Engine

Now we are ready to deploy the SE virtual appliances. In this demo, two SEs will be deployed. These 2 SEs are added into the default Service Engine Group with the default HA mode (N+M).

Step 1: Create and download the SE image.

Go to Infrastructure—>Clouds, click the download icon and select the OVA format. Please note that this SE OVA package is only for the linked controller cluster. It cannot be used for another controller cluster.

Step 2: Get the cluster UUID and authentication token for SE deployment.

Step 3: In SDDC vCenter, run the “Deploy OVF Template” wizard to import the SE OVA package. In the “Customize template” window, input the following parameters:

  • IP Address of the Avi Controller: (cluster IP of the controller)
  • Authentication token for Avi Controller: from Step 2
  • Controller Cluster UUID for Avi Controller: from Step 2
  • Management Interface IP Address:
  • Management Interface Subnet Mask:
  • Default Gateway:
  • DNS Information:
  • Sysadmin login authentication key: Password

Please note that the second vNIC will be used as the SE data interface.

Then continue to deploy the second SE.

The deployed SEs will register themselves with the controller cluster as below.

Step 4: Now the SEs have established the control and management plane communication with the controller cluster. It is time to set up the SE’s data network.

During the setup, I found that the vNICs of the SE VM and the SE Ethernet interfaces are not mapped in order. For example, the data interface is the 2nd vNIC of the SE VM in vCenter, but it is shown as Ethernet 5 in the SE network setup. To get the correct mapping, use the MAC address of the data vNIC: go to SDDC vCenter and get the MAC address of the SE data interface.

In the controller management GUI, go to Infrastructure—>Service Engine and edit the selected SE. In the interface list, select the interface which has the same MAC address, then provide the IP address and subnet mask.
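The MAC-based lookup described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the interface names and MAC addresses are made up, and in practice both lists are read off the vCenter and controller GUIs.

```python
def normalize_mac(mac):
    """Lower-case a MAC address and strip separators so different notations compare equal."""
    return mac.lower().replace(":", "").replace("-", "").replace(".", "")


def find_interface_by_mac(interfaces, vcenter_mac):
    """Return the name of the SE interface whose MAC matches the vCenter vNIC, or None."""
    wanted = normalize_mac(vcenter_mac)
    for name, mac in interfaces.items():
        if normalize_mac(mac) == wanted:
            return name
    return None


# Hypothetical data: interface names as listed for the SE, MAC addresses made up.
se_interfaces = {
    "Ethernet 1": "00:50:56:aa:00:01",
    "Ethernet 5": "00:50:56:aa:00:02",
}
print(find_interface_by_mac(se_interfaces, "00-50-56-AA-00-02"))
```

Normalizing both sides first avoids a false mismatch when one tool prints colons and upper-case and the other prints hyphens or lower-case.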

The final step is to add a gateway for this data interface. Go to Infrastructure—>Routing—>Static Route and create a new static default route.

Tip: VM-VM anti-affinity policy is highly recommended to enhance the HA of the controller and service engine virtual appliances.
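To illustrate what the anti-affinity policy protects against, here is a small sketch that reports any host running more than one VM of the same group. The VM names, host names and the function itself are hypothetical; a real check would read the placement from vCenter.

```python
from collections import defaultdict


def anti_affinity_violations(placement, groups):
    """Return {group: {host: [vm, ...]}} for hosts running more than one VM of a group."""
    violations = {}
    for group, vms in groups.items():
        by_host = defaultdict(list)
        for vm in vms:
            by_host[placement[vm]].append(vm)
        shared = {host: names for host, names in by_host.items() if len(names) > 1}
        if shared:
            violations[group] = shared
    return violations


# Hypothetical inventory: VM name -> ESXi host it currently runs on.
placement = {
    "avi-ctrl-1": "esx-01", "avi-ctrl-2": "esx-02", "avi-ctrl-3": "esx-03",
    "avi-se-1": "esx-01", "avi-se-2": "esx-01",
}
groups = {
    "controllers": ["avi-ctrl-1", "avi-ctrl-2", "avi-ctrl-3"],
    "service-engines": ["avi-se-1", "avi-se-2"],
}
print(anti_affinity_violations(placement, groups))
```

In this made-up placement both SEs share one host, so a single host failure would take down the whole data plane; that is the situation the anti-affinity rule prevents.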

This is the end of the blog. Thank you very much for reading!

Setting Up Federated Identity Management for VMC on AWS – Authentication with Okta IdP

The Federated Identity feature of VMware Cloud on AWS can be integrated with any 3rd party IdP that supports SAML version 2.0. In this integration model, the customer’s dedicated vIDM tenant works as the SAML Service Provider. If the 3rd party IdP is set up to perform multi-factor authentication (MFA), the customer will be prompted for MFA when accessing VMware Cloud services. In this blog, the integration with Okta, one of the most popular IdPs, will be demoed.


The Okta IdP settings in this blog are to demo the integration for vIDM, which may not be the best practise for your environment or meet your business and security requirements.

Note: please complete the vIDM connector installation and the vIDM tenant basic setup as per my first blog of this series before continuing.

To add the same users and user groups in Okta IdP as the configured vIDM tenant, we need to integrate Okta with corporate Active Directory (AD). The integration is via Okta’s lightweight agent.

Click the “Directory Integration” in Okta UI.

Click “Add Active Directory”.

The Active Directory integration setup wizard will start. Click “Set Up Active Directory”.

Download the agent as required in the below window.

This agent can be installed on Windows Server 2008 R2 or later. The installation of this Okta agent is quite straightforward. Once the agent installation is completed, you need to perform the setup of this AD integration. In the basic setting window, select the Organizational Units (OUs) that you’d like to sync users or groups from and make sure that “Okta username format” is set to use User Principal Name (UPN).

In the “Build User Profile” window, select any custom schema which needs to be included in the Okta user profile and click Next.

Click Done to finish the integration setup.

The Okta directory setting window will pop up.

Enable the Just-In-Time provisioning and set the Schedule Import to perform user import every hour. Review and save the setting.

Now go to the Import tab and click “Import Now” to import the users from corporate AD.

As it is the first time to import user/users from customer AD, select “Full Import” and click Import.

When the scan is finished, Okta will report the result. Click OK.

Select the user/users to be imported and confirm the user assignment. Note: the user jsmith@lab.local is imported here, who will be used for the final integration testing.

Now it is time to set up the SAML IdP in Okta.

Go to Okta Classic UI application tab and click “Add Application”

Click “Create New App”;

Select Web as the Platform and “SAML 2.0” for Sign on method and click Create;

Type in the App name (“csp-vidm” is used as an example here) and click Next;

There are two mandatory configuration items in the “Create SAML Integration” window that pops up. This information can be copied from the Identity Provider settings within the vIDM tenant.

Go to vIDM tenant administrator console and click “Add Identity Provider” and select “Create Third Party IDP” within the “Identity & Access Management” tab.

Type in the “Identity Provider Name”, here the example name is “Okta01”

Go to the bottom of this IdP creation window and click “Service Provider (SP) Metadata”.

A new window will pop up as the below:

The entity ID and HTTP-POST location are required for the Okta IdP SAML settings. Copy the entity ID URL into “Audience URI (SP Entity ID)” and the HTTP-POST location into “Single sign on URL” in the Okta “Create SAML Integration” window.
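If you prefer not to pick the two values out of the metadata by eye, they can be extracted with Python's standard XML parser. The sketch below assumes the SP metadata is a standard SAML 2.0 `EntityDescriptor` document; the sample XML, its URLs and the function name are made up for illustration and are not taken from a real vIDM tenant.

```python
import xml.etree.ElementTree as ET

MD_NS = {"md": "urn:oasis:names:tc:SAML:2.0:metadata"}
HTTP_POST = "urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST"


def sp_values_for_okta(metadata_xml):
    """Extract the entity ID and the HTTP-POST assertion consumer URL from SP metadata."""
    root = ET.fromstring(metadata_xml.strip())
    acs_url = None
    for acs in root.findall("md:SPSSODescriptor/md:AssertionConsumerService", MD_NS):
        if acs.get("Binding") == HTTP_POST:
            acs_url = acs.get("Location")
            break
    return {"audience_uri": root.attrib["entityID"], "single_sign_on_url": acs_url}


# Made-up sample metadata; a real tenant's URLs will differ.
SAMPLE_SP_METADATA = """
<md:EntityDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata"
                     entityID="https://tenant.example.com/sp.xml">
  <md:SPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
    <md:AssertionConsumerService index="0"
        Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect"
        Location="https://tenant.example.com/sso/redirect"/>
    <md:AssertionConsumerService index="1"
        Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST"
        Location="https://tenant.example.com/sso/post"/>
  </md:SPSSODescriptor>
</md:EntityDescriptor>
"""
print(sp_values_for_okta(SAMPLE_SP_METADATA))
```

The `audience_uri` value corresponds to the “Audience URI (SP Entity ID)” field and `single_sign_on_url` to the “Single sign on URL” field in Okta.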

Leave all other configuration items as the default and click Next;

In the Feedback window, indicate that the newly created app is an internal app and click Finish.

A “Sign On settings” window will pop up as below, click “Identity Provider metadata” link.

The XML file format of Identity Provider metadata shows up. Select all content of this XML file and copy.

Paste the Okta IdP metadata into SAML Metadata and click “Process IdP Metadata” in the vIDM 3rd party identity provider creation window.

The “SAML AuthN Request Binding” and “Name ID format mapping from SAML Response” will be updated as below:

Select the “lab.local” directory as the users who can authenticate with this new 3rd party IdP and leave the Network as the default “ALL RANGES”. Then create a new authentication method called “Okta Auth” with SAML Context “urn:oasis:names:tc:SAML:2.0:ac:classes:PasswordProtected”. Please note that the name of this newly created authentication method has to be different from any existing authentication method.

Then leave all other configuration items’ box unchecked and click Add.

The 3rd party IdP has been successfully added now.

The last step of the vIDM setup for this Okta integration is updating the default access policy to use the newly defined authentication method “Okta Auth”. Please follow the steps in my previous blog to perform the required update. The updated default access policy should be similar to the one below.

Before testing the setup, go to the Okta UI to assign users to the newly defined SAML 2.0 web application “csp-vidm”. Click Assignment.

Click Assign and select “Assign to People”.

In the “Assign csp-vidm to People” window, assign user John Smith (jsmith@lab.local), which means that John Smith is allowed to use this SAML 2.0 application.

After the assignment is completed, user John Smith is under the assignment of this SAML 2.0 application “csp-vidm”.

Instead of assigning individual users, AD group/groups can be assigned to the SAML application as well.

Finally, everything is ready to test the integration.

Open a new Incognito window in a Chrome browser, type in the vIDM tenant URL and press Enter.

In the login window, type the user name jsmith@lab.local and click Next.

The authentication session is redirected to Okta.

Type in Username & Password and click “Sign In”.

Then John Smith (jsmith@lab.local) successfully logs in to the vIDM tenant.

This is the end of this demo. Thank you very much for reading!

Setting Up Federated Identity Management for VMC on AWS – Authentication with Active Directory

This blog is the second blog of the Federated Identity Management for VMC on AWS series. Please complete the vIDM connector installation and setup as per my first blog of this series before moving forward.

VMware Cloud on AWS Federated Identity management supports different kinds of authentication methods. This blog will demo the basic method: authentication with the customer corporate Active Directory (AD).

When VMC on AWS customers use AD for authentication, outbound-only connection mode is highly recommended. This mode does not require any inbound firewall port to be opened: only outbound connectivity from vIDM Connector to VMware SaaS vIDM tenant on port 443 is required. All user and group sync from your enterprise directory and user authentication are handled by the vIDM connector.

To enable outbound-only mode, update the settings of the Built-in Identity Provider. In the user section of the Built-in Identity Provider settings, select the newly created directory “lab.local” and add the newly created connector “vidmcon01.lab.local”.

After the connector is added successfully, select Password (cloud deployment) in the “Connector Authentication Methods” and click Save.

Now it is time to update the access policy to use corporate Active Directory to authenticate VMC users.

Go to Identity & Access Management.

Click “Edit DEFAULT POLICY”, then the “Edit Policy” window pops up. Click Next.


Then the “Add Policy Rule” window will pop up. At this stage, just leave the first two configuration items as default: “ALL RANGES” and “ALL Device Types”. In the “and user belong to group(s)” config item, search and add all 3 synced groups (sddc-admins, sddc-operators and sddc-readonly) to allow the users in these 3 groups to log in.

Add Password (cloud deployment) as the authentication method.

Use Password (Local Directory) as the fallback authentication method and click Save.

There are 3 rules defined in the default access policy. Drag the newly defined rule to the top of the rules table, which will make sure that the new rule is evaluated first when a user tries to log in.
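The reason the order matters is that the rules are evaluated from the top and the first applicable rule decides the authentication method. The sketch below is a simplified model of that first-match behavior: it only looks at group membership (network range and device type are ignored), and the rule names and data structure are hypothetical.

```python
def first_matching_rule(rules, user_groups):
    """Return the first rule, top to bottom, that applies to a user with these groups."""
    for rule in rules:
        # A rule without a group restriction applies to everybody.
        if not rule["groups"] or rule["groups"] & user_groups:
            return rule
    return None


ad_rule = {
    "name": "AD users",
    "groups": {"sddc-admins", "sddc-operators", "sddc-readonly"},
    "auth": ["Password (cloud deployment)", "Password (Local Directory)"],
}
catch_all = {"name": "catch-all", "groups": set(), "auth": ["Password (Local Directory)"]}

# With the new rule on top, an AD user is matched by it first.
print(first_matching_rule([ad_rule, catch_all], {"sddc-admins"})["name"])
# With the catch-all on top, the new rule would never be reached.
print(first_matching_rule([catch_all, ad_rule], {"sddc-admins"})["name"])
```

In this model, leaving the new rule at the bottom means a broader rule above it matches first, which is why the rule is dragged to the top.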

Now the rules table shows as below. Click Next.

Click Save to keep the changes of the default access policy.

You are now good to test your authentication set up. Open a new Incognito window in your Chrome browser and connect to the vIDM URL. Type in the username (jsmith@lab.local) and click Next.

Type in the Active Directory password for user jsmith@lab.local and click “Sign in”.

Then you can see that jsmith@lab.local has successfully logged in to vIDM!

Thank you very much for reading!

Setting Up Federated Identity Management for VMC on AWS – Install and Setup vIDM Connector

As an enterprise using VMware Cloud Services, you can set up federation with your corporate domain. Federating your corporate domain allows you to use your organization’s single sign-on and identity source to sign in to VMware Cloud Services. You can also set up multi-factor authentication as part of federation access policy settings.

Federated identity management allows you to control authentication to your organization and its services by assigning organization and service roles to your enterprise groups.

Set up a federated identity with the VMware Identity Manager service and the VMware Identity Manager connector, which VMware provides at no additional charge. The following are the required high-level steps.

  1. Download the VMware Identity Manager (vIDM) connector and configure it for user attributes and group sync from your corporate identity store. Note that only the VMware Identity Manager Connector for Windows is supported.
  2. Configure your corporate identity provider instance using the VMware Identity Manager service.
  3. Register your corporate domain.

This series of blogs will demonstrate how to complete customer end setup of the Federated Identity Management for VMC on AWS.

  1. Install and Setup vIDM connector, which is required for all 4 use cases;
  2. Use Case 1: authenticate the users with On-prem Active Directory;
  3. Use Case 2: authenticate the users with third party IdP Okta;
  4. Use Case 3: authenticate users with Active Directory Federation Services;
  5. Use Case 4: authenticate users with Azure AD.

As the 1st blog of this series, I will show you how to install the vIDM connector (version 19.03) on a Windows 2012 R2 server and how to achieve HA for the vIDM connector.


Before starting, make sure the following are in place:

  • A vIDM SaaS tenant. If you don’t have one, please contact your VMware customer success representative.
  • A Windows Server (Windows 2008 R2, Windows 2012, Windows 2012 R2 or Windows 2016).
  • Firewall rules opened for communication from the Windows Server to the domain controllers and to the vIDM tenant on port 443.
  • The vIDM connector for Windows installation package. The latest version of the vIDM connector is shown below.


Log in to the Windows 2012 R2 server and start the installation:

Click Yes in the “User Account Control” window.

Note that the installation package will install the latest major JRE version on the connector Windows server if the JRE has not been installed yet.

The installation process is loading the Installation Wizard.

Click Next in the Installation Wizard window.

Accept the License Agreement as below:

Accept the default of installation destination folder and click Next;

Click Next and leave the “Are you migrating your Connector” box unchecked.

Accept the pop-up hostname and default port for this connector.

For the purpose of VMware Cloud federated identity management, please don’t run the Connector service as a domain user account. Leave the “Would you like to run the Connector service as a domain user account?” option box unchecked and click Next.

Click Yes in the pop-up window to confirm from the previous step.

Click Install to begin the installation.

After a few minutes, the installation completes successfully.

Click Finish. A new window will pop up, which shows the Connector appliance management URL as below.

Click Yes. The browser is opened and will redirect to https://vidmconn01.lab.local:8443. Accept the alert of security certificate and continue to this website.

In the VMware Identity Manager Appliance Setup wizard, click Continue.

Note: Don’t use Internet Explorer when running the wizard. There is a known bug with IE.

Set passwords for appliance application admin account and click Continue.

Now go to the vIDM tenant, in the tab of Identity & Access Management, click Add Connector.

Type in Connector ID Name and Click “Generate Activation Code”.

Copy the generated activation code and go back to the Connector setup wizard.

Copy the activation code into the Activate Connector Window and click Continue.

Wait for a few minutes then the connector will be activated.

Note: sometimes a 404 error will pop up like the one below. In my experience, it is a false alert on Windows 2012 R2. Don’t worry about it.

In VMware Identity Manager tenant, the newly installed connector will show up as below:


Now it is time to set up our connector for user sync.

Step 1: Add Directory

Click Add Directory and select “Add Active Directory over LDAP/IWA”.

Type in the “Directory Name”, select “Active Directory over LDAP” and use this directory for user sync and authentication. For the “Directory Search Attribute”, I prefer UserPrincipalName to sAMAccountName, as the UserPrincipalName option works for all Federated Identity management use cases, e.g. integration with Active Directory Federation Services and 3rd party IdPs.

Then provide the required Bind User Details and click “Save & Next”

After a few minutes, the domain will pop up. Click Next.

In the Map User Attributes window, accept the setup and click Next

Type in the group DNs and click “Find Groups”.

Click the “0 of 23” under the column “Groups to sync”.

Select 3 user groups which need to be synced and click Save.

Click Next.

Accept the default setting in the “Select the Users you would like to sync” window and click Next.

In the Review window, click “Sync Directory”

Now it is time to verify the synced users and groups in the vIDM tenant. Go to the “User & Groups” tab. You can see that 10 users and 3 groups are synced from the lab.local directory.

You can find the sync log within the configured directory.

Now the basic set up of vIDM connector has been completed.

Connector HA

A single VMware Identity Manager connector is a single point of failure in an enterprise environment. To achieve high availability, install one or more extra connectors. The installation of an extra connector is exactly the same as installing the 1st connector. Here, the second connector is installed on another Windows 2012 R2 server, vidmcon02.lab.local. After the installation is completed, the activation procedure of the connector is the same as well.

Now 2 connectors will show up in the vIDM tenant.

Go to the Built-in identity provider and add the second connector.

Type in the Bind User Password and click “Add Connector”

Then the second connector is added successfully.

Now there are 2 connectors associated with the Built-in Identity Provider.

Please note connector HA is only for user authentication in version 19.03. Directory or user sync can only be enabled on one connector at a time. In the event of a connector instance failure, authentication is handled automatically by another connector instance. However, for directory sync, you must modify the directory settings in the VMware Identity Manager service to use another connector instance like the below.
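To make the difference between the two behaviors concrete, here is a small sketch, separate from the product itself, that models them: authentication can use any healthy connector, while directory sync is pinned to one connector and needs a manual change when that connector is down. The function names and health flags are hypothetical; the host names are the two connectors used in this blog.

```python
def pick_auth_connector(connectors):
    """Authentication can be served by any healthy connector; return the first one."""
    for connector in connectors:
        if connector["healthy"]:
            return connector["name"]
    return None


def sync_needs_manual_switch(connectors, sync_connector):
    """Directory sync is tied to one connector, so its outage requires an admin change."""
    status = {c["name"]: c["healthy"] for c in connectors}
    return not status.get(sync_connector, False)


# Hypothetical state: the first connector has failed.
connectors = [
    {"name": "vidmcon01.lab.local", "healthy": False},
    {"name": "vidmcon02.lab.local", "healthy": True},
]
print(pick_auth_connector(connectors))
print(sync_needs_manual_switch(connectors, "vidmcon01.lab.local"))
```

In this state user logins keep working through the second connector, but directory sync stays broken until the directory settings are pointed at it.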

Thank you very much for reading!

Integrate VMware NSX-T with Kubernetes

Kubernetes (K8s) is an open-source system for automating deployment, scaling, and management of containerized applications. K8s uses network plugins to provide the required networking functions like routing, switching, firewall and load balancing. VMware NSX-T provides a network plugin for K8s as well, called NCP. If you want to know more about VMware NSX-T, please refer to the VMware NSX-T documentation.

In this blog, I will show you how to integrate VMware NSX-T with Kubernetes.

Here, we will build a three-node, single-master K8s cluster. All 3 nodes are RHEL 7.5 virtual machines.

  • master node:
    • Hostname: master.k8s
    • Mgmt IP:
  • worker node1:
    • Hostname: node1.k8s
    • Mgmt IP:
  • worker node2:
    • Hostname: node2.k8s
    • Mgmt IP:

On each node, there are 2 vNICs attached. The first vNIC is ens192 which is for management and the second vNIC is ens224, which is for K8s transport and connected to an overlay logical switch.

NSX-T version:;

NSX-T NCP version:

Docker version: 18.03.1-ce;

K8s version: 1.11.4

1. Prepare K8s Cluster Setup

1.1 Get Offline Packages and Docker Images

As there is no Internet access in my environment, I have to prepare my K8s cluster offline. To do that, I need to get the following packages:

  • Docker offline installation packages
  • Kubeadm offline installation packages which will be used to set up the K8s cluster;
  • Docker offline images;

1.1.1 Docker Offline Installation Packages

Regarding how to get Docker offline installation packages, please refer to my other blog: Install Docker Offline on Centos7.

1.1.2 Kubeadm Offline Installation Packages

Getting Kubeadm offline installation packages is quite straightforward as well. You can use Yum with downloadonly option.

yum install --downloadonly --downloaddir=/root/ kubelet-1.11.0
yum install --downloadonly --downloaddir=/root/ kubeadm-1.11.0
yum install --downloadonly --downloaddir=/root/ kubectl-1.11.0

1.1.3 Docker Offline Images

Below are the required Docker images for K8s cluster.

  • v1.11.4
  • v1.11.4
  • v1.11.4
  • v1.11.4
  • 1.1.3
  • 3.2.18
  • 3.1
  • 3.1

You may notice that the above includes two identical pause images, although the two have different repository names. There is a story behind this. Initially, I only had the first pause image loaded. The setup passed the “kubeadm init” pre-flight checks but failed at the real cluster setup stage. When I checked the log, I found that the cluster setup process kept requesting the second image. I guess it is a bug with kubeadm v1.11.0, which I am using.

I put an example here to show how to use “docker pull” CLI to download a docker image in case you don’t know how to do it.

docker pull

Once you have all Docker images, you need to export these Docker images as offline images via “docker save”.

docker save -o /pause-amd64:3.1.docker
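When there are many images to export, the file naming used here (image name and tag plus a `.docker` suffix) can be generated instead of typed. The sketch below only builds the `docker save -o <file> <image>` argument list and does not run Docker; the helper names and the registry name in the example are made up.

```python
def offline_file_name(image_ref):
    """Drop the registry/repository prefix: 'registry/path/name:tag' -> 'name:tag.docker'."""
    return image_ref.rsplit("/", 1)[-1] + ".docker"


def save_command(image_ref, directory="."):
    """Build the argument list for 'docker save -o <file> <image>'."""
    return ["docker", "save", "-o", f"{directory}/{offline_file_name(image_ref)}", image_ref]


# Hypothetical image reference; substitute the repository you pulled from.
print(" ".join(save_command("registry.example.com/pause-amd64:3.1", "/root")))
```

The returned list can be passed to `subprocess.run` on a machine where Docker is installed.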

Now it is time to upload all your installation packages and offline images to all 3 K8s nodes, including the master node.

1.2 Disable SELinux and Firewalld

# Put SELinux into permissive mode for the running system
setenforce 0
# Set SELINUX=permissive in /etc/selinux/config so the change survives a reboot
vi /etc/selinux/config
# Stop and disable firewalld
systemctl disable firewalld && systemctl stop firewalld

1.3 Config DNS Resolution

# Update the /etc/hosts file as below on all three K8s nodes   master.k8s   node1.k8s   node2.k8s

1.4 Install Docker and Kubeadm

To install Docker and Kubeadm, first put the required packages for Docker and for kubeadm into separate directories. For example, all required packages for kubeadm are put into a directory called kubeadm. Then use rpm to install kubeadm as below:

[root@master kubeadm]# rpm -ivh --replacefiles --replacepkgs *.rpm
warning: 53edc739a0e51a4c17794de26b13ee5df939bd3161b37f503fe2af8980b41a89-cri-tools-1.12.0-0.x86_64.rpm: Header V4 RSA/SHA512 Signature, key ID 3e1ba8d5: NOKEY
warning: socat- Header V3 RSA/SHA256 Signature, key ID f4a80eb5: NOKEY
Preparing...                          ########################## [100%]
Updating / installing...
   1:socat-              ########################## [ 17%]
   2:kubernetes-cni-0.6.0-0           ########################## [ 33%]
   3:kubelet-1.11.0-0                 ########################## [ 50%]
   4:kubectl-1.11.0-0                 ########################## [ 67%]
   5:cri-tools-1.12.0-0               ########################## [ 83%]
   6:kubeadm-1.11.0-0                 ########################## [100%]

After Docker and Kubeadm are installed, you can go to enable and start docker and kubelet service:

systemctl enable docker && systemctl start docker
systemctl enable kubelet && systemctl start kubelet

In addition, you need to perform some OS level setup so that your K8s environment can run properly.

sysctl -w net.bridge.bridge-nf-call-iptables=1
echo "net.bridge.bridge-nf-call-iptables=1" > /etc/sysctl.d/k8s.conf
# Disable Swap
swapoff -a && sed -i '/ swap / s/^/#/' /etc/fstab
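The `sed` expression above prefixes every `/etc/fstab` line that contains ` swap ` with `#`, so swap stays disabled after a reboot. The following sketch reproduces that transformation on a sample string to show its effect; the fstab entries are made up and the function name is hypothetical.

```python
def comment_out_swap(fstab_text):
    """Prefix every line containing ' swap ' with '#', like sed '/ swap / s/^/#/'."""
    lines = []
    for line in fstab_text.splitlines():
        if " swap " in line:
            line = "#" + line
        lines.append(line)
    return "\n".join(lines)


# Made-up fstab content for illustration.
SAMPLE_FSTAB = (
    "/dev/mapper/rhel-root /    xfs  defaults 0 0\n"
    "/dev/mapper/rhel-swap swap swap defaults 0 0"
)
print(comment_out_swap(SAMPLE_FSTAB))
```

Note that, like the `sed` command, this adds another `#` to a swap line that is already commented out, which is harmless.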

1.5 Load Docker Offline Images

Now let us load all offline Docker images into the local Docker repo on all K8s nodes via the “docker load” CLI.

docker load -i coredns:1.1.3.docker
docker load -i etcd-amd64:3.2.18.docker
docker load -i kube-apiserver-amd64:v1.11.4.docker
docker load -i kube-controller-manager-amd64:v1.11.4.docker
docker load -i kube-proxy-amd64:v1.11.4.docker
docker load -i kube-scheduler-amd64:v1.11.4.docker
docker load -i pause-amd64:3.1.docker
docker load -i pause:3.1.docker

1.6 NSX NCP Plugin

Now you can upload your NSX NCP plugin to all 3 nodes and load the NCP images into local Docker repo.

1.6.1 Load NSX Container Image

docker load -i nsx-ncp-rhel- 

Now the docker image list on your K8s nodes will be similar to below:

[root@master ~]# docker image list
REPOSITORY                                   TAG                 IMAGE ID            CREATED             SIZE
registry.local/                              latest              97d54d5c80db        5 months ago        701MB
                                             v1.11.4             5071d096cfcd        5 months ago        98.2MB
                                             v1.11.4             de6de495c1f4        5 months ago        187MB
                                             v1.11.4             dc1d57df5ac0        5 months ago        155MB
                                             v1.11.4             569cb58b9c03        5 months ago        56.8MB
                                             1.1.3               b3b94275d97c        11 months ago       45.6MB
                                             3.2.18              b8df3b177be2        12 months ago       219MB
                                             3.1                 da86e6ba6ca1        16 months ago       742kB
                                             3.1                 da86e6ba6ca1        16 months ago       742kB

1.6.2 Install NSX CNI

rpm -ivh --replacefiles nsx-cni-

Please note that the replacefiles option is required because of a known bug with NSX-T 2.3. If you don’t include the replacefiles option, you will see an error like the one below:

[root@master rhel_x86_64]# rpm -i nsx-cni-
   file /opt/cni/bin/loopback from install of nsx-cni- conflicts with file from package kubernetes-cni-0.6.0-0.x86_64

1.6.3 Install and Config OVS

# Go to OpenvSwitch directory
rpm -ivh openvswitch-
systemctl start openvswitch.service && systemctl enable openvswitch.service
ovs-vsctl add-br br-int
ovs-vsctl add-port br-int ens224 -- set Interface ens224 ofport_request=1
ip link set br-int up
ip link set ens224 up

2. Setup K8s Cluster

Now you are ready to set up your K8s cluster. I will use kubeadm config file to define my K8s cluster when I initiate the K8s cluster setup. Below is the content of my kubeadm config file.

apiVersion: kubeadm.k8s.io/v1alpha2
kind: MasterConfiguration
kubernetesVersion: v1.11.4
api:
  advertiseAddress: <master node IP>
  bindPort: 6443

From the above, you can see that Kubernetes version v1.11.4 will be used and that the API server listens on port 6443 of the master node IP. Run the following CLI from the K8s master node to create the K8s cluster.

kubeadm init --config kubeadm.yml

After the K8s cluster is set up, you can join the remaining two worker nodes to the cluster via the CLI below:

kubeadm join --token up1nz9.iatqv50bkrqf0rcj --discovery-token-ca-cert-hash sha256:3f9e96e70a59f1979429435caa35d12270d60a7ca9f0a8436dff455e4b8ac1da

Note: You can get the required token and discovery-token-ca-cert-hash from the output of “kubeadm init”.
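If you saved the output of “kubeadm init” to a file, the two values can be pulled out with a short script rather than copied by hand. This is a minimal sketch: the function name is hypothetical, the sample text only imitates the relevant line of the output, and `master.k8s:6443` is a placeholder for the API server address.

```python
import re

# Bootstrap tokens have the form <6 chars>.<16 chars>, lower-case letters and digits.
TOKEN_RE = re.compile(r"--token\s+([a-z0-9]{6}\.[a-z0-9]{16})")
HASH_RE = re.compile(r"--discovery-token-ca-cert-hash\s+(sha256:[0-9a-f]{64})")


def extract_join_parameters(init_output):
    """Return (token, ca_cert_hash) found in the text, or raise ValueError."""
    token = TOKEN_RE.search(init_output)
    ca_hash = HASH_RE.search(init_output)
    if not token or not ca_hash:
        raise ValueError("no complete 'kubeadm join' command found")
    return token.group(1), ca_hash.group(1)


# Sample text imitating the join hint; the address is a placeholder.
SAMPLE_OUTPUT = (
    "You can now join any number of machines by running the following:\n"
    "  kubeadm join master.k8s:6443 --token up1nz9.iatqv50bkrqf0rcj "
    "--discovery-token-ca-cert-hash "
    "sha256:3f9e96e70a59f1979429435caa35d12270d60a7ca9f0a8436dff455e4b8ac1da\n"
)
print(extract_join_parameters(SAMPLE_OUTPUT))
```

The strict patterns also act as a quick sanity check: a token or hash that was truncated during copy and paste will not match.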

3. NSX-T and K8s Integration

3.1 Prepare NSX Resource

Before the integration, make sure that the required NSX-T resources are configured in NSX Manager. The required resources include:

  • Overlay Transport Zone: overlay_tz
  • Tier 0 router: tier0_router
  • K8s Transport Logical Switch
  • IP Blocks for Kubernetes Pods: container_ip_blocks
  • IP Pool for SNAT: external_ip_pools
  • Firewall Marker Sections: top_firewall_section_marker and bottom_firewall_section_marker

Please refer to the NSX Container Plug-in for Kubernetes and Cloud Foundry – Installation and Administration Guide for details on how to create the NSX-T resources. The following are the UUIDs of all created resources:

  • tier0_router = c86a625e-54e0-4510-9185-e9e1b7e26eb9
  • overlay_tz = f6d90300-c56e-4d26-8684-8eff64cdf5a0
  • container_ip_blocks = f9e411f5-654e-4f0d-99e8-2e5a9812f295
  • external_ip_pools = 84ffd635-640f-41c6-be85-71337e112e69
  • top_firewall_section_marker = ab07e559-79aa-4bc9-a6f0-126ea59278c2
  • bottom_firewall_section_marker = 35aaa6c5-0870-4ac4-bf47-114780863956
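Since these UUIDs are copied by hand from NSX Manager into the NCP configuration, a quick format check can catch a truncated or mistyped value early. The sketch below uses Python's standard `uuid` module on the values listed above; the function name is hypothetical.

```python
import uuid

NSX_RESOURCES = {
    "tier0_router": "c86a625e-54e0-4510-9185-e9e1b7e26eb9",
    "overlay_tz": "f6d90300-c56e-4d26-8684-8eff64cdf5a0",
    "container_ip_blocks": "f9e411f5-654e-4f0d-99e8-2e5a9812f295",
    "external_ip_pools": "84ffd635-640f-41c6-be85-71337e112e69",
    "top_firewall_section_marker": "ab07e559-79aa-4bc9-a6f0-126ea59278c2",
    "bottom_firewall_section_marker": "35aaa6c5-0870-4ac4-bf47-114780863956",
}


def invalid_uuids(resources):
    """Return the names whose value is not a canonical, hyphenated UUID string."""
    bad = []
    for name, value in resources.items():
        try:
            parsed = uuid.UUID(value)
        except ValueError:
            bad.append(name)
            continue
        # uuid.UUID also accepts other notations, so compare with the canonical form.
        if str(parsed) != value.lower():
            bad.append(name)
    return bad


print(invalid_uuids(NSX_RESOURCES))
```

This only checks the format; it cannot tell whether a UUID actually exists in your NSX Manager.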

In addition, make sure that the switching ports which the three K8s nodes are attached to are tagged as follows:

{'ncp/node_name': '<node_name>'}
{'ncp/cluster': '<cluster_name>'}

node_name is the FQDN hostname of the K8s node, and cluster_name is the name you give this cluster in NSX, not in the K8s cluster context. My K8s nodes’ tags are shown here.

k8s master switching port tags

k8s node1 switching port tags

k8s node2 switching port tags
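As a cross-check of the tagging scheme, the sketch below builds the two tags for each node as scope/tag pairs, which is how NSX-T presents tags (a Scope column and a Tag column). The function name and the cluster name `k8s-cluster1` are assumptions for illustration; the node names are the hostnames used in this blog.

```python
def ncp_port_tags(node_name, cluster_name):
    """Build the two tags NCP expects on the logical switch port of a K8s node."""
    return [
        {"scope": "ncp/node_name", "tag": node_name},
        {"scope": "ncp/cluster", "tag": cluster_name},
    ]


# Hypothetical cluster name; the node names must match the K8s node hostnames.
for node in ("master.k8s", "node1.k8s", "node2.k8s"):
    print(node, ncp_port_tags(node, "k8s-cluster1"))
```

Every port gets the same `ncp/cluster` value and a unique `ncp/node_name` value, so a quick look for duplicates or typos in the node names is usually enough.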

3.2 Install NSX NCP Plugin

3.2.1 Create Namespace

kubectl create ns nsx-system

3.2.2 Create Service Account for NCP

kubectl apply -f rbac-ncp.yml -n nsx-system

3.2.3 Create NCP ReplicationController

kubectl apply -f ncp-rc.yml -n nsx-system

3.2.4 Create NCP nsx-node-agent and nsx-kube-proxy DaemonSet

kubectl create -f nsx-node-agent-ds.yml -n nsx-system 

You can find the above 3 YAML files on GitHub.

Now you have completed the NSX-T and K8s integration. If you check the pods running on your K8s cluster, you will see output similar to the following:

[root@master ~]# k get pods --all-namespaces 
NAMESPACE     NAME                                   READY     STATUS    RESTARTS   AGE
kube-system   coredns-78fcdf6894-pg4dz               1/1       Running   0          9d
kube-system   coredns-78fcdf6894-q727q               1/1       Running   128        9d
kube-system   etcd-master.k8s                        1/1       Running   3          14d
kube-system   kube-apiserver-master.k8s              1/1       Running   2          14d
kube-system   kube-controller-manager-master.k8s     1/1       Running   3          14d
kube-system   kube-proxy-5p482                       1/1       Running   2          14d
kube-system   kube-proxy-9mnwk                       1/1       Running   0          12d
kube-system   kube-proxy-wj8qw                       1/1       Running   3          14d
kube-system   kube-scheduler-master.k8s              1/1       Running   3          14d
ns-test1000   http-echo-deployment-b5bbfbb86-j4dxq   1/1       Running   0          2d
nsx-system    nsx-ncp-rr989                          1/1       Running   0          11d
nsx-system    nsx-node-agent-kbsld                   2/2       Running   0          9d
nsx-system    nsx-node-agent-pwhlp                   2/2       Running   0          9d
nsx-system    nsx-node-agent-vnd7m                   2/2       Running   0          9d
nszhang       busybox-756b4db447-2b9kx               1/1       Running   0          5d
nszhang       busybox-deployment-5c74f6dd48-n7tp2    1/1       Running   0          9d
nszhang       http-echo-deployment-b5bbfbb86-xnjz6   1/1       Running   0          2d
nszhang       jenkins-deployment-8546d898cd-zdzs2    1/1       Running   0          11d
nszhang       whoami-deployment-85b65d8757-6m7kt     1/1       Running   0          6d
nszhang       whoami-deployment-85b65d8757-b4m99     1/1       Running   0          6d
nszhang       whoami-deployment-85b65d8757-pwwt9     1/1       Running   0          6d
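
A listing like this is easy to scan by eye when it is short, but a small parser helps once the cluster grows. The sketch below assumes the six-column layout shown above and flags pods that are not Running, not fully ready, or have restarted more than a threshold. The threshold of 10 and the function names are arbitrary choices of mine.

```python
LISTING = """\
NAMESPACE     NAME                                   READY     STATUS    RESTARTS   AGE
kube-system   coredns-78fcdf6894-pg4dz               1/1       Running   0          9d
kube-system   coredns-78fcdf6894-q727q               1/1       Running   128        9d
nsx-system    nsx-ncp-rr989                          1/1       Running   0          11d
nsx-system    nsx-node-agent-kbsld                   2/2       Running   0          9d
"""


def parse_pods(listing):
    """Parse `kubectl get pods --all-namespaces` text into a list of dicts."""
    pods = []
    for line in listing.strip().splitlines()[1:]:  # skip the header row
        namespace, name, ready, status, restarts, age = line.split()
        ready_now, ready_total = (int(n) for n in ready.split("/"))
        pods.append({
            "namespace": namespace,
            "name": name,
            "ready_now": ready_now,
            "ready_total": ready_total,
            "status": status,
            "restarts": int(restarts),
            "age": age,
        })
    return pods


def needs_attention(pods, max_restarts=10):
    """Return pods that are not Running, not fully ready, or restart-heavy."""
    return [
        p for p in pods
        if p["status"] != "Running"
        or p["ready_now"] < p["ready_total"]
        or p["restarts"] > max_restarts
    ]


flagged = needs_attention(parse_pods(LISTING))
for pod in flagged:
    print(pod["namespace"], pod["name"], pod["restarts"])
# prints: kube-system coredns-78fcdf6894-q727q 128
```

In the listing above, the only pod this flags is the coredns replica with 128 restarts, which is worth a look even though its status is Running.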

In the NSX-T Manager GUI, you will see that the following resources have been created for the K8s cluster.

Logical Switches for K8s
Tier1 Router for K8s
NSX LB for K8s


I ran into a few issues along the way. I used the following commands a lot while troubleshooting, so I am sharing them here in the hope that they help you as well.

  • How to check the kubelet service log
journalctl -xeu kubelet
  • How to check the log of a specific pod
kubectl logs nsx-ncp-rr989 -n nsx-system

“nsx-ncp-rr989” is the name of the pod and “nsx-system” is the namespace which we created for NCP.

  • How to check the log of a specific container when there is more than one container in the pod
kubectl logs nsx-node-agent-n7n7g -c nsx-node-agent -n nsx-system

“nsx-node-agent-n7n7g” is the pod name and “nsx-node-agent” is the container name.

  • How to show the details of a specific pod
kubectl describe pod nsx-ncp-rr989 -n nsx-system

Failed to Start Libvirtd


OS: CentOS Linux release 7.5.1804 (Core)

Error Message:

# journalctl -u libvirtd
-- Logs begin at Wed 2019-01-30 17:46:41 AEDT, end at Wed 2019-01-30 18:02:09 AEDT. --
Jan 30 17:47:09 ovs-sandbox2 systemd[1]: Starting Virtualization daemon...
Jan 30 17:47:14 ovs-sandbox2 libvirtd[1483]: 2019-01-30 06:47:14.936+0000: 1483: info : libvirt version: 4.5.0, package: 10.el7_6.3 (CentOS BuildSystem, 2018-11-28-20:51:39,
Jan 30 17:47:14 ovs-sandbox2 libvirtd[1483]: 2019-01-30 06:47:14.936+0000: 1483: info : hostname: ovs-sandbox2
Jan 30 17:47:14 ovs-sandbox2 libvirtd[1483]: 2019-01-30 06:47:14.936+0000: 1483: error : virModuleLoadFile:53 : internal error: Failed to load module '/usr/lib64/libvirt/storage-backend/': /usr/lib64/libvir
Jan 30 17:47:14 ovs-sandbox2 systemd[1]: libvirtd.service: main process exited, code=exited, status=3/NOTIMPLEMENTED
Jan 30 17:47:14 ovs-sandbox2 systemd[1]: Failed to start Virtualization daemon.
Jan 30 17:47:14 ovs-sandbox2 systemd[1]: Unit libvirtd.service entered failed state.
Jan 30 17:47:14 ovs-sandbox2 systemd[1]: libvirtd.service failed.
Jan 30 17:47:15 ovs-sandbox2 systemd[1]: libvirtd.service holdoff time over, scheduling restart.
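
When the journal is long, it helps to pull out only the lines that libvirtd itself logged at error level. The sketch below does that with a regular expression, assuming the log format shown above; the sample lines are copied from this journal (the error line is truncated exactly as the terminal truncated it), and the function name is my own.

```python
import re

# Matches "libvirtd[PID]: <timestamp>: <PID>: error : <message>".
ERROR_RE = re.compile(r"libvirtd\[\d+\]: .*?: error : (?P<message>.*)$")

JOURNAL = """\
Jan 30 17:47:14 ovs-sandbox2 libvirtd[1483]: 2019-01-30 06:47:14.936+0000: 1483: info : hostname: ovs-sandbox2
Jan 30 17:47:14 ovs-sandbox2 libvirtd[1483]: 2019-01-30 06:47:14.936+0000: 1483: error : virModuleLoadFile:53 : internal error: Failed to load module '/usr/lib64/libvirt/storage-backend/': /usr/lib64/libvir
Jan 30 17:47:14 ovs-sandbox2 systemd[1]: Failed to start Virtualization daemon.
"""


def libvirtd_errors(journal_text):
    """Return the message part of every libvirtd error-level line."""
    messages = []
    for line in journal_text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            messages.append(match.group("message"))
    return messages


errors = libvirtd_errors(JOURNAL)
for message in errors:
    print(message)
```

Here the only hit is the virModuleLoadFile line, which points at a storage backend module that failed to load rather than at libvirtd's own configuration.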


The issue appeared after I accidentally updated libvirt from 3.9.0-14.el7_5.8.x86_64 to 4.5.0-10.el7_6.3.x86_64. I then updated librados2, as the yum transaction history below shows:


[root@ovs-sandbox2 /]# yum update librados2

[root@ovs-sandbox2 virtualmachines]# yum history info 14
Loaded plugins: fastestmirror
Transaction ID : 14
Begin time : Wed Jan 30 18:10:53 2019
Begin rpmdb : 815:0a1f6c4d93558a35ec9c3ceb9114712149f71015
End time : 18:10:54 2019 (1 seconds)
End rpmdb : 817:358974b7c1ae161fe8d05d2d23573b31eaac6582
User : root
Return-Code : Success
Command Line : update librados2
Transaction performed with:
Installed rpm-4.11.3-32.el7.x86_64 @anaconda
Installed yum-3.4.3-158.el7.centos.noarch @anaconda
Installed yum-plugin-fastestmirror-1.1.31-45.el7.noarch @anaconda
Packages Altered:
Dep-Install boost-iostreams-1.53.0-27.el7.x86_64 @base
Dep-Install boost-random-1.53.0-27.el7.x86_64 @base
Updated librados2-1:0.94.5-2.el7.x86_64 @base
Update 1:10.2.5-4.el7.x86_64 @base
Updated librbd1-1:0.94.5-2.el7.x86_64 @base
Update 1:10.2.5-4.el7.x86_64 @base
history info

[root@ovs-sandbox2 virtualmachines]


Automate NSX-T Build with Terraform

Terraform is a widely adopted Infrastructure as Code tool that allows you to define your infrastructure in a simple, declarative language and to deploy and manage it across public cloud providers, including AWS, Azure, Google Cloud and IBM Cloud, as well as other infrastructure providers such as VMware NSX-T and F5 BIG-IP.

In this blog, I will show you how to use the Terraform NSX-T provider to define an NSX-T tenant environment in minutes.

To build the new NSX-T environment, I am going to:

  1. Create a new Tier1 router named tier1_router;
  2. Create three logical switches under the newly created Tier1 router for the web/app/db security zones;
  3. Connect the newly created Tier1 router to the existing Tier0 router;
  4. Create a new network service group including SSH and HTTPS;
  5. Create a new firewall section and add a firewall rule to allow outbound SSH/HTTPS traffic from any workload on the web logical switch to any workload on the app logical switch.

First, I define a Terraform module as below. Note: a Terraform module is normally used to define reusable components. For example, the module defined here can be reused to build both the non-prod and prod environments by providing different inputs.

provider "nsxt" {
  allow_unverified_ssl = true
  max_retries = 10
  retry_min_delay = 500
  retry_max_delay = 5000
  retry_on_status_codes = [429]
}

data "nsxt_transport_zone" "overlay_transport_zone" {
  display_name = "tz-overlay"
}

data "nsxt_logical_tier0_router" "tier0_router" {
  display_name = "t0"
}

data "nsxt_edge_cluster" "edge_cluster" {
  display_name = "edge-cluster"
}

resource "nsxt_logical_router_link_port_on_tier0" "tier0_port_to_tier1" {
  description = "TIER0_PORT1 provisioned by Terraform"
  display_name = "tier0_port_to_tier1"
  logical_router_id = "${}"
  tag {
    scope = "ibm"
    tag   = "blue"
  }
}

resource "nsxt_logical_tier1_router" "tier1_router" {
  description = "RTR1 provisioned by Terraform"
  display_name = "${var.nsxt_logical_tier1_router_name}"
  #failover_mode = "PREEMPTIVE"
  edge_cluster_id = "${}"
  enable_router_advertisement = true
  advertise_connected_routes = false
  advertise_static_routes = true
  advertise_nat_routes = true
  tag {
    scope = "ibm"
    tag   = "blue"
  }
}

resource "nsxt_logical_router_link_port_on_tier1" "tier1_port_to_tier0" {
  description  = "TIER1_PORT1 provisioned by Terraform"
  display_name = "tier1_port_to_tier0"
  logical_router_id = "${}"
  linked_logical_router_port_id = "${}"
  tag {
    scope = "ibm"
    tag   = "blue"
  }
}

resource "nsxt_logical_switch" "LS-terraform-web" {
  admin_state = "UP"
  description = "LogicalSwitch provisioned by Terraform"
  display_name = "${var.logicalswitch1_name}"
  transport_zone_id = "${}"
  replication_mode  = "MTEP"
  tag {
    scope = "ibm"
    tag = "blue"
  }
}

resource "nsxt_logical_switch" "LS-terraform-app" {
  admin_state = "UP"
  description = "LogicalSwitch provisioned by Terraform"
  display_name = "${var.logicalswitch2_name}"
  transport_zone_id = "${}"
  replication_mode  = "MTEP"
  tag {
    scope = "ibm"
    tag = "blue"
  }
}

resource "nsxt_logical_switch" "LS-terraform-db" {
  admin_state = "UP"
  description = "LogicalSwitch provisioned by Terraform"
  display_name = "${var.logicalswitch3_name}"
  transport_zone_id = "${}"
  replication_mode  = "MTEP"
  tag {
    scope = "ibm"
    tag = "blue"
  }
}

resource "nsxt_logical_port" "lp-terraform-web" {
  admin_state = "UP"
  description = "lp provisioned by Terraform"
  display_name = "lp-terraform-web"
  logical_switch_id = "${}"

  tag {
    scope = "ibm"
    tag   = "blue"
  }
}

resource "nsxt_logical_port" "lp-terraform-app" {
  admin_state = "UP"
  description = "lp provisioned by Terraform"
  display_name = "lp-terraform-app"
  logical_switch_id = "${}"

  tag {
    scope = "ibm"
    tag   = "blue"
  }
}

resource "nsxt_logical_port" "lp-terraform-db" {
  admin_state = "UP"
  description = "lp provisioned by Terraform"
  display_name = "lp-terraform-db"
  logical_switch_id = "${}"

  tag {
    scope = "ibm"
    tag   = "blue"
  }
}

resource "nsxt_logical_router_downlink_port" "lif-terraform-web" {
  description = "lif provisioned by Terraform"
  display_name = "lif-terraform-web"
  logical_router_id = "${}"
  linked_logical_switch_port_id = "${}"
  ip_address = "${var.logicalswitch1_gw}"

  tag {
    scope = "ibm"
    tag   = "blue"
  }
}

resource "nsxt_logical_router_downlink_port" "lif-terraform-app" {
  description = "lif provisioned by Terraform"
  display_name = "lif-terraform-app"
  logical_router_id = "${}"
  linked_logical_switch_port_id = "${}"
  ip_address = "${var.logicalswitch2_gw}"

  tag {
    scope = "ibm"
    tag   = "blue"
  }
}

resource "nsxt_logical_router_downlink_port" "lif-terraform-db" {
  description = "lif provisioned by Terraform"
  display_name = "lif-terraform-db"
  logical_router_id = "${}"
  linked_logical_switch_port_id = "${}"
  ip_address = "${var.logicalswitch3_gw}"

  tag {
    scope = "ibm"
    tag   = "blue"
  }
}

resource "nsxt_l4_port_set_ns_service" "ns_service_tcp_443_22_l4" {
  description = "Service provisioned by Terraform"
  display_name = "web_to_app"
  protocol = "TCP"
  destination_ports = ["443", "22"]
  tag {
    scope = "ibm"
    tag   = "blue"
  }
}

resource "nsxt_firewall_section" "terraform" {
  description = "FS provisioned by Terraform"
  display_name = "Web-App"
  tag {
    scope = "ibm"
    tag = "blue"
  }
  applied_to {
    target_type = "LogicalSwitch"
    target_id = "${}"
  }

  section_type = "LAYER3"
  stateful = true

  rule {
    display_name = "out_rule"
    description  = "Out going rule"
    action = "ALLOW"
    logged = true
    ip_protocol = "IPV4"
    direction = "OUT"

    source {
      target_type = "LogicalSwitch"
      target_id = "${}"
    }

    destination {
      target_type = "LogicalSwitch"
      target_id = "${}"
    }
    service {
      target_type = "NSService"
      target_id = "${}"
    }
    applied_to {
      target_type = "LogicalSwitch"
      target_id = "${}"
    }
  }
}

output "edge-cluster-id" {
  value = "${}"
}

output "edge-cluster-deployment_type" {
  value = "${data.nsxt_edge_cluster.edge_cluster.deployment_type}"
}

output "tier0-router-port-id" {
  value = "${}"
}

Then I call the newly created module as follows:

provider "nsxt" {
  allow_unverified_ssl = true
  max_retries = 10
  retry_min_delay = 500
  retry_max_delay = 5000
  retry_on_status_codes = [429]
}

module "nsxtbuild" {
  source = "/root/terraform/modules/nsxtbuild"
  nsxt_logical_tier1_router_name = "tier1-npr-vr"
  logicalswitch1_name = "npr-web"
  logicalswitch2_name = "npr-app"
  logicalswitch3_name = "npr-db"
  logicalswitch1_gw = ""
  logicalswitch2_gw = ""
  logicalswitch3_gw = ""
}
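
To illustrate how the same module can be reused with different inputs, the sketch below generates the module call as JSON, which Terraform accepts in files ending in .tf.json. It is only a sketch: the module_call helper is my own, the three gateway addresses are placeholders from the documentation address ranges rather than the values used in this lab, and the address/prefix form is an assumption about what the module's downlink ports expect.

```python
import json

MODULE_SOURCE = "/root/terraform/modules/nsxtbuild"

# Placeholder gateways from the RFC 5737 documentation ranges (assumption:
# the module expects "address/prefix" strings for the downlink ports).
PLACEHOLDER_GATEWAYS = ("192.0.2.1/24", "198.51.100.1/24", "203.0.113.1/24")


def module_call(env, gateways):
    """Build the module block for one environment, e.g. env="npr"."""
    web_gw, app_gw, db_gw = gateways
    return {
        "module": {
            "nsxtbuild": {
                "source": MODULE_SOURCE,
                "nsxt_logical_tier1_router_name": f"tier1-{env}-vr",
                "logicalswitch1_name": f"{env}-web",
                "logicalswitch2_name": f"{env}-app",
                "logicalswitch3_name": f"{env}-db",
                "logicalswitch1_gw": web_gw,
                "logicalswitch2_gw": app_gw,
                "logicalswitch3_gw": db_gw,
            }
        }
    }


npr = module_call("npr", PLACEHOLDER_GATEWAYS)
print(json.dumps(npr, indent=2))
```

Saving the printed JSON as a .tf.json file next to the provider block gives a root configuration for that environment. Calling module_call with a different environment prefix and its own gateways would produce the equivalent input for another environment, each kept in its own directory and state.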

After “terraform apply”, you can see in NSX Manager that the required environment has been built successfully.

Logical Switches
T1 vRouter
DFW Rules