Setting Up Federated Identity Management for VMC on AWS – Authentication with Okta IdP

The Federated Identity feature of VMware Cloud on AWS can be integrated with any 3rd party IdP that supports SAML version 2.0. In this integration model, the customer-dedicated vIDM tenant works as the SAML Service Provider. If the 3rd party IdP is set up to perform multi-factor authentication (MFA), the customer will be prompted for MFA when accessing VMware Cloud services. In this blog, I will demo the integration with one of the most popular IdPs, Okta.

Disclaimer:

(1) This is my personal blog and does not represent my employer. (2) The Okta IdP settings in this blog are used to demo the integration with vIDM; they may not be best practice for your environment or meet your business and security requirements.

Note: please complete the first part of the integration as described in my first blog (https://wordpress.com/block-editor/post/davidwzhang.com/3080) of this series before moving forward.

To add the same users and user groups to the Okta IdP as are configured in the vIDM tenant, we need to integrate Okta with the corporate Active Directory (AD). The integration is done via Okta's lightweight agent.

Click "Directory Integration" in the Okta UI.

Click “Add Active Directory”.

The Active Directory integration setup wizard starts. Click "Set Up Active Directory".

Download the agent as required in the below window.

This agent can be installed on Windows Server 2008 R2 or later. The installation of the Okta agent is quite straightforward. Once the agent installation completes, you need to set up the AD integration. In the basic settings window, select the Organizational Units (OUs) that you'd like to sync users or groups from, and make sure that "Okta username format" is set to use the User Principal Name (UPN).

In the “Build User Profile” window, select any custom schema which needs to be included in the Okta user profile and click Next.

Click Done to finish the integration setup.

The Okta directory setting window will pop up.

Enable Just-In-Time provisioning and set Schedule Import to perform a user import every hour. Review and save the settings.

Now go to the Import tab and click “Import Now” to import the users from corporate AD.

As this is the first time importing users from the corporate AD, select "Full Import" and click Import.

When the scan is finished, Okta will report the result. Click OK.

Select the users to be imported and confirm the user assignment. Note: the user jsmith@lab.local is imported here and will be used for the final integration testing.

Now it is time to set up the SAML IdP in Okta.

Go to the Applications tab in the Okta Classic UI and click "Add Application".

Click “Create New App”;

Select Web as the Platform and "SAML 2.0" as the Sign on method, then click Create.

Type in the App name ("csp-vidm" is used as an example here) and click Next.

There are two mandatory configuration items in the "Create SAML Integration" window that pops up. This information can be copied from the Identity Provider settings within the vIDM tenant.

Go to the vIDM tenant administration console, click "Add Identity Provider" and select "Create Third Party IDP" under the "Identity & Access Management" tab.

Type in the "Identity Provider Name"; the example name here is "Okta01".

Go to the bottom of this IdP creation window and click “Service Provider (SP) Metadata”.

A new window will pop up as below:

The entity ID and HTTP-POST location are the required information for the Okta IdP SAML settings. Copy the entity ID URL into "Audience URI (SP Entity ID)" and the HTTP-POST location into "Single sign on URL" in the Okta "Create SAML Integration" window.

Leave all other configuration items as the default and click Next;

In the Feedback window, indicate that the newly created app is an internal app and click Finish.

A "Sign On settings" window will pop up as below; click the "Identity Provider metadata" link.

The Identity Provider metadata is displayed in XML format. Select all the content of this XML file and copy it.

Paste the Okta IdP metadata into SAML Metadata and click “Process IdP Metadata” in the vIDM 3rd party identity provider creation window.

The “SAML AuthN Request Binding” and “Name ID format mapping from SAML Response” will be updated as below:

Select the "lab.local" directory as the users who can authenticate with this new 3rd party IdP and leave the Network at the default "ALL RANGES". Then create a new authentication method called "Okta Auth" with the SAML Context "urn:oasis:names:tc:SAML:2.0:ac:classes:PasswordProtected". Please note that the name of this newly created authentication method has to be different from any existing authentication method.

Then leave all other configuration items' boxes unchecked and click Add.

The 3rd party IdP has been successfully added now.

The last step of the vIDM setup for this Okta integration is updating the default access policy to use the newly defined authentication method "Okta Auth". Please follow the steps in my previous blog (https://wordpress.com/block-editor/post/davidwzhang.com/308) to perform the required update. The updated default access policy should be similar to the one below.

Before testing the setup, go to the Okta UI to assign users to the newly defined SAML 2.0 web application "csp-vidm". Click Assignment.

Click Assign and select “Assign to People”.

In the "Assign csp-vidm to People" window, assign the user John Smith (jsmith@lab.local), which means that John Smith is allowed to use this SAML 2.0 application.

After the assignment is completed, user John Smith is listed under the assignments of the SAML 2.0 application "csp-vidm".

Instead of assigning individual users, AD groups can be assigned to the SAML application as well.

Finally, everything is ready to test the integration.

Open a new Incognito window in a Chrome browser, type in the vIDM tenant URL and press Enter.

In the login window, type the username jsmith@lab.local and click Next.

The authentication session is redirected to Okta.

Type in Username & Password and click “Sign In”.

Then John Smith (jsmith@lab.local) successfully logs in to the vIDM tenant.

This is the end of this demo. Thank you very much for reading!

Setting Up Federated Identity Management for VMC on AWS – Authentication with Active Directory

VMware Cloud on AWS Federated Identity management supports different kinds of authentication methods. This blog will demo the basic method: authentication with the customer corporate Active Directory (AD).

When VMC on AWS customers use AD for authentication, outbound-only connection mode is highly recommended. This mode does not require any inbound firewall port to be opened: only outbound connectivity from the vIDM Connector to the VMware SaaS vIDM tenant on port 443 is required. All user and group sync from your enterprise directory, as well as user authentication, is handled by the vIDM connector.

To enable outbound-only mode, update the settings of the Built-in Identity Provider. In the user section of the Built-in Identity Provider settings, select the newly created directory "lab.local" and add the newly created connector "vidmcon01.lab.local".

After the connector is added successfully, select Password (cloud deployment) in the "Connector Authentication Methods" and click Save.

Now it is time to update the access policy to use corporate Active Directory to authenticate VMC users.

Go to Identity & Access Management.

Click "Edit DEFAULT POLICY" and the "Edit Policy" window pops up. Click Next.

Click “ADD POLICY RULE”.

Then the "Add Policy Rule" window will pop up. At this stage, just leave the first two configuration items at their defaults: "ALL RANGES" and "ALL Device Types". In the "and user belong to group(s)" configuration item, search for and add all 3 synced groups (sddc-admins, sddc-operators and sddc-readonly) to allow the users in these 3 groups to log in.

Add Password (cloud deployment) as the authentication method.

Use Password (Local Directory) as the fallback authentication method and click Save.

There are now 3 rules defined in the default access policy. Drag the newly defined rule to the top of the rules table to make sure that the new rule is evaluated first when a user tries to log in.

Now the rules table shows as below. Click Next.

Click Save to keep the changes of the default access policy.

You are now good to test your authentication setup. Open a new Incognito window in your Chrome browser and connect to the vIDM URL. Type in the username (jsmith@lab.local) and click Next.

Type in the Active Directory password for user jsmith@lab.local and click “Sign in”.

Then you can see that jsmith@lab.local has successfully logged in to vIDM!

Thank you very much for reading!

Setting Up Federated Identity Management for VMC on AWS – Install and Setup vIDM Connector

As an enterprise using VMware Cloud Services, you can set up federation with your corporate domain. Federating your corporate domain allows you to use your organization’s single sign-on and identity source to sign in to VMware Cloud Services. You can also set up multi-factor authentication as part of federation access policy settings.

Federated identity management allows you to control authentication to your organization and its services by assigning organization and service roles to your enterprise groups.

Set up a federated identity with the VMware Identity Manager service and the VMware Identity Manager connector, which VMware provides at no additional charge.

  1. Download the VMware Identity Manager (vIDM) connector and configure it for user attributes and group sync from your corporate identity store. Note that only the VMware Identity Manager Connector for Windows is supported.
  2. Configure your corporate identity provider instance using the VMware Identity Manager service.
  3. Register your corporate domain.

I am going to create a series of blogs to cover all 3 steps.

In this 1st blog of the series, I will show you how to install the vIDM connector (version 19.03) on a Windows 2012 R2 server and how to achieve HA for the vIDM connector.

Prerequisites

  • A vIDM SaaS tenant. If you don't have one, please contact your VMware customer success representative.
  • A Windows Server (Windows 2008 R2, Windows 2012, Windows 2012 R2 or Windows 2016).
  • Firewall rules that allow communication from the Windows Server to the domain controllers and to the vIDM tenant on port 443.
  • The vIDM connector for Windows installation package. The latest version of the vIDM connector is shown below.

Installation

Log in to the Windows 2012 R2 server and start the installation:

Click Yes in the “User Account Control” window.

Note: the installation package will install the latest major JRE version on the connector Windows server if the JRE has not been installed yet.

The installation process loads the Installation Wizard.

Click Next in the Installation Wizard window.

Accept the License Agreement as below:

Accept the default installation destination folder and click Next.

Click Next and leave the “Are you migrating your Connector” box unchecked.

Accept the pop-up hostname and default port for this connector.

For the purpose of VMware Cloud federated identity management, please don't run the Connector service as a domain user account. Leave the "Would you like to run the Connector service as a domain user account?" option box unchecked and click Next.

Click Yes in the pop-up window to confirm the choice from the previous step.

Click Install to begin the installation.

After a few minutes, the installation completes successfully.

Click Finish. A new window will pop up, showing the Connector appliance management URL as below.

Click Yes. The browser opens and redirects to https://vidmconn01.lab.local:8443. Accept the security certificate alert and continue to the website.

In the VMware Identity Manager Appliance Setup wizard, click Continue.

Set the passwords for the appliance admin account and click Continue.

Now go to the vIDM tenant; in the Identity & Access Management tab, click Add Connector.

Type in the Connector ID Name and click "Generate Activation Code".

Copy the generated activation code and go back to the Connector setup wizard.

Copy the activation code into the Activate Connector Window and click Continue.

Wait for a few minutes then the connector will be activated.

Note: sometimes a 404 error pops up like the one below. In my experience, it is a false alarm on Windows 2012 R2. Don't worry about it.

In VMware Identity Manager tenant, the newly installed connector will show up as below:

Setup

Now it is time to set up our connector for user sync.

Step 1: Add Directory

Click Add Directory and select “Add Active Directory over LDAP/IWA”.

Type in the "Directory Name", select "Active Directory over LDAP" and use this directory for user sync and authentication. For the "Directory Search Attribute", I prefer UserPrincipalName over sAMAccountName, as the UserPrincipalName option works for all federated identity management use cases, e.g. integration with Active Directory Federation Services and 3rd party IdPs.

Then provide the required Bind User Details and click “Save & Next”

After a few minutes, the domain will pop up. Click Next.

In the Map User Attributes window, accept the setup and click Next

Type in the group DNs and click “Find Groups”.

Click the “0 of 23” under the column “Groups to sync”.

Select 3 user groups which need to be synced and click Save.

Click Next.

Accept the default setting in the “Select the Users you would like to sync” window and click Next.

In the Review window, click “Sync Directory”

Now it is time to verify the synced users and groups in the vIDM tenant. Go to the "User & Groups" tab. You can see we have 10 users and 3 groups synced from the lab.local directory.

You can find the sync log within the configured directory.

Now the basic setup of the vIDM connector is complete.

Connector HA

A single VMware Identity Manager connector is a single point of failure in an enterprise environment. To achieve high availability, simply install one or more extra connectors; installing an extra connector is exactly the same as installing the 1st connector. Here, the second connector is installed on another Windows 2012 R2 server, vidmcon02.lab.local. After the installation is completed, the activation procedure for the connector is the same as well.

Now 2 connectors will show up in the vIDM tenant.

Go to the Built-in identity provider and add the second connector.

Type in the Bind User Password and click “Add Connector”

Then the second connector is added successfully.

Now there are 2 connectors associated with the Built-in Identity Provider.

Please note that connector HA is only for user authentication in version 19.03. Directory or user sync can only be enabled on one connector at a time. In the event of a connector instance failure, authentication is handled automatically by another connector instance. However, for directory sync, you must modify the directory settings in the VMware Identity Manager service to use another connector instance, as shown below.

Thank you very much for reading!

Integrate VMware NSX-T with Kubernetes

Kubernetes (K8s) is an open-source system for automating deployment, scaling, and management of containerized applications. K8s uses network plugins to provide the required networking functions like routing, switching, firewalling and load balancing. VMware NSX-T provides a network plugin for K8s called NCP (NSX Container Plug-in). If you want to know more about VMware NSX-T, please go to docs.vmware.com.

In this blog, I will show you how to integrate VMware NSX-T with Kubernetes.

Here, we will build a three-node, single-master K8s cluster. All 3 nodes are RHEL 7.5 virtual machines.

  • master node:
    • Hostname: master.k8s
    • Mgmt IP: 10.1.73.233
  • worker node1:
    • Hostname: node1.k8s
    • Mgmt IP: 10.1.73.234
  • worker node2:
    • Hostname: node2.k8s
    • Mgmt IP: 10.1.73.235

Each node has 2 vNICs attached. The first vNIC is ens192, which is used for management; the second vNIC is ens224, which is used for K8s transport and is connected to an overlay logical switch.

NSX-T version: 2.3.0.0.0.10085405;

NSX-T NCP version: 2.3.1.10693410

Docker version: 18.03.1-ce;

K8s version: 1.11.4

1. Prepare K8s Cluster Setup

1.1 Get Offline Packages and Docker Images

As there is no Internet access in my environment, I have to prepare my K8s cluster offline. To do that, I need to get the following packages:

  • Docker offline installation packages
  • Kubeadm offline installation packages which will be used to set up the K8s cluster;
  • Docker offline images;

1.1.1 Docker Offline Installation Packages

Regarding how to get Docker offline installation packages, please refer to my other blog: Install Docker Offline on Centos7.

1.1.2 Kubeadm Offline Installation Packages

Getting the kubeadm offline installation packages is quite straightforward as well. You can use yum with the --downloadonly option.

yum install --downloadonly --downloaddir=/root/ kubelet-1.11.0
yum install --downloadonly --downloaddir=/root/ kubeadm-1.11.0
yum install --downloadonly --downloaddir=/root/ kubectl-1.11.0

1.1.3 Docker Offline Images

Below are the required Docker images for K8s cluster.

  • k8s.gcr.io/kube-proxy-amd64 v1.11.4
  • k8s.gcr.io/kube-apiserver-amd64 v1.11.4
  • k8s.gcr.io/kube-controller-manager-amd64 v1.11.4
  • k8s.gcr.io/kube-scheduler-amd64 v1.11.4
  • k8s.gcr.io/coredns 1.1.3
  • k8s.gcr.io/etcd-amd64 3.2.18
  • k8s.gcr.io/pause-amd64 3.1
  • k8s.gcr.io/pause 3.1

You may notice that the list above includes two identical pause images, although they have different repository names. There is a story behind this. Initially, I only loaded the first image, "k8s.gcr.io/pause-amd64". The setup passed the "kubeadm init" pre-flight checks but failed at the actual cluster setup stage. When I checked the logs, I found that the cluster setup process kept requesting the second image. I guess it is a bug in kubeadm v1.11.0, which I am using.
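
If you hit the same problem, a possible workaround (my own suggestion, not part of the original procedure) is to retag the pause-amd64 image locally so that it also answers to the second name, instead of saving and loading a second copy; this works because both names point at the same image content.

# Retag the existing image so requests for k8s.gcr.io/pause:3.1 are satisfied as well
docker tag k8s.gcr.io/pause-amd64:3.1 k8s.gcr.io/pause:3.1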

Here is an example of how to use the "docker pull" CLI to download a Docker image, in case you don't know how to do it.

docker pull k8s.gcr.io/kube-proxy-amd64:v1.11.4

Once you have all Docker images, you need to export these Docker images as offline images via “docker save”.

docker save k8s.gcr.io/pause-amd64:3.1 -o /pause-amd64:3.1.docker

Now it is time to upload all the installation packages and offline images to all 3 K8s nodes, including the master node.
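
As a quick sketch of that upload step (the hostnames and paths below are just examples from my lab, adjust them for your environment), scp can be used from the build machine:

# Copy the offline RPM directories and the saved Docker images to every node
for node in master.k8s node1.k8s node2.k8s; do
  scp -r /root/docker /root/kubeadm /root/*.docker root@${node}:/root/
done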

1.2 Disable SELinux and Firewalld

# disable SELinux
setenforce 0
# Change SELINUX to permissive for /etc/selinux/config
vi /etc/selinux/config
SELINUX=permissive
# Stop and disable firewalld
systemctl disable firewalld && systemctl stop firewalld

1.3 Config DNS Resolution

# Update the /etc/hosts file as below on all three K8s nodes
10.1.73.233   master.k8s
10.1.73.234   node1.k8s
10.1.73.235   node2.k8s

1.4 Install Docker and Kubeadm

To install Docker and kubeadm, first put the required packages for Docker and kubeadm into separate directories. For example, all required packages for kubeadm are put into a directory called kubeadm. Then use rpm to install kubeadm as below:

[root@master kubeadm]# rpm -ivh --replacefiles --replacepkgs *.rpm
warning: 53edc739a0e51a4c17794de26b13ee5df939bd3161b37f503fe2af8980b41a89-cri-tools-1.12.0-0.x86_64.rpm: Header V4 RSA/SHA512 Signature, key ID 3e1ba8d5: NOKEY
warning: socat-1.7.3.2-2.el7.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID f4a80eb5: NOKEY
Preparing...                          ########################## [100%]
Updating / installing...
   1:socat-1.7.3.2-2.el7              ########################## [ 17%]
   2:kubernetes-cni-0.6.0-0           ########################## [ 33%]
   3:kubelet-1.11.0-0                 ########################## [ 50%]
   4:kubectl-1.11.0-0                 ########################## [ 67%]
   5:cri-tools-1.12.0-0               ########################## [ 83%]
   6:kubeadm-1.11.0-0                 #########################3 [100%]

After Docker and kubeadm are installed, enable and start the docker and kubelet services:

systemctl enable docker && systemctl start docker
systemctl enable kubelet && systemctl start kubelet

In addition, you need to perform some OS-level setup so that your K8s environment can run properly.

# ENABLING THE NET.BRIDGE.BRIDGE-NF-CALL-IPTABLES KERNEL OPTION
sysctl -w net.bridge.bridge-nf-call-iptables=1
echo "net.bridge.bridge-nf-call-iptables=1" > /etc/sysctl.d/k8s.conf
# Disable Swap
swapoff -a && sed -i '/ swap / s/^/#/' /etc/fstab

1.5 Load Docker Offline Images

Now let us load all the offline Docker images into the local Docker repo on each K8s node via the "docker load" CLI.

docker load -i kube-apiserver-amd64:v1.11.4.docker
docker load -i coredns:1.1.3.docker
docker load -i etcd-amd64:3.2.18.docker
docker load -i kube-controller-manager-amd64:v1.11.4.docker
docker load -i kube-proxy-amd64:v1.11.4.docker
docker load -i kube-scheduler-amd64:v1.11.4.docker
docker load -i pause-amd64:3.1.docker
docker load -i pause:3.1.docker

1.6 NSX NCP Plugin

Now you can upload the NSX NCP plugin to all 3 nodes and load the NCP image into the local Docker repo.

1.6.1 Load NSX Container Image

docker load -i nsx-ncp-rhel-2.3.1.10693410.tar 

Now the docker image list on your K8s nodes will be similar to below:

[root@master ~]# docker image list
REPOSITORY                                   TAG                 IMAGE ID            CREATED             SIZE
registry.local/2.3.1.10693410/nsx-ncp-rhel   latest              97d54d5c80db        5 months ago        701MB
k8s.gcr.io/kube-proxy-amd64                  v1.11.4             5071d096cfcd        5 months ago        98.2MB
k8s.gcr.io/kube-apiserver-amd64              v1.11.4             de6de495c1f4        5 months ago        187MB
k8s.gcr.io/kube-controller-manager-amd64     v1.11.4             dc1d57df5ac0        5 months ago        155MB
k8s.gcr.io/kube-scheduler-amd64              v1.11.4             569cb58b9c03        5 months ago        56.8MB
k8s.gcr.io/coredns                           1.1.3               b3b94275d97c        11 months ago       45.6MB
k8s.gcr.io/etcd-amd64                        3.2.18              b8df3b177be2        12 months ago       219MB
k8s.gcr.io/pause-amd64                       3.1                 da86e6ba6ca1        16 months ago       742kB
k8s.gcr.io/pause                             3.1                 da86e6ba6ca1        16 months ago       742kB

1.6.2 Install NSX CNI

rpm -ivh --replacefiles nsx-cni-2.3.1.10693410-1.x86_64.rpm

Please note that the --replacefiles option is required due to a known bug with NSX-T 2.3. If you don't include it, you will see an error like the one below:

[root@master rhel_x86_64]# rpm -i nsx-cni-2.3.1.10693410-1.x86_64.rpm
   file /opt/cni/bin/loopback from install of nsx-cni-2.3.1.10693410-1.x86_64 conflicts with file from package kubernetes-cni-0.6.0-0.x86_64

1.6.3 Install and Config OVS

# Go to OpenvSwitch directory
rpm -ivh openvswitch-2.9.1.9968033.rhel75-1.x86_64.rpm
systemctl start openvswitch.service && systemctl enable openvswitch.service
ovs-vsctl add-br br-int
ovs-vsctl add-port br-int ens224 -- set Interface ens224 ofport_request=1
ip link set br-int up
ip link set ens224 up
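
After the bridge is created, an optional sanity check is to confirm that br-int exists and that ens224 was assigned OpenFlow port 1 as requested:

# Verify the OVS bridge and the uplink port number
ovs-vsctl show
ovs-vsctl get Interface ens224 ofport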

2. Setup K8s Cluster

Now you are ready to set up your K8s cluster. I will use a kubeadm config file to define my K8s cluster when I initiate the cluster setup. Below is the content of my kubeadm config file.

apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
kubernetesVersion: v1.11.4
api:
  advertiseAddress: 10.1.73.233
  bindPort: 6443

From the above, you can see that Kubernetes version v1.11.4 will be used and the API server IP is 10.1.73.233, which is the master node IP. Run the following CLI from the K8s master node to create the K8s cluster.

kubeadm init --config kubeadm.yml

After the K8s cluster is set up, you can join the remaining two worker nodes to the cluster via the CLI below:

kubeadm join 10.1.73.233:6443 --token up1nz9.iatqv50bkrqf0rcj --discovery-token-ca-cert-hash sha256:3f9e96e70a59f1979429435caa35d12270d60a7ca9f0a8436dff455e4b8ac1da

Note: You can get the required token and discovery-token-ca-cert-hash from the output of “kubeadm init”.
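
If you didn't save that output, or the bootstrap token has since expired, kubeadm can generate a fresh join command for you on the master node (a standard kubeadm feature, shown here only as a convenience):

# Print a new "kubeadm join" command with a fresh token and the CA cert hash
kubeadm token create --print-join-command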

3. NSX-T and K8s Integration

3.1 Prepare NSX Resource

Before the integration, you have to make sure that the required NSX-T resources are configured in NSX Manager. The required resources include:

  • Overlay Transport Zone: overlay_tz
  • Tier 0 router: tier0_router
  • K8s Transport Logical Switch
  • IP Blocks for Kubernetes Pods: container_ip_blocks
  • IP Pool for SNAT: external_ip_pools
  • Firewall Marker Sections: top_firewall_section_marker and bottom_firewall_section_marker

Please refer to the NSX Container Plug-in for Kubernetes and Cloud Foundry – Installation and Administration Guide for details on how to create these NSX-T resources. The following are the UUIDs of all the created resources:

  • tier0_router = c86a625e-54e0-4510-9185-e9e1b7e26eb9
  • overlay_tz = f6d90300-c56e-4d26-8684-8eff64cdf5a0
  • container_ip_blocks = f9e411f5-654e-4f0d-99e8-2e5a9812f295
  • external_ip_pools = 84ffd635-640f-41c6-be85-71337e112e69
  • top_firewall_section_marker = ab07e559-79aa-4bc9-a6f0-126ea59278c2
  • bottom_firewall_section_marker = 35aaa6c5-0870-4ac4-bf47-114780863956

In addition, make sure that you tag the logical switch ports that the three K8s nodes are attached to in the following way:

{'ncp/node_name': '<node_name>'}
{'ncp/cluster': '<cluster_name>'}

node_name is the FQDN hostname of the K8s node, and cluster_name is whatever you call this cluster in NSX (not the name in the K8s cluster context). The screenshots below show my K8s nodes' tags.

k8s master switching port tags
k8s node1 switching port tags

k8s node2 switching port tags

3.2 Install NSX NCP Plugin

3.2.1 Create Name Space

kubectl create ns nsx-system

3.2.2 Create Service Account for NCP

kubectl apply -f rbac-ncp.yml -n nsx-system

3.2.3 Create NCP ReplicationController

kubectl apply -f ncp-rc.yml -n nsx-system

3.2.4 Create NCP nsx-node-agent and nsx-kube-proxy DaemonSet

kubectl create -f nsx-node-agent-ds.yml -n nsx-system 

You can find the above 3 YAML files on GitHub: https://github.com/insidepacket/nsxt-k8s-integration-yaml

Now you have completed the NSX-T and K8s integration. If you check the pods running on your K8s cluster, you will see something similar to the output below:

[root@master ~]# k get pods --all-namespaces 
NAMESPACE     NAME                                   READY     STATUS    RESTARTS   AGE
kube-system   coredns-78fcdf6894-pg4dz               1/1       Running   0          9d
kube-system   coredns-78fcdf6894-q727q               1/1       Running   128        9d
kube-system   etcd-master.k8s                        1/1       Running   3          14d
kube-system   kube-apiserver-master.k8s              1/1       Running   2          14d
kube-system   kube-controller-manager-master.k8s     1/1       Running   3          14d
kube-system   kube-proxy-5p482                       1/1       Running   2          14d
kube-system   kube-proxy-9mnwk                       1/1       Running   0          12d
kube-system   kube-proxy-wj8qw                       1/1       Running   3          14d
kube-system   kube-scheduler-master.k8s              1/1       Running   3          14d
ns-test1000   http-echo-deployment-b5bbfbb86-j4dxq   1/1       Running   0          2d
nsx-system    nsx-ncp-rr989                          1/1       Running   0          11d
nsx-system    nsx-node-agent-kbsld                   2/2       Running   0          9d
nsx-system    nsx-node-agent-pwhlp                   2/2       Running   0          9d
nsx-system    nsx-node-agent-vnd7m                   2/2       Running   0          9d
nszhang       busybox-756b4db447-2b9kx               1/1       Running   0          5d
nszhang       busybox-deployment-5c74f6dd48-n7tp2    1/1       Running   0          9d
nszhang       http-echo-deployment-b5bbfbb86-xnjz6   1/1       Running   0          2d
nszhang       jenkins-deployment-8546d898cd-zdzs2    1/1       Running   0          11d
nszhang       whoami-deployment-85b65d8757-6m7kt     1/1       Running   0          6d
nszhang       whoami-deployment-85b65d8757-b4m99     1/1       Running   0          6d
nszhang       whoami-deployment-85b65d8757-pwwt9     1/1       Running   0          6d

In NSX-T manager GUI, you will see the following resources are created for K8s cluster.

Logical Switches for K8s
Tier1 Router for K8s
NSX LB for K8s

Tips:

I ran into a few issues during this journey. The following CLIs were used a lot when troubleshooting. I share them here in the hope that they help you a bit as well.

  • How to check kubelet service’s log
journalctl -xeu kubelet
  • How to check log for a specific pod
kubectl logs nsx-ncp-rr989 -n nsx-system

"nsx-ncp-rr989" is the name of the pod and "nsx-system" is the namespace we created for NCP.

  • How to check the log for a specific container when there is more than one container in the pod
kubectl logs nsx-node-agent-n7n7g -c nsx-node-agent -n nsx-system

“nsx-node-agent-n7n7g” is the pod name and “nsx-node-agent” is the container name.

  • Show details of a specific pod
kubectl describe pod nsx-ncp-rr989 -n nsx-system

Failed to Start Libvirtd

Environment:

OS: CentOS Linux release 7.5.1804 (Core)

Error Message:

# journalctl -u libvirtd
— Logs begin at Wed 2019-01-30 17:46:41 AEDT, end at Wed 2019-01-30 18:02:09 AEDT. —
Jan 30 17:47:09 ovs-sandbox2 systemd[1]: Starting Virtualization daemon…
Jan 30 17:47:14 ovs-sandbox2 libvirtd[1483]: 2019-01-30 06:47:14.936+0000: 1483: info : libvirt version: 4.5.0, package: 10.el7_6.3 (CentOS BuildSystem http://bugs.centos.org, 2018-11-28-20:51:39, x86-01.bsys.centos.org)
Jan 30 17:47:14 ovs-sandbox2 libvirtd[1483]: 2019-01-30 06:47:14.936+0000: 1483: info : hostname: ovs-sandbox2
Jan 30 17:47:14 ovs-sandbox2 libvirtd[1483]: 2019-01-30 06:47:14.936+0000: 1483: error : virModuleLoadFile:53 : internal error: Failed to load module ‘/usr/lib64/libvirt/storage-backend/libvirt_storage_backend_rbd.so’: /usr/lib64/libvir
Jan 30 17:47:14 ovs-sandbox2 systemd[1]: libvirtd.service: main process exited, code=exited, status=3/NOTIMPLEMENTED
Jan 30 17:47:14 ovs-sandbox2 systemd[1]: Failed to start Virtualization daemon.
Jan 30 17:47:14 ovs-sandbox2 systemd[1]: Unit libvirtd.service entered failed state.
Jan 30 17:47:14 ovs-sandbox2 systemd[1]: libvirtd.service failed.
Jan 30 17:47:15 ovs-sandbox2 systemd[1]: libvirtd.service holdoff time over, scheduling restart.

When:

The issue happened when I accidentally updated libvirt from 3.9.0-14.el7_5.8.x86_64 to 4.5.0-10.el7_6.3.x86_64.

Fix:

[root@ovs-sandbox2 /]# yum update librados2

[root@ovs-sandbox2 virtualmachines]# yum history info 14
Loaded plugins: fastestmirror
Transaction ID : 14
Begin time     : Wed Jan 30 18:10:53 2019
Begin rpmdb    : 815:0a1f6c4d93558a35ec9c3ceb9114712149f71015
End time       :            18:10:54 2019 (1 seconds)
End rpmdb      : 817:358974b7c1ae161fe8d05d2d23573b31eaac6582
User           : root
Return-Code    : Success
Command Line   : update librados2
Transaction performed with:
    Installed rpm-4.11.3-32.el7.x86_64 @anaconda
    Installed yum-3.4.3-158.el7.centos.noarch @anaconda
    Installed yum-plugin-fastestmirror-1.1.31-45.el7.noarch @anaconda
Packages Altered:
    Dep-Install boost-iostreams-1.53.0-27.el7.x86_64 @base
    Dep-Install boost-random-1.53.0-27.el7.x86_64 @base
    Updated     librados2-1:0.94.5-2.el7.x86_64 @base
    Update                1:10.2.5-4.el7.x86_64 @base
    Updated     librbd1-1:0.94.5-2.el7.x86_64 @base
    Update                1:10.2.5-4.el7.x86_64 @base
history info
[root@ovs-sandbox2 virtualmachines]#
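
After librados2 and librbd1 are updated, restarting libvirtd and checking its status should confirm the fix. A minimal check on a systemd-based host like this CentOS 7 box:

# Restart libvirtd and confirm it is now active
systemctl restart libvirtd
systemctl status libvirtd --no-pager
journalctl -u libvirtd --since "10 minutes ago" --no-pager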

Automate NSX-T Build with Terraform

Terraform is a widely adopted Infrastructure as Code tool that allows you to define your infrastructure using a simple, declarative configuration language, and to deploy and manage infrastructure across public cloud providers including AWS, Azure, Google Cloud and IBM Cloud, as well as other infrastructure providers like VMware NSX-T and F5 BIG-IP.

In this blog, I will show you how to leverage the Terraform NSX-T provider to define an NSX-T tenant environment in minutes.

To build the new NSX-T environment, I am going to:

  1. Create a new Tier1 router named tier1_router;
  2. Create three logical switches under the newly created Tier1 router for the web/app/db security zones;
  3. Connect the newly created Tier1 router to the existing Tier0 router;
  4. Create a new network service group including SSH and HTTPS;
  5. Create a new firewall section and add a firewall rule to allow outbound SSH/HTTPS traffic from any workload on the web logical switch to any workload on the app logical switch.

Firstly, I define a Terraform module as below. Note: a Terraform module is normally used to define reusable components. For example, the module defined here can be re-used to build both non-prod and prod environments when you provide different inputs.

/*
provider "nsxt" {
  allow_unverified_ssl = true
  max_retries = 10
  retry_min_delay = 500
  retry_max_delay = 5000
  retry_on_status_codes = [429]
}
*/

data "nsxt_transport_zone" "overlay_transport_zone" {
  display_name = "tz-overlay"
}

data "nsxt_logical_tier0_router" "tier0_router" {
  display_name = "t0"
}

data "nsxt_edge_cluster" "edge_cluster" {
  display_name = "edge-cluster"
}

resource "nsxt_logical_router_link_port_on_tier0" "tier0_port_to_tier1" {
  description = "TIER0_PORT1 provisioned by Terraform"
  display_name = "tier0_port_to_tier1"
  logical_router_id = "${data.nsxt_logical_tier0_router.tier0_router.id}"
  tag {
    scope = "ibm"
    tag   = "blue"
  }
}

resource "nsxt_logical_tier1_router" "tier1_router" {
  description = "RTR1 provisioned by Terraform"
  display_name = "${var.nsxt_logical_tier1_router_name}"
  #failover_mode = "PREEMPTIVE"
  edge_cluster_id = "${data.nsxt_edge_cluster.edge_cluster.id}"
  enable_router_advertisement = true
  advertise_connected_routes = false
  advertise_static_routes = true
  advertise_nat_routes = true
  tag {
    scope = "ibm"
    tag   = "blue"
  }
}

resource "nsxt_logical_router_link_port_on_tier1" "tier1_port_to_tier0" {
  description  = "TIER1_PORT1 provisioned by Terraform"
  display_name = "tier1_port_to_tier0"
  logical_router_id = "${nsxt_logical_tier1_router.tier1_router.id}"
  linked_logical_router_port_id = "${nsxt_logical_router_link_port_on_tier0.tier0_port_to_tier1.id}"
  tag {
    scope = "ibm"
    tag   = "blue"
  }
}

resource "nsxt_logical_switch" "LS-terraform-web" {
  admin_state = "UP"
  description = "LogicalSwitch provisioned by Terraform"
  display_name = "${var.logicalswitch1_name}"
  transport_zone_id = "${data.nsxt_transport_zone.overlay_transport_zone.id}"
  replication_mode  = "MTEP"
  tag {
    scope = "ibm"
    tag = "blue"
  }
}

resource "nsxt_logical_switch" "LS-terraform-app" {
  admin_state = "UP"
  description = "LogicalSwitch provisioned by Terraform"
  display_name = "${var.logicalswitch2_name}"
  transport_zone_id = "${data.nsxt_transport_zone.overlay_transport_zone.id}"
  replication_mode  = "MTEP"
  tag {
    scope = "ibm"
    tag = "blue"
  }
}


resource "nsxt_logical_switch" "LS-terraform-db" {
  admin_state = "UP"
  description = "LogicalSwitch provisioned by Terraform"
  display_name = "${var.logicalswitch3_name}"
  transport_zone_id = "${data.nsxt_transport_zone.overlay_transport_zone.id}"
  replication_mode  = "MTEP"
  tag {
    scope = "ibm"
    tag = "blue"
  }
}

resource "nsxt_logical_port" "lp-terraform-web" {
  admin_state = "UP"
  description = "lp provisioned by Terraform"
  display_name = "lp-terraform-web"
  logical_switch_id = "${nsxt_logical_switch.LS-terraform-web.id}"

  tag {
    scope = "ibm"
    tag   = "blue"
  }
}

resource "nsxt_logical_port" "lp-terraform-app" {
  admin_state = "UP"
  description = "lp provisioned by Terraform"
  display_name = "lp-terraform-app"
  logical_switch_id = "${nsxt_logical_switch.LS-terraform-app.id}"

  tag {
    scope = "ibm"
    tag   = "blue"
  }
}

resource "nsxt_logical_port" "lp-terraform-db" {
  admin_state = "UP"
  description = "lp provisioned by Terraform"
  display_name = "lp-terraform-db"
  logical_switch_id = "${nsxt_logical_switch.LS-terraform-db.id}"

  tag {
    scope = "ibm"
    tag   = "blue"
  }
}

resource "nsxt_logical_router_downlink_port" "lif-terraform-web" {
  description = "lif provisioned by Terraform"
  display_name = "lif-terraform-web"
  logical_router_id = "${nsxt_logical_tier1_router.tier1_router.id}"
  linked_logical_switch_port_id = "${nsxt_logical_port.lp-terraform-web.id}"
  ip_address = "${var.logicalswitch1_gw}"

  tag {
    scope = "ibm"
    tag   = "blue"
  }
}

resource "nsxt_logical_router_downlink_port" "lif-terraform-app" {
  description = "lif provisioned by Terraform"
  display_name = "lif-terraform-app"
  logical_router_id = "${nsxt_logical_tier1_router.tier1_router.id}"
  linked_logical_switch_port_id = "${nsxt_logical_port.lp-terraform-app.id}"
  ip_address = "${var.logicalswitch2_gw}"

  tag {
    scope = "ibm"
    tag   = "blue"
  }
}

resource "nsxt_logical_router_downlink_port" "lif-terraform-db" {
  description = "lif provisioned by Terraform"
  display_name = "lif-terraform-db"
  logical_router_id = "${nsxt_logical_tier1_router.tier1_router.id}"
  linked_logical_switch_port_id = "${nsxt_logical_port.lp-terraform-db.id}"
  ip_address = "${var.logicalswitch3_gw}"

  tag {
    scope = "ibm"
    tag   = "blue"
  }
}

resource "nsxt_l4_port_set_ns_service" "ns_service_tcp_443_22_l4" {
  description = "Service provisioned by Terraform"
  display_name = "web_to_app"
  protocol = "TCP"
  destination_ports = ["443", "22"]
  tag {
    scope = "ibm"
    tag   = "blue"
  }
}

resource "nsxt_firewall_section" "terraform" {
  description = "FS provisioned by Terraform"
  display_name = "Web-App"
  tag {
    scope = "ibm"
    tag = "blue"
  }
  
  applied_to {
    target_type = "LogicalSwitch"
    target_id = "${nsxt_logical_switch.LS-terraform-web.id}"
  }

  section_type = "LAYER3"
  stateful = true

  rule {
    display_name = "out_rule"
    description  = "Out going rule"
    action = "ALLOW"
    logged = true
    ip_protocol = "IPV4"
    direction = "OUT"

    source {
      target_type = "LogicalSwitch"
      target_id = "${nsxt_logical_switch.LS-terraform-web.id}"
    }

    destination {
      target_type = "LogicalSwitch"
      target_id = "${nsxt_logical_switch.LS-terraform-app.id}"
    }
    service {
      target_type = "NSService"
      target_id = "${nsxt_l4_port_set_ns_service.ns_service_tcp_443_22_l4.id}"
    }
    applied_to {
      target_type = "LogicalSwitch"
      target_id = "${nsxt_logical_switch.LS-terraform-web.id}"
    }
  }
}  

output "edge-cluster-id" {
  value = "${data.nsxt_edge_cluster.edge_cluster.id}"
}

output "edge-cluster-deployment_type" {
  value = "${data.nsxt_edge_cluster.edge_cluster.deployment_type}"
}

output "tier0-router-port-id" {
  value = "${nsxt_logical_router_link_port_on_tier0.tier0_port_to_tier1.id}"
}
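
The module above references several input variables (var.nsxt_logical_tier1_router_name, var.logicalswitch1_name and so on). A minimal variables.tf sketch for the module directory could look like the one below; the variable names match the references above, while the descriptions are simply my own annotations:

variable "nsxt_logical_tier1_router_name" {
  description = "Display name of the Tier1 router to create"
}

variable "logicalswitch1_name" {
  description = "Display name of the web logical switch"
}

variable "logicalswitch2_name" {
  description = "Display name of the app logical switch"
}

variable "logicalswitch3_name" {
  description = "Display name of the db logical switch"
}

variable "logicalswitch1_gw" {
  description = "Gateway IP address/prefix for the web downlink port, e.g. 192.168.80.1/24"
}

variable "logicalswitch2_gw" {
  description = "Gateway IP address/prefix for the app downlink port"
}

variable "logicalswitch3_gw" {
  description = "Gateway IP address/prefix for the db downlink port"
}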

Then I use the below to call this newly created module:

provider "nsxt" {
  allow_unverified_ssl = true
  max_retries = 10
  retry_min_delay = 500
  retry_max_delay = 5000
  retry_on_status_codes = [429]
}

module "nsxtbuild" {
  source = "/root/terraform/modules/nsxtbuild"
  nsxt_logical_tier1_router_name = "tier1-npr-vr"
  logicalswitch1_name = "npr-web"
  logicalswitch2_name = "npr-app"
  logicalswitch3_name = "npr-db"
  logicalswitch1_gw = "192.168.80.1/24"
  logicalswitch2_gw = "192.168.81.1/24"
  logicalswitch3_gw = "192.168.82.1/24"
}
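
For completeness, the usual Terraform workflow from the directory holding this root configuration is the standard three commands (terraform init downloads the NSX-T provider plugin, plan previews the changes, apply builds them):

# Initialise the working directory and download the provider plugins
terraform init
# Preview the resources that will be created
terraform plan
# Build the environment in NSX-T
terraform apply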

After "terraform apply" completes, you can see in NSX Manager that the required environment has been built successfully.

Logical Switches
T1 vRouter
Service
DFW Rules

Install Docker Offline on Centos7

Recently, I had to build an environment with a real web application running in it to test an LBaaS site affinity solution. After a few minutes, I decided to install a Jenkins container on my testing CentOS 7 virtual machines.

Unfortunately, my CentOS virtual machines have no Internet access, so I spent a bit of time working out how to install Docker and run a container offline on CentOS 7. Hence this blog, which may help others who face the same challenge.

The docker version which I am going to install is: 
docker-ce-18.03.1.ce-1.el7.centos

On another CentOS 7 machine (minimal install) which has Internet access, I ran the CLI below to identify all the required packages for the Docker offline installation.

repoquery -R docker-ce-18.03.1.ce-1.el7.centos

From the output, I found that I need the following packages to complete the Docker offline installation:

1:libsepol-2.5-8.1.el7
2:libselinux-2.5-12.el7
3:audit-libs-2.8.1-3.el7_5.1
4:libsemanage-2.5-11.el7
5:libselinux-utils-2.5-12.el7
6:policycoreutils-2.5-22.el7
7:selinux-policy-3.13.1-192.el7
8:libcgroup-0.41-15.el7
9:selinux-policy-targeted-3.13.1-19
10:libsemanage-python-2.5-11.el7
11:audit-libs-python-2.8.1-3.el7_5.1
12:setools-libs-3.3.8-2.el7
13:python-IPy-0.75-6.el7
14:pigz-2.3.3-1.el7.centos
15:checkpolicy-2.5-6.el7
16:policycoreutils-python-2.5-22.el7
17:container-selinux-2:2.68-1.el7
18:docker-ce-18.03.1.ce-1.el7.centos
19:audit-2.8.1-3.el7_5.1

Then I downloaded the Docker RPM package and all dependent packages with yumdownloader:

yumdownloader --resolve docker-ce-18.03.1.ce-1.el7.centos

I archived the above packages (tar cf docker-ce.offline.tar *.rpm) and uploaded them to my offline CentOS 7 virtual machines. Then I used the rpm CLI to install Docker:

[root@lbaas02 ~]# rpm -ivh --replacefiles --replacepkgs *.rpm

warning: audit-2.8.1-3.el7_5.1.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID f4a80eb5: NOKEY
warning: docker-ce-18.03.1.ce-1.el7.centos.x86_64.rpm: Header V4 RSA/SHA512 Signature, key ID 621e9f35: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:libsepol-2.5-8.1.el7             ################################# [  5%]
   2:libselinux-2.5-12.el7            ################################# [ 11%]
   3:audit-libs-2.8.1-3.el7_5.1       ################################# [ 16%]
   4:libsemanage-2.5-11.el7           ################################# [ 21%]
   5:libselinux-utils-2.5-12.el7      ################################# [ 26%]
   6:policycoreutils-2.5-22.el7       ################################# [ 32%]
   7:selinux-policy-3.13.1-192.el7    ################################# [ 37%]
   8:libcgroup-0.41-15.el7            ################################# [ 42%]
   9:selinux-policy-targeted-3.13.1-19################################# [ 47%]
  10:libsemanage-python-2.5-11.el7    ################################# [ 53%]
  11:audit-libs-python-2.8.1-3.el7_5.1################################# [ 58%]
  12:setools-libs-3.3.8-2.el7         ################################# [ 63%]
  13:python-IPy-0.75-6.el7            ################################# [ 68%]
  14:pigz-2.3.3-1.el7.centos          ################################# [ 74%]
  15:checkpolicy-2.5-6.el7            ################################# [ 79%]
  16:policycoreutils-python-2.5-22.el7################################# [ 84%]
  17:container-selinux-2:2.68-1.el7   ################################# [ 89%]
  18:docker-ce-18.03.1.ce-1.el7.centos################################# [ 95%]
  19:audit-2.8.1-3.el7_5.1            ################################# [100%]

After the installation completed, I enabled and started the Docker service:

[root@lbaas02 ~]# systemctl enable docker

Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.

[root@lbaas02 ~]# systemctl start docker
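
Before moving on to the image import, a quick optional check confirms that the offline install produced a working Docker daemon:

# Both commands talk to the daemon, so they fail if the docker service is not running
docker version
docker info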

The next question for me was how to import the offline Jenkins Docker image. Firstly, I pulled the Jenkins Docker image:

docker pull jenkins/jenkins

Then I exported the Docker image as a file and uploaded it to my testing CentOS machine.

docker save -o jenkins.docker jenkins/jenkins

On my testing CentOS machine, I loaded the image into Docker.

[root@lbaas01 ~]# docker load -i jenkins.docker

f715ed19c28b: Loading layer [==================================================>]  105.5MB/105.5MB
8bb25f9cdc41: Loading layer [==================================================>]  23.99MB/23.99MB
08a01612ffca: Loading layer [==================================================>]  7.994MB/7.994MB
1191b3f5862a: Loading layer [==================================================>]  146.4MB/146.4MB
097524d80f54: Loading layer [==================================================>]  2.332MB/2.332MB
685f72a7cd4f: Loading layer [==================================================>]  3.584kB/3.584kB
9c147c576d67: Loading layer [==================================================>]  1.536kB/1.536kB
e9805f9bdc9e: Loading layer [==================================================>]  356.3MB/356.3MB
8b47d19735d5: Loading layer [==================================================>]  362.5kB/362.5kB
e2a15a753d48: Loading layer [==================================================>]  338.9kB/338.9kB
287c6d658570: Loading layer [==================================================>]  3.584kB/3.584kB
5e9d64b80844: Loading layer [==================================================>]  9.728kB/9.728kB
be6e5f898997: Loading layer [==================================================>]  868.9kB/868.9kB
609adfa44126: Loading layer [==================================================>]  4.608kB/4.608kB
a26f92334a9c: Loading layer [==================================================>]  75.92MB/75.92MB
de90b90d0715: Loading layer [==================================================>]  4.608kB/4.608kB
13d8fca176c6: Loading layer [==================================================>]  9.216kB/9.216kB
be0781510eef: Loading layer [==================================================>]  4.608kB/4.608kB
d7e644ce9f14: Loading layer [==================================================>]  3.072kB/3.072kB
47dd83bc99e4: Loading layer [==================================================>]  7.168kB/7.168kB
96e3e5ce2959: Loading layer [==================================================>]  12.29kB/12.29kB
Loaded image: jenkins/jenkins:latest

[root@lbaas01 ~]# docker images

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE

jenkins/jenkins     latest              51158f0cf7bc        6 days ago          701MB

Now I am able to start my Jenkins container on this offline CentOS 7 machine.

docker run -d -p 8080:8080 -p 50000:50000 -v jenkins_home:/var/jenkins_home jenkins/jenkins

Wait for 2-3 minutes. After the Jenkins container is fully running, I can log in to my Jenkins. :)