By default, NSX saves your DFW configuration each time you make a change. Up to 90 configurations can be saved. This feature is called auto save or auto-draft, and it is designed to help you restore your DFW configuration.
However, this feature can also be A BOMB when you have a “big” number of firewall rules (say, over 20k, which I know is not big at all for an enterprise client).
Recently we had a P1 issue caused by this.
We saw two symptoms:
- The NSX Manager daily backup failed. The NSX syslog showed the following:
  2016-12-22 18:04:03.566 AEDT ERROR taskScheduler-1 VsmServiceBackupRestoreExecutor:254 – Run backup script – Failure due to NSX Manager database data dump operation failed.
  vsm-appliance-mgmt:150500:Exception occurred while taking backup.:NSX Manager database data dump operation failed.
  2016-12-23 18:04:11.835 AEDT ERROR taskScheduler-1 VsmServiceBackupRestoreExecutor:254 – Run backup script – Failure due to NSX Manager database data dump operation failed.
  vsm-appliance-mgmt:150500:Exception occurred while taking backup.:NSX Manager database data dump operation failed.
- We couldn't make any change to the NSX DFW, including the exclusion list, through either the NSX GUI or API calls, although we could still view the current DFW configuration in the GUI and run GET API calls.
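If you want to check your own NSX Manager logs for the first symptom, a minimal Python sketch along these lines may help. The pattern is derived only from the two log lines quoted above, so other NSX versions may word the message differently, and the function name find_backup_failures is my own invention.

```python
import re

# Pattern derived from the log lines quoted in this post (an assumption:
# other NSX versions may format the message differently).
BACKUP_FAILURE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \S+ ERROR .*"
    r"VsmServiceBackupRestoreExecutor.*database data dump operation failed"
)


def find_backup_failures(lines):
    """Return the timestamps of log lines reporting a failed database dump."""
    failures = []
    for line in lines:
        match = BACKUP_FAILURE.search(line)
        if match:
            failures.append(match.group("timestamp"))
    return failures
```

Feed it the lines of an exported log file, for example find_backup_failures(open(path, encoding="utf-8")), where path points to wherever you saved the log.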
We worked with the VMware support team to fix the issue. Eventually we identified the cause: an “over-sized” table in NSX Manager (around 13GB in our case, although we still had nearly 9GB of free space in the DB partition). The naughty table is firewall_draft_compact_rule.
We had to disable the DFW auto save feature and then delete the saved configurations to restore our service. The auto save feature is disabled with the following API call:
- PUT https://NSX-Manager-IP-Address/api/4.0/firewall/config/globalconfiguration
- Request body:
- <globalConfiguration> <layer3RuleOptimize>…</layer3RuleOptimize> <layer2RuleOptimize>…</layer2RuleOptimize> <tcpStrictOption>…</tcpStrictOption> <autoDraftDisabled>true</autoDraftDisabled> </globalConfiguration>
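The same call can be scripted. Below is a rough Python sketch, not an official client: it flips autoDraftDisabled to true in a global configuration document and builds the PUT request for the URL above. It assumes config_xml is your current global configuration (for example, as fetched with a GET on the same URL), so the other options keep their existing values, and that NSX Manager accepts HTTP basic authentication. The function names disable_auto_draft and build_put_request are hypothetical.

```python
import base64
import urllib.request
import xml.etree.ElementTree as ET


def disable_auto_draft(config_xml):
    """Return config_xml with <autoDraftDisabled> set to true.

    All other options in the document are left as they are.
    """
    root = ET.fromstring(config_xml)
    flag = root.find("autoDraftDisabled")
    if flag is None:
        # Element missing from the input: add it rather than fail.
        flag = ET.SubElement(root, "autoDraftDisabled")
    flag.text = "true"
    return ET.tostring(root, encoding="unicode")


def build_put_request(manager, user, password, body):
    """Build (but do not send) the PUT request for the global configuration."""
    url = f"https://{manager}/api/4.0/firewall/config/globalconfiguration"
    # Assumption: basic authentication with an NSX Manager admin account.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/xml",
            "Authorization": f"Basic {token}",
        },
    )
```

To actually send it, pass the request to urllib.request.urlopen. If your NSX Manager uses a self-signed certificate, you will also need to supply a suitable SSL context.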
It is reasonable that the NSX backup fails when there isn't enough disk space available on NSX Manager. However, we are still waiting for a formal explanation of why we couldn't change the DFW configuration. My current guess is that the firewall_draft_compact_rule table is put into “read-only” mode when its size exceeds some kind of threshold. Once we get feedback, I will update this post accordingly.
Note: the auto-save/auto-draft feature can only be disabled in NSX 6.2.3 and later.
Information from VMware Support:
Backups were failing because one of the DB tables (firewall_draft_compact_rule) was consuming a large amount of space. This can happen when concurrent firewall configuration operations sometimes leave drafts in an inconsistent state. Once a draft lands in an inconsistent state, the cleanup operation does not work as expected. In addition, the scheduled compaction task keeps piling up new compacted configurations in the firewall_draft_compact_config, firewall_draft_compact_section, firewall_draft_compact_rule and firewall_draft_config_change tables, eventually filling up the disk.