One morning last week I received an incident ticket reporting that the SharePoint 2016 index partition was not functioning correctly. In Central Administration both index partitions were showing a yellow warning icon.
Because it was a holiday for the customer, but not for me, I was able to do a reboot. After this reboot the issue corrected itself on one server but not on the other. As the index partition was set up for high availability, we removed one server from the topology and started troubleshooting. The first idea was simply to redo the configuration and re-add the server. When adding the server back, I received an error stating that the SharePoint server could not be contacted. After checking and double-checking that the required firewall ports were open, the server was online, and the Search service really was started on the server, it was time to dig into the ULS log.
After some digging in the log files we found the hostcontroller.exe process trying to connect to the search service on all servers, but failing to contact one of them. The error message was:
Failed to connect to Host controller in Server : <servername>. Exception : System.ServiceModel.Security.SecurityNegotiationException: A call to SSPI failed, see inner exception. ---> System.Security.Authentication.AuthenticationException: A call to SSPI failed, see inner exception. ---> System.ComponentModel.Win32Exception: The encryption type requested is not supported by the KDC
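To see quickly which servers the host controller could not reach, the log lines can be filtered on this message. The snippet below is only a rough sketch: the pattern is modeled on the message quoted above, and the function name `failing_servers` and the sample server names are made up for illustration. It works on any iterable of text lines, so how the ULS file is opened and decoded is left to the reader.

```python
import re
from collections import Counter

# Pattern modeled on the error message quoted above. The server name is
# taken as everything between "Server :" and ". Exception", so fully
# qualified names containing dots are kept intact.
HOST_CONTROLLER_ERROR = re.compile(
    r"Failed to connect to Host controller in Server\s*:\s*"
    r"(?P<server>.+?)\.\s+Exception"
)

def failing_servers(lines):
    """Count host controller connection failures per server name."""
    counts = Counter()
    for line in lines:
        match = HOST_CONTROLLER_ERROR.search(line)
        if match:
            counts[match.group("server")] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical sample lines; real ULS entries carry extra columns
    # (timestamp, process, category) before the message text.
    sample = [
        "Failed to connect to Host controller in Server : SP01. Exception : System.ServiceModel...",
        "Some unrelated log entry",
    ]
    for server, count in failing_servers(sample).items():
        print(server, count)
```

If one server name dominates the counts, that is the host to look at first.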
A possible cause for this error is that certain encryption types were disabled as part of a host hardening project. A first hint was to check which encryption types are enabled in the security policy:
“Network Security: Configure encryption types allowed for Kerberos” set to “Enabled” with only the following selected:
Future encryption types
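When comparing this setting between servers it helps to know that the selection is stored as a bitmask. As far as I know it ends up in the `SupportedEncryptionTypes` registry value under the Kerberos policy parameters, and the same bit layout is used by the `msDS-SupportedEncryptionTypes` attribute in Active Directory. The sketch below decodes such a value into the names shown in the policy dialog; the function name is my own and the example values are illustrative, not taken from the affected environment.

```python
# Bit flags for the Kerberos supported encryption types bitmask.
ENC_TYPE_FLAGS = {
    0x1: "DES_CBC_CRC",
    0x2: "DES_CBC_MD5",
    0x4: "RC4_HMAC_MD5",
    0x8: "AES128_HMAC_SHA1",
    0x10: "AES256_HMAC_SHA1",
}

# All remaining bits are grouped as "Future encryption types".
FUTURE_TYPES_MASK = 0x7FFFFFE0

def decode_encryption_types(mask):
    """Return the names of the encryption types enabled in the bitmask."""
    names = [name for bit, name in ENC_TYPE_FLAGS.items() if mask & bit]
    if mask & FUTURE_TYPES_MASK:
        names.append("Future encryption types")
    return names

if __name__ == "__main__":
    # Illustrative values only: everything enabled, and AES only.
    for value in (0x7FFFFFFF, 0x18):
        print(hex(value), decode_encryption_types(value))
```

A client that only allows AES while the KDC or the account still expects RC4 or DES (or the other way around) is one way to end up with the "encryption type requested is not supported by the KDC" error.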
After a reboot, the issue popped up on the previously working server but was fixed on the previously failing one. Another reboot fixed it for both. Yet another reboot reversed the scenario again. This led us to believe the issue was somewhere other than SharePoint. The next suspect was Active Directory. The hosts that were not working were all connecting to the same domain controller. After a call with the project team it became clear that a domain functional level upgrade from Windows 2003 to Windows 2008 R2 had been performed last week. After the DFL/FFL upgrade the servers were not rebooted. On the DC we saw some messages which might have been related to the problem. Because the impact was minimal, a reboot was executed on the environment and this fixed the problem.
- Reboot the DC (or restart the KDC service on the DC) after upgrading the DFL/FFL
- Check all applications in a test environment before doing the upgrade in production
- Beware of a DFL/FFL upgrade from Windows 2003