Kerberizing a Hadoop Cluster - Configure Active Directory Integration

This post explains how to kerberize an existing Hadoop cluster using Ambari. Kerberos covers the authentication part of enterprise security; authorization, auditing, and data protection are the remaining parts.

HDP uses Kerberos, an industry standard for authenticating users and resources and for providing strong identity. Apache Ambari can kerberize an existing cluster using an existing MIT key distribution center (KDC) or Microsoft’s Active Directory.

Configuring Active Directory

To keep this post simple, let's assume an Active Directory already exists and can communicate with the HDP cluster. HDP requires secure LDAP (LDAPS) connectivity, so Active Directory Certificate Services must be installed and configured on the domain controller (DC). The steps below walk through this configuration:

Add necessary roles

Choose the AD server
Choose Role
Select certification authority
Go ahead and install the role.

In Server Manager, click the notification that appears after the install and then click “configure AD certificate services”
Choose the certification authority
If you are generating your own certificates, the option Enterprise CA must be checked. I will choose a Standalone CA
The CA type should be Root CA
Create a new private key
Use defaults for Cryptography for CA
Then specify a name for the CA. After choosing a validity period, click Configure.
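
Once the CA is in place, it can be handy to confirm from a cluster node that the DC really answers on the LDAPS port. Below is a minimal Python sketch of such a check, not something Ambari itself runs. The host name `ad01.example.com` is a made-up placeholder, and 636 is the standard LDAPS port; adjust both if your setup differs. Certificate verification is deliberately switched off here, so this only shows that the port speaks TLS, not that the cluster trusts the certificate.

```python
import socket
import ssl


def ldaps_url(host: str, port: int = 636) -> str:
    """Build an LDAPS URL for a domain controller (636 is the standard LDAPS port)."""
    return f"ldaps://{host}:{port}"


def ldaps_reachable(host: str, port: int = 636, timeout: float = 5.0) -> bool:
    """Return True if a TLS handshake with host:port succeeds.

    Verification is disabled on purpose: a freshly created root CA is not in
    the default trust store, and this check only asks "does the port answer
    with TLS at all?".
    """
    context = ssl.create_default_context()
    context.check_hostname = False       # must be disabled before CERT_NONE
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:                      # covers DNS, timeout, refused, and TLS errors
        return False


# Example usage (hypothetical host name):
#   print(ldaps_url("ad01.example.com"), ldaps_reachable("ad01.example.com"))
```

If this returns False, I would check the firewall and whether the DC has picked up a certificate before going any further with Ambari.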

Create Users and Containers for Cluster

Create a container, a Kerberos admin, and permissions for the cluster.

From advanced features, create a container:
Let's call the container HDP.
Similarly, create another container called sandbox.

Create Users

Create a user, sandboxadmin, and delegate control of the container to the user.
For the delegated task, choose “create, delete and manage user accounts”.
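
The containers created above are later referred to by their distinguished names (DNs), so it helps to know what those look like. Here is a small Python sketch that builds them. It rests on a few assumptions that are mine, not from the steps above: the AD domain is `example.com`, the containers are organizational units (`OU`), and both sit directly under the domain root. The helper names are hypothetical, and names are assumed to contain no characters that need DN escaping.

```python
def domain_to_base_dn(domain: str) -> str:
    """Turn a DNS domain such as 'example.com' into 'DC=example,DC=com'."""
    return ",".join(f"DC={part}" for part in domain.split("."))


def child_dn(rdn_type: str, name: str, parent_dn: str) -> str:
    """Prefix a parent DN with one relative DN, e.g. OU=HDP.

    Assumes `name` needs no escaping (no commas, plus signs, quotes, etc.).
    """
    return f"{rdn_type}={name},{parent_dn}"


base_dn = domain_to_base_dn("example.com")     # assumed AD domain
hdp_dn = child_dn("OU", "HDP", base_dn)         # assumed to be an OU
sandbox_dn = child_dn("OU", "sandbox", base_dn)

print(hdp_dn)
print(sandbox_dn)
```

If your containers are nested or are plain `CN` containers, pass a different parent DN or RDN type; the DN shown in the object's attributes in AD is the authoritative value.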

Enabling Kerberos on Existing Cluster

Now the action shifts to the HDP cluster and Ambari. Go to the Admin tab in the Ambari console and enable Kerberos.

While toggling the Kerberos setting, Ambari will warn that the ResourceManager state will be formatted. Since Kerberos will be (or should be) enabled during the initial setup of the cluster, this is fine.

Integrating with Active Directory

Ambari’s security wizard will take you through the options for choosing a KDC (in this case, we will choose Microsoft Active Directory) and a list of prerequisites.
Check all of them and go to the next page. Under Configure Kerberos, provide the relevant values for the KDC. Note that AD is our KDC in this scenario, and your values will differ depending on the AD server name and so on.
The Kadmin host will be the AD host, and the admin user will be the user we created in the previous section. When you click “Next”, the wizard will install Kerberos clients on all the nodes in the cluster.
Confirm the configuration and click Next.
The services will be stopped momentarily before the cluster is kerberized.
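
To make the relationship between the wizard values a bit more concrete, here is a rough Python sketch that derives them from a handful of inputs. Only two facts come from the walkthrough above: the Kadmin host is the AD host, and the admin user is the one created earlier. Everything else is an assumption on my part: the dictionary keys loosely mirror wizard labels and may differ by Ambari version, the realm is taken to be the upper-cased AD domain, and the host, domain, and container DN are placeholders.

```python
def kerberos_wizard_values(ad_host: str, domain: str,
                           container_dn: str, admin_user: str) -> dict:
    """Collect illustrative values for the 'Configure Kerberos' page.

    Keys are descriptive labels, not guaranteed Ambari property names.
    """
    realm = domain.upper()  # assumption: realm is the upper-cased AD domain
    return {
        "KDC host": ad_host,                      # AD acts as the KDC
        "Realm name": realm,
        "LDAP url": f"ldaps://{ad_host}",         # secure LDAP, as required
        "Container DN": container_dn,             # where principals get created
        "Kadmin host": ad_host,                   # same machine as the KDC here
        "Admin principal": f"{admin_user}@{realm}",
    }


values = kerberos_wizard_values(
    ad_host="ad01.example.com",                   # hypothetical AD host
    domain="example.com",                         # hypothetical AD domain
    container_dn="OU=HDP,DC=example,DC=com",      # hypothetical container DN
    admin_user="sandboxadmin",
)
for label, value in values.items():
    print(f"{label}: {value}")
```

Writing the values down like this before opening the wizard makes it easier to spot a mismatch, for example a realm typed in lower case.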

That's about it! Once the services are restarted, you should be able to play around with your shiny secure HDP cluster! :)
