Originally posted on Analytic Bridge
The big data boom has given rise to a host of information technology software, tools, and capabilities that enable companies to capture, manage, and analyze large sets of structured and unstructured data for results-oriented insights and competitive advantage. But with this new technology comes the challenge of keeping confidential information secure and private.
Big data residing in a Hadoop environment often contains sensitive, confidential information: bank account details, credit card and other financial data, corporate business records, property information, personally identifiable information, and clients’ security credentials.
Given the confidential nature of this data and the damage that could result should it fall into the wrong hands, it is essential that it be protected from unauthorized access.
Let’s look at some general Hadoop security issues along with best practices for keeping sensitive data protected and secure.
Security concerns with Hadoop
It wasn’t all that long ago that Hadoop in the enterprise was primarily deployed on-premises. As such, sensitive confidential data was safely confined in isolated clusters or data silos where security wasn’t a problem. But that changed quickly as Hadoop developed into Big Data as-a-Service (BDaaS), took to the cloud, and became surrounded by an ever-growing ecosystem of software and applications. And while these innovations have served to democratize data and bring Hadoop into the mainstream, they have also created new security concerns for organizations that now struggle to scale security in step with Hadoop’s rapid technological advances.
For many companies, Hadoop has developed into an enterprise data platform. That poses new security challenges, as data that was once siloed is brought together in a vast data lake and made accessible to a variety of users across the organization. Among these challenges are:
- Ensuring the proper authentication of users who access Hadoop (a minimal Kerberos sketch follows this list).
- Ensuring that authorized Hadoop users can access only the data they are entitled to access.
- Ensuring that data access histories for all users are recorded, both to satisfy compliance regulations and to support auditing and forensic investigation.
- Ensuring the protection of data, both at rest and in transit, through enterprise-grade encryption.
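On the authentication point, a common approach is Kerberos via Hadoop’s UserGroupInformation API. The following is a minimal sketch, assuming a Kerberos-secured cluster; the principal name and keytab path are hypothetical placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosLogin {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Enable Kerberos authentication (mirrors the core-site.xml setting).
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Hypothetical principal and keytab; substitute your own.
        UserGroupInformation.loginUserFromKeytab(
            "analyst@EXAMPLE.COM", "/etc/security/keytabs/analyst.keytab");

        // Subsequent HDFS access runs as the authenticated user.
        FileSystem fs = FileSystem.get(conf);
        System.out.println(fs.exists(new Path("/data/sensitive")));
    }
}
```

Once the login succeeds, subsequent FileSystem calls carry the authenticated identity, which the NameNode then checks against file permissions.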
Hadoop security best practices
Clearly, today’s companies face formidable security challenges. And the stakes regarding data security are being raised ever higher as sensitive healthcare data, personal retail customer data, smartphone data, and social media and sentiment data become more and more a part of the big data mix. It’s time for companies to reevaluate the protection of their data in Hadoop and to reacquaint themselves with the Hadoop security best practices below.
1. Plan before you deploy – Big data protection strategies must be determined during the planning phase of the Hadoop deployment. Before moving any data into Hadoop, it’s important to identify the confidential data elements, along with where those elements will reside in the Hadoop system. In addition, all company privacy policies and pertinent industry and governmental regulations must be taken into consideration during the planning phase in order to identify and reduce compliance risk.
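As one way to make that inventory concrete, the sketch below maps planned HDFS locations to sensitivity tiers so that protection requirements can be derived before any data lands; all paths and tier names here are hypothetical.

```java
import java.util.Map;

public class DataInventory {
    enum Sensitivity { PUBLIC, INTERNAL, CONFIDENTIAL, RESTRICTED }

    public static void main(String[] args) {
        // Hypothetical mapping of planned HDFS locations to sensitivity tiers.
        Map<String, Sensitivity> inventory = Map.of(
            "/data/raw/cards",     Sensitivity.RESTRICTED,   // credit card numbers
            "/data/raw/customers", Sensitivity.CONFIDENTIAL, // names, addresses
            "/data/raw/clicks",    Sensitivity.INTERNAL);    // clickstream events

        // Anything CONFIDENTIAL or above should drive encryption/masking plans.
        inventory.forEach((path, tier) -> {
            if (tier.compareTo(Sensitivity.CONFIDENTIAL) >= 0) {
                System.out.println(path + " requires protection: " + tier);
            }
        });
    }
}
```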
2. Don’t overlook basic security measures – Basic security measures can go a long way toward meeting Hadoop security challenges. To ensure user identification and control user access to sensitive data, it’s important to create users and groups and then map users to groups. Permissions should be assigned and locked down by group, and the use of strong passwords should be strictly enforced. Fine-grained permissions should be assigned on a need-to-know basis only, and broad-stroke permissions should be avoided as much as possible.
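As an illustration, here is a minimal sketch of locking down an HDFS directory by owner and group using Hadoop’s FileSystem API; the service account, group, and path are hypothetical, and the same effect can be had with hdfs dfs -chown and hdfs dfs -chmod. Note that changing ownership generally requires HDFS superuser privileges.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.permission.FsPermission;

public class LockDownDirectory {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path sensitive = new Path("/data/finance");  // hypothetical path

        // Owner: full access; group "finance": read/execute; others: none (750).
        fs.setOwner(sensitive, "etl_service", "finance");
        fs.setPermission(sensitive,
            new FsPermission(FsAction.ALL, FsAction.READ_EXECUTE, FsAction.NONE));
    }
}
```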
3. Choose the right remediation technique – When big data analytics needs require access to real data, as opposed to data that has been desensitized, there are two remediation techniques to choose from: encryption or masking. While masking offers the most secure remediation, encryption might be a better choice, as it offers greater flexibility to meet evolving needs. Either way, it’s important to ensure that the data protection solutions being considered are capable of supporting both remediation techniques. That way, masked and unmasked versions of sensitive data can be kept in separate Hadoop directories if desired.
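To illustrate what masking looks like in practice, here is a simple sketch that redacts all but the last four digits of a credit card number; the field format is an assumption, and production masking would typically be handled by a dedicated data protection tool.

```java
public class CardMasker {
    /** Replace all but the last four digits of a 13-19 digit card number. */
    static String maskCard(String card) {
        String digits = card.replaceAll("\\D", "");  // strip separators
        if (digits.length() < 13 || digits.length() > 19) {
            return card;  // not a plausible card number; leave unchanged
        }
        return "*".repeat(digits.length() - 4)
             + digits.substring(digits.length() - 4);
    }

    public static void main(String[] args) {
        System.out.println(maskCard("4111-1111-1111-1234")); // ************1234
    }
}
```

Unlike encryption, masking is one-way: the original value cannot be recovered from the masked copy, which is one reason to keep masked and unmasked versions in separate, separately permissioned directories as described above.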
4. Ensure that encryption integrates with access control – Once an encryption solution is chosen, it must be integrated with the organization’s access control technology. Otherwise, users with different credentials won’t have the appropriate, selective access to sensitive data in the Hadoop environment that they require.
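In HDFS, transparent encryption illustrates this integration: an encryption zone ties a directory to a key held in the Hadoop KMS, and KMS ACLs determine which authenticated users may use that key, so decryption requires both filesystem permission on the path and key access. Below is a minimal sketch using the HdfsAdmin client API; the NameNode URI, path, and key name are hypothetical, and the key must already exist in the KMS.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsAdmin;

public class CreateZone {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        HdfsAdmin admin = new HdfsAdmin(URI.create("hdfs://namenode:8020"), conf);

        // Files written under /data/finance are transparently encrypted with
        // "finance-key"; reading them back requires both HDFS permissions on
        // the path and KMS ACL rights to the key.
        admin.createEncryptionZone(new Path("/data/finance"), "finance-key");
    }
}
```

The command-line equivalent is hdfs crypto -createZone -keyName finance-key -path /data/finance.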
5. Monitor, detect, and resolve issues – Even the best security models will be found wanting without the capability to detect non-compliance issues and suspected or actual security breaches, and to resolve them quickly. Organizations need to make sure that best-practice monitoring and detection processes are in place.
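On the detection side, the HDFS NameNode emits an audit log line for every access, with fields such as allowed=, ugi=, cmd=, and src=. The sketch below scans such a log for denied accesses to a sensitive path; the log location and path prefix are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class AuditScan {
    public static void main(String[] args) throws IOException {
        // Hypothetical audit log location and sensitive path prefix.
        try (Stream<String> lines =
                 Files.lines(Paths.get("/var/log/hadoop/hdfs-audit.log"))) {
            lines.filter(l -> l.contains("allowed=false"))
                 .filter(l -> l.contains("src=/data/finance"))
                 .forEach(l -> System.out.println("DENIED ACCESS: " + l));
        }
    }
}
```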
6. Ensure proper training and enforcement – To be fully effective, best-practice policies and procedures for data security in Hadoop must be revisited frequently in employee training and consistently supervised and enforced.
Hadoop is enabling organizations to analyze vast and rich data stores and derive actionable insights that inform new and better products and services and help to create competitive advantage. But the benefits of Hadoop come with risks. Hopefully the above information will help organizations to gain a better understanding of the security and compliance issues associated with Hadoop and to implement best practices to keep sensitive data safe and secure going forward.