It’s no secret that container technology (popularly known as Docker) is becoming one of the top choices in software projects [1], but what about data projects and clusters? Many companies and projects intend to take advantage of it; some examples are Cloudera [2] and the apache-spark-on-k8s project [3]. If you want to know more about what exactly is called “Big Data as a Service”, I suggest checking the talk by Anant Chintamaneni and Nanda Vijaydev (BlueData) at the last Strata Data Conference [4].
In this article, I will guide you through simple steps to get the Cloudera Quickstart image v5.13 running remotely on a Google Cloud instance. Well, let’s get the job done!
Prerequisites
1. Have a Google Cloud account (just log in with your Gmail account and you automatically get $300 of credit for one year) [5]
2. Create a new project
1. First, create a VM instance
2. Define the basic tech specs (it is important to allow HTTP and HTTPS traffic)
data:image/s3,"s3://crabby-images/19b49/19b49839bec4834cc5fab0771d3a0ca7a18d9b46" alt="1Kk7wfnND9F96rAhmyA8PKg"
3. Connect using SSH
data:image/s3,"s3://crabby-images/163b2/163b2f42d84c6af96d2bccc0de907f47708d2edb" alt="1fpLsJD21q5sNoD5GwjnUqw"
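If you prefer the command line to the web console, steps 1 to 3 can be sketched with the gcloud CLI. This is only a sketch under stated assumptions, not the method used in the article: the instance name, zone, machine type and disk size are illustrative values. The `http-server` and `https-server` tags are the ones the console applies when you tick the HTTP/HTTPS checkboxes; they only take effect if matching firewall rules exist in the project.

```shell
# Dry run by default: each command is printed, not executed.
# Set GCLOUD="gcloud" to really run the commands.
GCLOUD="${GCLOUD:-echo gcloud}"

INSTANCE_NAME="cloudera-quickstart"   # hypothetical instance name
ZONE="us-central1-a"                  # assumed zone
MACHINE_TYPE="e2-standard-4"          # assumed machine size
BOOT_DISK_SIZE="50GB"                 # assumed boot disk size

# Steps 1 and 2: create the VM and tag it for HTTP/HTTPS traffic.
create_vm() {
  $GCLOUD compute instances create "$INSTANCE_NAME" \
    --zone="$ZONE" \
    --machine-type="$MACHINE_TYPE" \
    --boot-disk-size="$BOOT_DISK_SIZE" \
    --tags=http-server,https-server
}

# Step 3: open an SSH session on the new instance.
connect_ssh() {
  $GCLOUD compute ssh "$INSTANCE_NAME" --zone="$ZONE"
}

create_vm
connect_ssh
```

Run as is, the script only prints the two commands so you can review them; export `GCLOUD=gcloud` once the values match your project.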
4. Install Docker
curl -sSL https://get.docker.com/ | sh
5. Update the package database so that it includes the Docker packages
sudo apt-get update
data:image/s3,"s3://crabby-images/8a2e3/8a2e3d6d1f57a1f981fc3104a6aa518af2e0ae5f" alt="1tUzhj5whR95LMIAstFlF8Q"
6. Get the Cloudera Quickstart Image
sudo wget https://downloads.cloudera.com/demo_vm/docker/cloudera-quickstart-v…
data:image/s3,"s3://crabby-images/2cc90/2cc9075f8275f73028cc1911953bcbc4e6bfc2d9" alt="1cvc8DQQpCeuAmNYduC2ePA"
7. Extract the tar file
tar xzf cloudera-quickstart-vm-*-docker.tar.gz
8. Import the image into Docker (you may run out of disk space; in that case, remove the tar.gz file and re-run the import)
sudo docker import cloudera-quickstart-vm-5.13.0-0-beta-docker.tar
data:image/s3,"s3://crabby-images/7d558/7d558b58cdea373291d0378526ae3a1def1576a9" alt="1JCecmqe9-bnYjqw3Gg7jpw"
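Since step 8 can fail for lack of disk space, it may help to check the free space before importing. The following is a minimal sketch; the 20 GB threshold is an assumption, so adjust it to the size of your disk and of the extracted tar file.

```shell
# Print the free space, in whole GB, of the filesystem that holds path $1.
free_gb() {
  # df -P prints one line per filesystem; -k reports 1024-byte blocks.
  # Column 4 of the second line is the available space.
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Succeed when at least $2 GB are free at path $1.
has_free_gb() {
  [ "$(free_gb "$1")" -ge "$2" ]
}

REQUIRED_GB=20   # assumed threshold, not a value from the article

if has_free_gb . "$REQUIRED_GB"; then
  echo "Enough space: $(free_gb .) GB free"
else
  echo "Only $(free_gb .) GB free; remove the tar.gz file before running docker import"
fi
```

Run it from the directory where you downloaded the image, because `.` is the path being checked.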
9. Check the container image ID
sudo docker images
10. Run the container
sudo docker run --hostname=quickstart.cloudera --privileged=true -t -i -p 8777:8888 -p 7190:7180 -p 90:80 b46c7719892d /usr/bin/docker-quickstart
Here is what each parameter does [7]:
· sudo docker run: the main command to start the container
· --hostname=quickstart.cloudera: the pseudo-distributed configuration assumes this hostname
· --privileged=true: required for HBase, MySQL-backed Hive metastore, Hue, Oozie, Sentry and Cloudera Manager
· -t: allocates a pseudoterminal; once the services are started, a Bash shell takes over
· -i: keeps the terminal interactive, whether you use it immediately or attach to it later
· -p 8777:8888: maps the Hue port in the guest (8888) to another port on the host (8777); the other two -p flags work the same way
· b46c7719892d: the Docker image ID obtained in step 9
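To make the port mappings easier to read and change, the command from step 10 can be assembled from a list of HOST_PORT:CONTAINER_PORT pairs. This is a small sketch, not part of the original article; the function name is hypothetical, and the script only prints the command instead of running it.

```shell
# Image ID from step 9; replace it with the ID shown by `sudo docker images`.
IMAGE_ID="b46c7719892d"

# Each pair is HOST_PORT:CONTAINER_PORT, as used in the article.
# 8888 is Hue, 7180 is Cloudera Manager and 80 is the web page inside the guest.
PORTS="8777:8888 7190:7180 90:80"

# Build the docker run command, adding one -p flag per mapping.
build_run_command() {
  cmd="sudo docker run --hostname=quickstart.cloudera --privileged=true -t -i"
  for mapping in $PORTS; do
    cmd="$cmd -p $mapping"
  done
  printf '%s %s /usr/bin/docker-quickstart\n' "$cmd" "$IMAGE_ID"
}

build_run_command
```

The port on the left of each pair is the one you later open in the firewall and type in the browser; the port on the right is fixed by the services inside the container.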
11. Test the services
Spark
data:image/s3,"s3://crabby-images/c5869/c586921571306a9fba0a3a653b3ff8aa4c5c065d" alt="1Q5391yvjrnfprN3kFRbFVw"
Hive
HBase
data:image/s3,"s3://crabby-images/515bb/515bb7c25eb2deae60c4f38c8783970a6092eca3" alt="1EJ2pJDcvoFEJnBMUEzwonQ"
Hue (port 8777)*
data:image/s3,"s3://crabby-images/ebb8f/ebb8f82089c9b921ffd02f03a6a1d3abd4bb67ce" alt="1b3OCSUYeOLnflqJgLYgPkA"
* To access Hue from your browser, you first have to allow the ports you defined in step 10 in the firewall. For security, try to open just those ports; in the image I opened all of them.
Username and password: cloudera
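The host ports from step 10 can also be opened from the command line with a Google Cloud firewall rule. This is a sketch under assumptions: the rule name is hypothetical, the source range is a placeholder documentation address that you should replace with your own public IP, and the target tag assumes the instance carries the `http-server` tag from allowing HTTP traffic in step 2.

```shell
# Dry run by default: the command is printed, not executed.
# Set GCLOUD="gcloud" to really create the rule.
GCLOUD="${GCLOUD:-echo gcloud}"

RULE_NAME="allow-cloudera-quickstart"   # hypothetical rule name
HOST_PORTS="8777 7190 90"               # host ports mapped in step 10
SOURCE_RANGE="203.0.113.10/32"          # placeholder: use your own IP address
TARGET_TAG="http-server"                # assumed network tag on the instance

# Turn "8777 7190 90" into "tcp:8777,tcp:7190,tcp:90".
allow_list() {
  list=""
  for port in $HOST_PORTS; do
    list="${list:+$list,}tcp:$port"
  done
  printf '%s\n' "$list"
}

# Allow inbound traffic to those ports only, and only from SOURCE_RANGE.
open_ports() {
  $GCLOUD compute firewall-rules create "$RULE_NAME" \
    --direction=INGRESS \
    --allow="$(allow_list)" \
    --source-ranges="$SOURCE_RANGE" \
    --target-tags="$TARGET_TAG"
}

open_ports
```

Restricting the source range to a single address keeps the Hue and Cloudera Manager logins, which use a well-known default password, off the public internet.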
data:image/s3,"s3://crabby-images/18292/182920462306a92fffa9b28dd4b883bd57afe619" alt="1sSf66qy25fX2OThwkPq6Tg"
Hue UI!
data:image/s3,"s3://crabby-images/c67c7/c67c7427173b27561438b5be9a6f92e9dec332cc" alt="1j_GKuirMWRZa8BrEKWIK5Q"
Cloudera running (port 90)
data:image/s3,"s3://crabby-images/5b287/5b2872def2a8a76e1fef240c95d89a02a0f2111d" alt="1l2akFsxjwitLGoO_oVJFlg"
12. Exit the container
Just press Ctrl+D
Go further
You can also run the container in the background by passing the -d flag to docker run; without it, your terminal automatically attaches to the container. The flag must come before the image ID, otherwise it is passed to /usr/bin/docker-quickstart instead of to Docker.
sudo docker run -d --hostname=quickstart.cloudera --privileged=true -t -i -p 8777:8888 -p 7190:7180 -p 90:80 b46c7719892d /usr/bin/docker-quickstart
To reconnect to the shell, list the running containers to find the container ID and then attach to it (to stop the container, just press Ctrl+D)
sudo docker ps
sudo docker attach 256e31278a92
data:image/s3,"s3://crabby-images/fb629/fb6297c63318d0dc29404b76af4a2a44a66a2ffb" alt="1Klt3R_AGr2UQQHmYPqE9cg"
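Looking up the container ID every time can be avoided by giving the container a name with `--name`. The following is a minimal sketch, assuming a hypothetical container name; it prints the commands by default so you can review them first.

```shell
# Dry run by default: each command is printed, not executed.
# Set DOCKER="sudo docker" to really run the commands.
DOCKER="${DOCKER:-echo sudo docker}"

CONTAINER_NAME="cloudera-quickstart"   # hypothetical container name
IMAGE_ID="b46c7719892d"                # image ID from step 9

# Start the container in the background under a fixed name.
start_detached() {
  $DOCKER run -d --name "$CONTAINER_NAME" \
    --hostname=quickstart.cloudera --privileged=true -t -i \
    -p 8777:8888 -p 7190:7180 -p 90:80 \
    "$IMAGE_ID" /usr/bin/docker-quickstart
}

# Reconnect to the shell of the running container.
attach_shell() {
  $DOCKER attach "$CONTAINER_NAME"
}

# Stop the container without attaching to it.
stop_container() {
  $DOCKER stop "$CONTAINER_NAME"
}

start_detached
attach_shell
stop_container
```

Once attached, pressing Ctrl+P followed by Ctrl+Q detaches from the shell and leaves the container running, whereas Ctrl+D ends the shell and stops it.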
Conclusion
In this article, I showed how easy it is to start using the Cloudera Quickstart image with Docker.
See you in the next article! Happy Learning!
Code:
Useful information for the article about setting a cloudera quickstart images with docker …github.com
Links:
[1] https://www.theserverside.com/feature/The-benefits-of-container-dev…
[2] http://community.cloudera.com/t5/CDH-Manual-Installation/CDH-on-Kub…
[3] https://github.com/apache-spark-on-k8s/spark
[4] https://conferences.oreilly.com/strata/strata-ny-2018/public/schedu…
[6] https://www.digitalocean.com/community/tutorials/how-to-install-and…
[7] https://www.cloudera.com/documentation/enterprise/5-6-x/topics/quic…