It’s no secret that container technology (popularly known through Docker) is becoming one of the top choices in software projects [1], but what about data projects and clusters? Many companies and projects intend to take advantage of it; some examples are Cloudera [2] and the apache-spark-on-k8s project [3]. Personally, if you want more information about what exactly is called “Big Data as a Service”, I suggest checking the session by Anant Chintamaneni and Nanda Vijaydev (BlueData) at the last Strata Data Conference [4].
In this article, I will guide you through simple steps to get the Cloudera Quickstart image v5.13 running remotely on a Google Cloud instance. Let’s get the job done!
Prerequisites
1. Have a Google Cloud account (Just log in with your Gmail and automatically get $300 of credit for one year) [5]
2. Create a new project
1. First, create a VM instance
2. Define the basic tech specs (it is important to allow HTTP and HTTPS traffic)
3. Connect using SSH
4. Install Docker
curl -sSL https://get.docker.com/ | sh
5. Update the package database so that it includes the Docker packages
sudo apt-get update
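Before moving on, it can help to confirm that the tools used in the following steps are actually on the PATH. The snippet below is a minimal sketch of such a check; the function name require_cmds is hypothetical and not part of Docker or Cloudera.

```shell
#!/bin/sh
# require_cmds is a hypothetical helper: it prints each missing command
# to stderr and returns non-zero if at least one is missing.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Tools used in the remaining steps of this article.
if require_cmds docker wget tar; then
    echo "all tools found"
fi
```

If docker is reported as missing, re-run step 4 before continuing.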
6. Get the Cloudera Quickstart Image
sudo wget https://downloads.cloudera.com/demo_vm/docker/cloudera-quickstart-v…
7. Extract the tar file
tar xzf cloudera-quickstart-vm-*-docker.tar.gz
8. Import the Docker image (you may run out of disk space; in that case, remove the tar.gz file and re-run the import)
sudo docker import cloudera-quickstart-vm-5.13.0-0-beta-docker.tar
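Since the import can fail when the disk is full, a quick check of the free space beforehand may save a retry. This is only a sketch: the helper name has_free_space and the 10 GB threshold are assumptions, so adjust the number to the size of the extracted tar file on your instance.

```shell
#!/bin/sh
# has_free_space is a hypothetical helper: it succeeds when the filesystem
# holding DIR has at least MIN_KB kilobytes available.
has_free_space() {
    dir=$1
    min_kb=$2
    # df -P prints one POSIX-format line per filesystem;
    # with -k, column 4 is the available space in kilobytes.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$min_kb" ]
}

# Assumed threshold: about 10 GB (10 * 1024 * 1024 KB) in the current directory.
if has_free_space . 10485760; then
    echo "enough space: run the docker import"
else
    echo "low on space: remove the tar.gz file first, then run the import"
fi
```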
9. Check the container image ID
sudo docker images
10. Run the container
sudo docker run --hostname=quickstart.cloudera --privileged=true -t -i -p 8777:8888 -p 7190:7180 -p 90:80 b46c7719892d /usr/bin/docker-quickstart
Let’s go through the parameters [7]:
· sudo docker run: main command to start the container
· --hostname: the pseudo-distributed configuration assumes this hostname
· --privileged=true: required for HBase, MySQL-backed Hive metastore, Hue, Oozie, Sentry and Cloudera Manager
· -t: allocates a pseudoterminal. Once the services are started, a Bash shell takes over; this switch starts a terminal emulator to run the services.
· -i: keeps the terminal interactive, whether you use it immediately or connect to it later.
· -p 8777:8888: maps the Hue port in the guest (8888) to another port on the host (8777).
· b46c7719892d: Docker image ID obtained in step 9
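To make the role of each parameter easier to see, the command from step 10 can be assembled from variables. This is a sketch only: build_run_cmd is a hypothetical helper that prints the command instead of running it, and the image ID must be replaced with the one from your own step 9.

```shell
#!/bin/sh
# Values taken from step 10; replace IMAGE_ID with the ID from your own step 9.
IMAGE_ID="b46c7719892d"
PORTS="8777:8888 7190:7180 90:80"   # host:container pairs

# build_run_cmd is a hypothetical helper: it prints the docker run command
# for the given image ID and list of port pairs.
build_run_cmd() {
    image=$1
    ports=$2
    cmd="sudo docker run --hostname=quickstart.cloudera --privileged=true -t -i"
    for pair in $ports; do
        cmd="$cmd -p $pair"          # one -p flag per port mapping
    done
    printf '%s %s /usr/bin/docker-quickstart\n' "$cmd" "$image"
}

build_run_cmd "$IMAGE_ID" "$PORTS"
```

Printing the command first lets you review the port mappings before actually starting the container.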
11. Test the services
Spark
Hive
HBase
Hue (port 8777)*
* To access it, you first have to allow the ports you defined in step 10. For security, try to open only those ports; in the image I opened all of them.
User and password: cloudera
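Opening only the mapped ports can also be done from the command line. The sketch below derives the host-side ports from the mappings of step 10 and prints a gcloud firewall command as a dry run. The helper allow_list, the rule name and the source range are assumptions for illustration; use your own public IP as the source range.

```shell
#!/bin/sh
# allow_list is a hypothetical helper: it turns "host:container" port pairs
# into a firewall allow list such as tcp:8777,tcp:7190.
allow_list() {
    list=""
    for pair in $1; do
        host_port=${pair%%:*}                 # part before the colon (host side)
        list="${list:+$list,}tcp:$host_port"  # comma-separate the entries
    done
    printf '%s\n' "$list"
}

PORTS="8777:8888 7190:7180 90:80"
RULE_NAME="allow-cloudera-quickstart"   # hypothetical rule name
SOURCE_RANGE="203.0.113.7/32"           # placeholder: replace with your own IP

# Dry run: print the command instead of executing it.
echo gcloud compute firewall-rules create "$RULE_NAME" \
    --allow "$(allow_list "$PORTS")" \
    --source-ranges "$SOURCE_RANGE"
```

Remove the leading echo to actually create the rule once the printed command looks right.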
Hue UI!
Cloudera running (port 90)
12. Exit the container
Just press Ctrl+D
Go further
You can also run the container in the background by passing the -d flag to docker run; without it, your terminal automatically attaches to the container. The flag has to come before the image ID, otherwise it is handed to the container command instead of to Docker:
sudo docker run -d --hostname=quickstart.cloudera --privileged=true -t -i -p 8777:8888 -p 7190:7180 -p 90:80 b46c7719892d /usr/bin/docker-quickstart
If you want to reconnect to the shell, list the running containers to find the container ID and then attach to it (to stop, just press Ctrl+D)
sudo docker ps
sudo docker attach 256e31278a92
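Looking up the container ID by hand can be skipped by filtering the running containers by image. The following is a sketch under assumptions: running_ids and attach_first are hypothetical helper names, and the DOCKER variable exists only so the commands can be swapped out for a preview.

```shell
#!/bin/sh
# DOCKER can be overridden to substitute another command for a preview.
DOCKER=${DOCKER:-sudo docker}

# running_ids (hypothetical helper): print the IDs of running containers
# that were started from the given image.
running_ids() {
    $DOCKER ps -q --filter "ancestor=$1"
}

# attach_first (hypothetical helper): attach to the first running container
# created from the given image, or fail if there is none.
attach_first() {
    id=$(running_ids "$1" | head -n 1)
    if [ -z "$id" ]; then
        echo "no running container for image $1" >&2
        return 1
    fi
    $DOCKER attach "$id"
}

# Usage (image ID from step 9):
#   attach_first b46c7719892d
```

When the container was started with -t -i, pressing Ctrl+P followed by Ctrl+Q detaches from it without stopping it.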
Conclusion
In this article, I showed how easy it is to start using the Cloudera Quickstart image with Docker.
See you in the next article! Happy Learning!
Code:
Useful information for the article about setting a cloudera quickstart images with docker …github.com
Links:
[1] https://www.theserverside.com/feature/The-benefits-of-container-dev…
[2] http://community.cloudera.com/t5/CDH-Manual-Installation/CDH-on-Kub…
[3] https://github.com/apache-spark-on-k8s/spark
[4] https://conferences.oreilly.com/strata/strata-ny-2018/public/schedu…
[6] https://www.digitalocean.com/community/tutorials/how-to-install-and…
[7] https://www.cloudera.com/documentation/enterprise/5-6-x/topics/quic…