1. Set Up Hadoop
- Follow the instructions in the Apache Hadoop guide "Setting up a Single Node Cluster" to download, install, and run Hadoop, up to and including Standalone Operation.
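A minimal sketch of the standalone smoke test, modeled on the grep example in the Apache guide. It assumes Hadoop is unpacked at HADOOP_HOME (the /usr/local/hadoop default is a hypothetical choice) and that JAVA_HOME is set in etc/hadoop/hadoop-env.sh. The run helper and the DRY_RUN switch are illustrative additions: by default the script only prints the commands, and DRY_RUN=0 executes them.

```shell
#!/bin/sh
# Standalone-mode smoke test following the grep example of the Apache guide.
# Assumptions: Hadoop unpacked at HADOOP_HOME (hypothetical default below),
# JAVA_HOME set in etc/hadoop/hadoop-env.sh. Prints commands unless DRY_RUN=0.
set -eu

HADOOP_HOME=${HADOOP_HOME:-/usr/local/hadoop}
DRY_RUN=${DRY_RUN:-1}

# Print a command in dry-run mode, otherwise execute it in the current shell.
run() {
  if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi
}

standalone_test() {
  run cd "$HADOOP_HOME"
  run bin/hadoop version                 # fails early if Java cannot be found
  run mkdir -p input
  run cp etc/hadoop/*.xml input          # the shipped config files serve as input
  # output/ must not exist yet, otherwise the job refuses to start
  run bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    grep input output 'dfs[a-z.]+'
  run cat output/*
}

standalone_test
```

If the job succeeds, output/ holds each matched string with its count; delete output/ before running the job again.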
2. Set Up a Cluster of Docker Containers
- Plan the number of nodes in your cluster and the associated networking architecture (see "Creating Bridges" and/or "Docker container networking")
- Create and run the Docker containers for the cluster (see the Docker Primer below)
- Check connectivity between the nodes
- Create user accounts on each node
- Enable passwordless SSH between the nodes
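One way to script these steps, as a sketch under stated assumptions: a user-defined bridge network (on which Docker's embedded DNS lets containers reach each other by container name), three Ubuntu containers, a cluster user, and one shared SSH key pair generated on the host. All names (hadoop_net, master, worker1, worker2, hadoop, ./cluster-keys) are hypothetical, and copying one private key to every node is only reasonable for a throwaway lab cluster. The script prints the commands unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch of the container cluster. Hypothetical names: network hadoop_net,
# nodes master/worker1/worker2, image ubuntu:22.04, cluster user "hadoop".
# Prints the commands unless DRY_RUN=0.
set -eu

NETWORK=hadoop_net
NODES="master worker1 worker2"
IMAGE=ubuntu:22.04
HUSER=hadoop
KEYDIR=./cluster-keys            # key pair is generated on the host
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi
}

create_nodes() {
  # user-defined bridge: containers resolve each other by container name
  run docker network create -d bridge "$NETWORK"
  for n in $NODES; do
    run docker run -dit --name "$n" --hostname "$n" --net "$NETWORK" "$IMAGE" bash
    # the base image ships without sshd and ping (openssh-server pulls in the ssh client)
    run docker exec -e DEBIAN_FRONTEND=noninteractive "$n" sh -c \
      'apt-get update && apt-get install -y openssh-server iputils-ping'
    run docker exec "$n" useradd -m -s /bin/bash "$HUSER"
    # no init system in the container, so start sshd by hand
    run docker exec "$n" sh -c 'mkdir -p /run/sshd && ssh-keygen -A && /usr/sbin/sshd'
  done
}

setup_ssh() {
  run mkdir -p "$KEYDIR"
  run ssh-keygen -t rsa -N "" -f "$KEYDIR/id_rsa"
  home=/home/$HUSER
  for n in $NODES; do
    run docker exec "$n" mkdir -p "$home/.ssh"
    run docker cp "$KEYDIR/id_rsa" "$n:$home/.ssh/id_rsa"
    run docker cp "$KEYDIR/id_rsa.pub" "$n:$home/.ssh/authorized_keys"
    # sshd rejects keys in directories or files that others can write to
    run docker exec "$n" chown -R "$HUSER:$HUSER" "$home/.ssh"
    run docker exec "$n" chmod 700 "$home/.ssh"
    run docker exec "$n" chmod 600 "$home/.ssh/id_rsa" "$home/.ssh/authorized_keys"
  done
}

check_nodes() {
  # every node must reach every node, by ping and by key-based ssh
  for from in $NODES; do
    for to in $NODES; do
      run docker exec "$from" ping -c 1 "$to"
      run docker exec -u "$HUSER" "$from" ssh -o StrictHostKeyChecking=accept-new "$to" hostname
    done
  done
}

create_nodes
setup_ssh
check_nodes
```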
3. Set Up Hadoop on the Cluster of Docker Containers
- Test Hadoop in standalone mode on one Docker container first (make sure all dependencies, such as Java and SSH, are met)
- Follow the instructions in the Hadoop "Cluster Setup" guide
- Copy some input data into HDFS
- Run a Hadoop job (for example, one of the bundled MapReduce examples)
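A sketch of a minimal fully distributed HDFS setup, assuming Hadoop 3.x is unpacked at the same HADOOP_HOME on every node, JAVA_HOME is set in etc/hadoop/hadoop-env.sh, passwordless SSH works between all nodes, and the hypothetical container names master, worker1, worker2 with cluster user hadoop. It writes core-site.xml (fs.defaultFS), hdfs-site.xml (dfs.replication), and workers into a local ./hadoop-conf directory, copies them to every node, then formats the NameNode, starts HDFS, uploads input, runs the grep example, and prints the result. The job here uses MapReduce's default local runner against HDFS; running it on YARN needs further settings (mapred-site.xml, yarn-site.xml) described in the Cluster Setup guide. Docker commands are printed unless DRY_RUN=0.

```shell
#!/bin/sh
# Minimal HDFS cluster config plus the grep example job.
# Hypothetical names: master, worker1, worker2, user "hadoop".
# Writes config files to CONF_DIR; prints docker commands unless DRY_RUN=0.
set -eu

MASTER=master
WORKERS="worker1 worker2"
HUSER=hadoop
HADOOP_HOME=${HADOOP_HOME:-/usr/local/hadoop}
CONF_DIR=${CONF_DIR:-./hadoop-conf}
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi
}

write_config() {
  dir=$1
  mkdir -p "$dir"
  cat > "$dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://$MASTER:9000</value>
  </property>
</configuration>
EOF
  cat > "$dir/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
EOF
  # one hostname per line ($WORKERS is split on purpose); DataNodes run here
  printf '%s\n' $WORKERS > "$dir/workers"
}

# Run a command as the cluster user inside HADOOP_HOME on the master node.
on_master() {
  run docker exec -u "$HUSER" -w "$HADOOP_HOME" "$MASTER" "$@"
}

deploy_and_run() {
  for n in $MASTER $WORKERS; do
    # "dir/." copies the directory's contents into the target directory
    run docker cp "$CONF_DIR/." "$n:$HADOOP_HOME/etc/hadoop/"
  done
  on_master bin/hdfs namenode -format   # first start only: wipes HDFS metadata
  on_master sbin/start-dfs.sh           # uses ssh to start the HDFS daemons
  on_master bin/hdfs dfs -mkdir -p "/user/$HUSER/input"
  on_master sh -c 'bin/hdfs dfs -put etc/hadoop/*.xml input'
  on_master sh -c "bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+'"
  on_master bin/hdfs dfs -cat 'output/*'
}

write_config "$CONF_DIR"
deploy_and_run
```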
Docker Primer
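- docker images = list all images
- docker image {ls, rm, build} = newer equivalents of docker images, docker rmi, and docker build
- docker rmi imagename = remove an image
- docker run imagename = create and start a container from an image
- docker run -it imagename = same, with an interactive terminal attached
- docker run -it --name=containername --hostname=myhostname imagename = also set the container's name and hostname
- docker run -p 127.0.0.1:80:8080 ubuntu bash = map port 8080 of the container to 127.0.0.1:80 of the host
- docker ps -a = list all containers, including stopped ones
- docker rm containername = remove a container
- docker start containername = start a stopped container
- docker build -t imagetag imagedir = build an image from the Dockerfile in imagedir and tag it imagetag
- docker network ls = list networks
- docker network inspect networkname = show a network's configuration and attached containers
- docker network create -d bridge my_bridge = create a user-defined bridge network named my_bridge
- docker run -d --net=my_bridge --name db training/postgres = run a container named db in the background (detached), attached to my_bridge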
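The primer commands fit together into one container lifecycle. This sketch (hypothetical names my_bridge and master) creates a network, starts a container that publishes the Hadoop 3.x NameNode web UI port 9870 on the host's loopback address, then stops, restarts, and removes it. Port mappings can only be set when the container is created with docker run, so it pays to plan them before starting the cluster. Commands are printed unless DRY_RUN=0.

```shell
#!/bin/sh
# Illustrative container lifecycle built from the primer commands.
# Hypothetical names: network my_bridge, container master, image ubuntu:22.04.
# Prints the commands unless DRY_RUN=0.
set -eu
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi
}

node_lifecycle() {
  run docker network create -d bridge my_bridge
  # -p host_ip:host_port:container_port; 9870 is the Hadoop 3.x NameNode web UI
  run docker run -dit --name=master --hostname=master --net=my_bridge \
    -p 127.0.0.1:9870:9870 ubuntu:22.04 bash
  run docker ps -a                       # lists the container and its state
  run docker network inspect my_bridge   # "Containers" section shows master
  run docker stop master                 # stopped, but still listed by docker ps -a
  run docker start master                # same container, same filesystem
  run docker rm -f master                # -f stops a running container before removing it
  run docker network rm my_bridge
}

node_lifecycle
```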
Linux System Admin Primer
- ls, rm, mkdir, rmdir, cat, vi, sudo, su
- /etc = config files dir
- /var/log = log files
- /bin, /sbin, /usr/bin, /usr/local
- /home = user directories
- shell/env variables, echo, export, .bashrc, .profile, source
- tar, gzip, zip
- adduser, useradd
- chown, chmod, ls -l
- apt-get update, apt-get install (Debian/Ubuntu), rpm (Red Hat-based distributions)
- ssh, ssh-keygen -t rsa, ~/.ssh/, ~/.ssh/authorized_keys
- ip a, /etc/hosts, ping
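Several of these items (export, .bashrc, source) come together when making the Hadoop commands available in every new shell. A minimal sketch, assuming Hadoop at /usr/local/hadoop and Ubuntu's OpenJDK 11 at /usr/lib/jvm/java-11-openjdk-amd64 (both hypothetical; adjust to your install): setup_env appends the exports to a startup file only if each line is not already there, so running it twice is harmless.

```shell
#!/bin/sh
# Put Hadoop on the PATH for one user via a shell startup file.
# Assumed (hypothetical) paths: Hadoop at /usr/local/hadoop,
# OpenJDK 11 at /usr/lib/jvm/java-11-openjdk-amd64.
set -eu

# Append a line to a file only if that exact line is not already present.
add_line() {
  file=$1
  line=$2
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

setup_env() {
  rc=$1
  add_line "$rc" 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64'
  add_line "$rc" 'export HADOOP_HOME=/usr/local/hadoop'
  # single quotes keep $PATH and $HADOOP_HOME unexpanded until the file is sourced
  add_line "$rc" 'export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"'
}
```

Run `setup_env ~/.bashrc` as the cluster user, then load the variables into the current shell with `. ~/.bashrc` (`source ~/.bashrc` in bash).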