Also trim the hadoop example since we aren't maintaining this live anymore. See galaxy.ansible.com for roles content.

pull/63/head
Michael DeHaan 10 years ago
parent e564ed51b5
commit 4663d2e789
  1. hadoop/LICENSE.md (4)
  2. hadoop/README.md (363)
  3. hadoop/group_vars/all (56)
  4. hadoop/hosts (31)
  5. hadoop/images/hadoop.png (BIN)
  6. hadoop/images/hadoopha.png (BIN)
  7. hadoop/images/hdfs.png (BIN)
  8. hadoop/images/map.png (BIN)
  9. hadoop/images/qjm.png (BIN)
  10. hadoop/images/reduce.png (BIN)
  11. hadoop/images/zookeeper.png (BIN)
  12. hadoop/playbooks/inputfile (19)
  13. hadoop/playbooks/job.yml (21)
  14. hadoop/roles/common/files/etc/cloudera-CDH4.repo (5)
  15. hadoop/roles/common/handlers/main.yml (2)
  16. hadoop/roles/common/tasks/common.yml (28)
  17. hadoop/roles/common/tasks/main.yml (5)
  18. hadoop/roles/common/templates/etc/hosts.j2 (5)
  19. hadoop/roles/common/templates/hadoop_conf/core-site.xml.j2 (25)
  20. hadoop/roles/common/templates/hadoop_conf/hadoop-metrics.properties.j2 (75)
  21. hadoop/roles/common/templates/hadoop_conf/hadoop-metrics2.properties.j2 (44)
  22. hadoop/roles/common/templates/hadoop_conf/hdfs-site.xml.j2 (57)
  23. hadoop/roles/common/templates/hadoop_conf/log4j.properties.j2 (219)
  24. hadoop/roles/common/templates/hadoop_conf/mapred-site.xml.j2 (22)
  25. hadoop/roles/common/templates/hadoop_conf/slaves.j2 (3)
  26. hadoop/roles/common/templates/hadoop_conf/ssl-client.xml.example.j2 (80)
  27. hadoop/roles/common/templates/hadoop_conf/ssl-server.xml.example.j2 (77)
  28. hadoop/roles/common/templates/hadoop_ha_conf/core-site.xml.j2 (25)
  29. hadoop/roles/common/templates/hadoop_ha_conf/hadoop-metrics.properties.j2 (75)
  30. hadoop/roles/common/templates/hadoop_ha_conf/hadoop-metrics2.properties.j2 (44)
  31. hadoop/roles/common/templates/hadoop_ha_conf/hdfs-site.xml.j2 (103)
  32. hadoop/roles/common/templates/hadoop_ha_conf/log4j.properties.j2 (219)
  33. hadoop/roles/common/templates/hadoop_ha_conf/mapred-site.xml.j2 (120)
  34. hadoop/roles/common/templates/hadoop_ha_conf/slaves.j2 (3)
  35. hadoop/roles/common/templates/hadoop_ha_conf/ssl-client.xml.example.j2 (80)
  36. hadoop/roles/common/templates/hadoop_ha_conf/ssl-server.xml.example.j2 (77)
  37. hadoop/roles/common/templates/iptables.j2 (40)
  38. hadoop/roles/hadoop_primary/handlers/main.yml (14)
  39. hadoop/roles/hadoop_primary/tasks/hadoop_master.yml (38)
  40. hadoop/roles/hadoop_primary/tasks/hadoop_master_no_ha.yml (38)
  41. hadoop/roles/hadoop_primary/tasks/main.yml (9)
  42. hadoop/roles/hadoop_secondary/handlers/main.yml (14)
  43. hadoop/roles/hadoop_secondary/tasks/main.yml (64)
  44. hadoop/roles/hadoop_slaves/handlers/main.yml (8)
  45. hadoop/roles/hadoop_slaves/tasks/main.yml (4)
  46. hadoop/roles/hadoop_slaves/tasks/slaves.yml (53)
  47. hadoop/roles/qjournal_servers/handlers/main.yml (5)
  48. hadoop/roles/qjournal_servers/tasks/main.yml (22)
  49. hadoop/roles/zookeeper_servers/handlers/main.yml (5)
  50. hadoop/roles/zookeeper_servers/tasks/main.yml (13)
  51. hadoop/roles/zookeeper_servers/templates/zoo.cfg.j2 (9)
  52. hadoop/site.yml (28)

@ -1,4 +0,0 @@
Copyright (C) 2013 AnsibleWorks, Inc.
This work is licensed under the Creative Commons Attribution 3.0 Unported License.
To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/deed.en_US.

@ -1,363 +0,0 @@
# Deploying Hadoop Clusters using Ansible
## Preface
The playbooks in this example are designed to deploy a Hadoop cluster on a
CentOS 6 or RHEL 6 environment using Ansible. The playbooks can:
1) Deploy a fully functional Hadoop cluster with HA and automatic failover.
2) Deploy a fully functional Hadoop cluster with no HA.
3) Deploy additional nodes to scale the cluster.
These playbooks require Ansible 1.2 and CentOS 6 or RHEL 6 target machines, and they
install the open-source Cloudera Hadoop Distribution (CDH) version 4.
## Hadoop Components
Hadoop is a framework that allows processing of large datasets across large
clusters. The two main components that make up a Hadoop cluster are the HDFS
filesystem and the MapReduce framework. Briefly, the HDFS filesystem is responsible
for storing data across the cluster nodes on their local disks. MapReduce
jobs are the tasks that run on these nodes to produce a meaningful result
from the data stored on the HDFS filesystem.
Let's have a closer look at each of these components.
## HDFS
![Alt text](images/hdfs.png "HDFS")
The above diagram illustrates an HDFS filesystem. The cluster consists of three
DataNodes which are responsible for storing/replicating data, while the NameNode
is a process which is responsible for storing the metadata for the entire
filesystem. As the example above illustrates, when a client wants to write a
file to the HDFS cluster, it first contacts the NameNode and lets it know that
it wants to write a file. The NameNode then decides where and how the file
should be saved and notifies the client about its decision.
In the given example "File1" has a size of 128MB and the block size of the HDFS
filesystem is 64 MB. Hence, the NameNode instructs the client to break down the
file into two different blocks and write the first block to DataNode 1 and the
second block to DataNode 2. Upon receiving the notification from the NameNode,
the client contacts DataNode 1 and DataNode 2 directly to write the data.
Once the data is received by the DataNodes, they replicate the block across the
other nodes. The number of nodes across which the data is replicated is
based on the HDFS configuration, the default value being 3. Meanwhile the
NameNode updates its metadata with the entry of the new file "File1" and the
locations where the parts are stored.
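A quick way to inspect this block layout on a running cluster (a sketch; the file
path /File1 is just the name used in the example above) is to run an HDFS file
check as the 'hdfs' user:

    # Show how the file was split into blocks and where each replica lives
    hadoop fsck /File1 -files -blocks -locations
    # Show the configured block size (64 MB, i.e. 67108864 bytes, in these playbooks)
    hdfs getconf -confKey dfs.blocksize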
## MapReduce
MapReduce is a Java framework that processes the data stored in the
HDFS filesystem to produce a useful and meaningful result. The whole job is
split into two parts: the "Map" job and the "Reduce" job.
Let's consider an example. In the previous step we uploaded "File1"
into the HDFS filesystem and the file was broken down into two different
blocks. Let's assume that the first block had the data "black sheep" in it and
the second block had the data "white sheep" in it. Now let's assume a client
wants to get a count of all the words occurring in "File1". In order to get the
count, the first thing the client would have to do is write a "Map" program and
then a "Reduce" program.
Here's pseudo code of how the Map and Reduce jobs might look:

    mapper (File1, file-contents):
        for each word in file-contents:
            emit (word, 1)

    reducer (word, values):
        sum = 0
        for each value in values:
            sum = sum + value
        emit (word, sum)
The work of the Map job is to go through all the words in the file and emit
a key/value pair. In this case the key is the word itself and the value is always
1.
The Reduce job is quite simple: for each key, it adds up all the values it
receives (each of which is 1 here) and emits the total.
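To make the data flow concrete, the same map/shuffle/reduce sequence can be
simulated locally with ordinary shell tools (this is only an illustration of the
idea, not how Hadoop actually executes the job):

    # map: one word per line; shuffle: sort groups identical keys; reduce: count per word
    printf 'black sheep\nwhite sheep\n' | tr ' ' '\n' | sort | uniq -c
    #   1 black
    #   2 sheep
    #   1 white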
Once the Map and Reduce jobs are ready, the client would instruct the
"JobTracker" (the process responsible for scheduling the jobs on the cluster)
to run the MapReduce job on "File1".
Let's have a closer look at the anatomy of a Map job.
![Alt text](images/map.png "Map job")
As the figure above shows, when the client instructs the JobTracker to run a
job on File1, the JobTracker first contacts the NameNode to determine where the
blocks of File1 are. Then the JobTracker sends the Map job's JAR file down
to the nodes having the blocks, and the TaskTracker processes on those nodes run
the application.
In the above example, DataNode 1 and DataNode 2 have the blocks, so the
TaskTrackers on those nodes run the Map jobs. Once the jobs are completed, the
two nodes have key/value results as below:

    MapJob Results:
        TaskTracker1:
            "Black: 1"
            "Sheep: 1"
        TaskTracker2:
            "White: 1"
            "Sheep: 1"
Once the Map phase is completed, the JobTracker process initiates the Shuffle
and Reduce process.
Let's have a closer look at the Shuffle-Reduce job.
![Alt text](images/reduce.png "Reduce job")
As the figure above demonstrates, the first thing that the JobTracker does is
spawn a Reducer job on the DataNode/TaskTracker nodes for each "key" in the job
result. In this case we have three keys, "black, white, sheep", in our result,
so three Reducers are spawned: one for each key. The Map jobs shuffle the keys
out to the respective Reduce jobs. Then the Reduce job code runs and the sum is
calculated, and the result is written into the HDFS filesystem in a common
directory. In the above example the output directory is specified as
"/home/ben/output", so all the Reducers will write their results into this
directory under different filenames; the file names being "part-00xx", where xx
is the Reducer/partition number.
## Hadoop Deployment
![Alt text](images/hadoop.png "Reduce job")
The above diagram depicts a typical Hadoop deployment. The NameNode and
JobTracker usually reside on the same machine, though they can run on separate
machines. The DataNodes and TaskTrackers run on the same nodes. The size of the
cluster can be scaled to thousands of nodes with petabytes of storage.
The above deployment model provides redundancy for data, as the HDFS filesystem
takes care of the data replication. The only single points of failure are the
NameNode and the JobTracker; if either of these components fails, the cluster
will not be usable.
## Making Hadoop HA
To make the Hadoop cluster highly available we would have to add another set of
JobTracker/NameNode processes, and make sure that any data updated on the primary
is also somehow propagated to the standby. In case of failure of the primary node,
the secondary node takes over that role.
The first thing that has to be dealt with is the data held by the NameNode. As
we recall, the NameNode holds all of the metadata about the filesystem, so any
update to the metadata should also be reflected on the secondary NameNode's
metadata copy. The synchronization of the primary and secondary NameNode
metadata is handled by the Quorum Journal Manager.
### Quorum Journal Manager
![Alt text](images/qjm.png "QJM")
As the figure above shows, the Quorum Journal Manager consists of the journal
manager clients and journal manager nodes. The journal manager clients reside
on the same nodes as the NameNodes; the client on the primary node collects all
the edit logs happening on the NameNode and sends them out to the journal nodes.
The journal manager client residing on the secondary NameNode regularly contacts
the journal nodes and updates its local metadata to be consistent with the
master node. In case of primary node failure the secondary NameNode updates
itself to the latest edit logs and takes over as the primary NameNode.
### Zookeeper
Apart from data consistency, a distributed cluster system also needs a
mechanism for centralized coordination. For example, there should be a way for
the secondary node to tell if the primary node is running properly, and if not
it has to take up the role of the primary. Zookeeper provides Hadoop with a
mechanism to coordinate in this way.
![Alt text](images/zookeeper.png "Zookeeper")
As the figure above shows, Zookeeper is a client/server based
service. The server component itself is replicated over a set of machines that
comprise the service. In short, high availability is built into the Zookeeper
servers.
For Hadoop, two Zookeeper clients have been built, the ZKFCs (Zookeeper Failover
Controllers): one for the NameNode and one for the JobTracker. These clients run
on the same machines as the NameNodes/JobTrackers themselves.
When a ZKFC client is started, it establishes a connection with one of the
Zookeeper nodes and obtains a session ID. The client then performs health checks
on the NameNode/JobTracker and sends heartbeats to ZooKeeper.
If the ZKFC client detects a failure of the NameNode/JobTracker, it removes
itself from the ZooKeeper active/standby election, and the other ZKFC client
fences the node/service and takes over the primary role.
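As a side note, a running Zookeeper ensemble can be sanity-checked with
Zookeeper's built-in four-letter commands (a sketch; the hostname comes from the
sample inventory below and 2181 is the client port set in group_vars/all):

    # "imok" means the Zookeeper server is alive
    echo ruok | nc zhadoop1 2181
    # "stat" reports whether this server is the ensemble leader or a follower
    echo stat | nc zhadoop1 2181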
## Hadoop HA Deployment
![Alt text](images/hadoopha.png "Hadoop_HA")
The above diagram depicts a fully HA Hadoop Cluster with no single point of
failure and automated failover.
## Deploying Hadoop Clusters with Ansible
Setting up a Hadoop cluster without HA can itself be a challenging and
time-consuming task, and with HA, things become even more difficult.
Ansible can automate the whole process of deploying a Hadoop cluster, with or
without HA, with the same playbook, in a matter of minutes. This can be used for
quick environment rebuilds, and in case of disaster or node failures, recovery
time can be greatly reduced with Ansible automation.
Let's have a look to see how this is done.
## Deploying a Hadoop cluster with HA
### Prerequisites
These playbooks have been tested using Ansible v1.2 and CentOS 6.x (64 bit).
Modify group_vars/all to choose the network interface for Hadoop communication.
Optionally, you can change Hadoop-specific parameters like ports or directories
by editing the group_vars/all file, as shown below.
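For example, a minimal set of overrides in group_vars/all might look like this
(the values are illustrative; the full variable list is in the group_vars/all
file shown later in this commit):

    iface: 'eth1'       # network interface Hadoop traffic should use
    ha_enabled: True    # deploy the HA variant with QJM and Zookeeper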
Before launching the deployment playbook, make sure the inventory file (hosts)
is set up properly. Here's a sample:

    [hadoop_master_primary]
    zhadoop1

    [hadoop_master_secondary]
    zhadoop2

    [hadoop_masters:children]
    hadoop_master_primary
    hadoop_master_secondary

    [hadoop_slaves]
    hadoop1
    hadoop2
    hadoop3

    [qjournal_servers]
    zhadoop1
    zhadoop2
    zhadoop3

    [zookeeper_servers]
    zhadoop1 zoo_id=1
    zhadoop2 zoo_id=2
    zhadoop3 zoo_id=3
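Before running the full playbook, a quick connectivity check can catch
unreachable or misconfigured hosts (a sketch; it assumes the same SSH access the
playbooks themselves use):

    ansible -i hosts all -m ping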
Once the inventory is set up, the Hadoop cluster can be deployed using the
following command:

    ansible-playbook -i hosts site.yml
Once deployed, we can check the cluster sanity in different ways. To check the
status of the HDFS filesystem and get a report on all the DataNodes, log in as
the 'hdfs' user on any Hadoop master server, and issue the following command to
get the report:

    hadoop dfsadmin -report
To check the sanity of HA, first log in as the 'hdfs' user on any Hadoop master
server and get the current active/standby NameNode servers this way:

    -bash-4.1$ hdfs haadmin -getServiceState zhadoop1
    active
    -bash-4.1$ hdfs haadmin -getServiceState zhadoop2
    standby
To get the state of the JobTracker process, log in as the 'mapred' user on any
Hadoop master server and issue the following command:

    -bash-4.1$ hadoop mrhaadmin -getServiceState hadoop1
    standby
    -bash-4.1$ hadoop mrhaadmin -getServiceState hadoop2
    active
Once you have determined which server is active and which is standby, you can
kill the NameNode/JobTracker process on the server listed as active and issue
the same commands again, and you should see that the standby has been promoted
to the active state. Later, you can restart the killed process and see that node
listed as standby.
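For example, the failover can be exercised on the node currently reported as
active like this (a sketch; the service name below is an assumption based on a
CDH4 package install and may differ on your systems):

    # On the active NameNode host, as root: stop the NameNode service
    service hadoop-hdfs-namenode stop
    # From the other master, the standby should now report itself as active
    hdfs haadmin -getServiceState zhadoop2
    # Bring the stopped NameNode back; it rejoins the cluster as standby
    service hadoop-hdfs-namenode start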
### Running a MapReduce Job
To deploy the MapReduce job, save the following script as /tmp/job.sh on any of
the Hadoop master nodes and run it as the user 'hdfs' (e.g. su - hdfs -c
"/tmp/job.sh"). The job counts the number of occurrences of the word 'hello' in
the given inputfile.

    #!/bin/bash
    cat > /tmp/inputfile << EOF
    hello
    sf
    sdf
    hello
    sdf
    sdf
    EOF
    hadoop fs -put /tmp/inputfile /inputfile
    hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar grep /inputfile /outputfile 'hello'
    hadoop fs -get /outputfile /tmp/outputfile/
To verify the result, read the file on the server located at
/tmp/outputfile/part-00000, which should give you the count.
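For the sample input above, the fetched result should look something like this
(the Hadoop 'grep' example writes the match count followed by the matched term):

    $ cat /tmp/outputfile/part-00000
    2       hello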
## Scale the Cluster
When the Hadoop cluster reaches its maximum capacity, it can be scaled by
adding nodes. This can be easily accomplished by adding the node hostname to
the Ansible inventory under the hadoop_slaves group, and running the following
command:

    ansible-playbook -i hosts site.yml --tags=slaves
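For example, to add a hypothetical new slave node hadoop4, the only inventory
change needed is one extra line under the hadoop_slaves group:

    [hadoop_slaves]
    hadoop1
    hadoop2
    hadoop3
    hadoop4

The --tags=slaves option restricts the run to tasks tagged 'slaves' (such as the
common tasks included with tags=slaves in roles/common/tasks/main.yml), so a
scale-out run does not redo the full site deployment.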
## Deploy a non-HA Hadoop Cluster
The earlier Hadoop Deployment diagram (images/hadoop.png) illustrates a
standalone (non-HA) Hadoop cluster.
To deploy this cluster, fill in the inventory file as follows:

    [hadoop_all:children]
    hadoop_masters
    hadoop_slaves

    [hadoop_master_primary]
    zhadoop1

    [hadoop_master_secondary]

    [hadoop_masters:children]
    hadoop_master_primary
    hadoop_master_secondary

    [hadoop_slaves]
    hadoop1
    hadoop2
    hadoop3
Edit the group_vars/all file to disable HA:

    ha_enabled: False

And run the following command:

    ansible-playbook -i hosts site.yml
The validity of the cluster can be checked by running the same MapReduce job
that was documented above for the HA Hadoop cluster.

@ -1,56 +0,0 @@
# Defaults to the first ethernet interface. Change this to:
#
# iface: eth1
#
# ...to override.
#
iface: '{{ ansible_default_ipv4.interface }}'
ha_enabled: False
hadoop:

  #Variables for <core-site_xml> - common
  fs_default_FS_port: 8020
  nameservice_id: mycluster4

  #Variables for <hdfs-site_xml>
  dfs_permissions_superusergroup: hdfs
  dfs_namenode_name_dir:
    - /namedir1/
    - /namedir2/
  dfs_replication: 3
  dfs_namenode_handler_count: 50
  dfs_blocksize: 67108864
  dfs_datanode_data_dir:
    - /datadir1/
    - /datadir2/
  dfs_datanode_address_port: 50010
  dfs_datanode_http_address_port: 50075
  dfs_datanode_ipc_address_port: 50020
  dfs_namenode_http_address_port: 50070
  dfs_ha_zkfc_port: 8019
  qjournal_port: 8485
  qjournal_http_port: 8480
  dfs_journalnode_edits_dir: /journaldir/
  zookeeper_clientport: 2181
  zookeeper_leader_port: 2888
  zookeeper_election_port: 3888

  #Variables for <mapred-site_xml> - common
  mapred_job_tracker_ha_servicename: myjt4
  mapred_job_tracker_http_address_port: 50030
  mapred_task_tracker_http_address_port: 50060
  mapred_job_tracker_port: 8021
  mapred_ha_jobtracker_rpc-address_port: 8023
  mapred_ha_zkfc_port: 8018
  mapred_job_tracker_persist_jobstatus_dir: /jobdir/
  mapred_local_dir:
    - /mapred1/
    - /mapred2/

@ -1,31 +0,0 @@
[hadoop_all:children]
hadoop_masters
hadoop_slaves
qjournal_servers
zookeeper_servers
[hadoop_master_primary]
hadoop1
[hadoop_master_secondary]
hadoop2
[hadoop_masters:children]
hadoop_master_primary
hadoop_master_secondary
[hadoop_slaves]
hadoop1
hadoop2
hadoop3
[qjournal_servers]
hadoop1
hadoop2
hadoop3
[zookeeper_servers]
hadoop1 zoo_id=1
hadoop2 zoo_id=2
hadoop3 zoo_id=3

Binary files not shown (7 PNG images under hadoop/images/, 64 to 148 KiB each).
@ -1,19 +0,0 @@
asdf
sdf
sdf
sd
f
sf
sdf
sd
fsd
hello
asf
sf
sd
fsd
f
sdf
sd
hello

@ -1,21 +0,0 @@
---
# Launch a job to count occurrences of a word.
- hosts: $server
  user: root
  tasks:
    - name: copy the file
      copy: src=inputfile dest=/tmp/inputfile

    - name: upload the file
      shell: su - hdfs -c "hadoop fs -put /tmp/inputfile /inputfile"

    - name: Run the MapReduce job to count the occurrences of the word hello
      shell: su - hdfs -c "hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar grep /inputfile /outputfile 'hello'"

    - name: Fetch the outputfile to the local tmp dir
      shell: su - hdfs -c "hadoop fs -get /outputfile /tmp/outputfile"

    - name: Get the outputfile to the ansible server
      fetch: dest=/tmp src=/tmp/outputfile/part-00000

@ -1,5 +0,0 @@
[cloudera-cdh4]
name=Cloudera's Distribution for Hadoop, Version 4
baseurl=http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/4/
gpgkey = http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
gpgcheck = 1

@ -1,2 +0,0 @@
- name: restart iptables
  service: name=iptables state=restarted

@ -1,28 +0,0 @@
---
# The playbook for common tasks

- name: Deploy the Cloudera Repository
  copy: src=etc/cloudera-CDH4.repo dest=/etc/yum.repos.d/cloudera-CDH4.repo

- name: Install the libselinux-python package
  yum: name=libselinux-python state=installed

- name: Install the openjdk package
  yum: name=java-1.6.0-openjdk state=installed

- name: Create a directory for java
  file: state=directory path=/usr/java/

- name: Create a link for java
  file: src=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre state=link path=/usr/java/default

- name: Create the hosts file for all machines
  template: src=etc/hosts.j2 dest=/etc/hosts

- name: Disable SELinux in conf file
  selinux: state=disabled

- name: Create the iptables file for all machines
  template: src=iptables.j2 dest=/etc/sysconfig/iptables
  notify: restart iptables

@ -1,5 +0,0 @@
---
# The playbook for common tasks
- include: common.yml tags=slaves

@ -1,5 +0,0 @@
127.0.0.1 localhost
{% for host in groups.all %}
{{ hostvars[host]['ansible_' + iface].ipv4.address }} {{ host }}
{% endfor %}

@ -1,25 +0,0 @@
<?xml version="1.0"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://{{ hostvars[groups['hadoop_masters'][0]]['ansible_hostname'] + ':' ~ hadoop['fs_default_FS_port'] }}/</value>
</property>
</configuration>

@ -1,75 +0,0 @@
# Configuration of the "dfs" context for null
dfs.class=org.apache.hadoop.metrics.spi.NullContext
# Configuration of the "dfs" context for file
#dfs.class=org.apache.hadoop.metrics.file.FileContext
#dfs.period=10
#dfs.fileName=/tmp/dfsmetrics.log
# Configuration of the "dfs" context for ganglia
# Pick one: Ganglia 3.0 (former) or Ganglia 3.1 (latter)
# dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
# dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# dfs.period=10
# dfs.servers=localhost:8649
# Configuration of the "mapred" context for null
mapred.class=org.apache.hadoop.metrics.spi.NullContext
# Configuration of the "mapred" context for file
#mapred.class=org.apache.hadoop.metrics.file.FileContext
#mapred.period=10
#mapred.fileName=/tmp/mrmetrics.log
# Configuration of the "mapred" context for ganglia
# Pick one: Ganglia 3.0 (former) or Ganglia 3.1 (latter)
# mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
# mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# mapred.period=10
# mapred.servers=localhost:8649
# Configuration of the "jvm" context for null
#jvm.class=org.apache.hadoop.metrics.spi.NullContext
# Configuration of the "jvm" context for file
#jvm.class=org.apache.hadoop.metrics.file.FileContext
#jvm.period=10
#jvm.fileName=/tmp/jvmmetrics.log
# Configuration of the "jvm" context for ganglia
# jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
# jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# jvm.period=10
# jvm.servers=localhost:8649
# Configuration of the "rpc" context for null
rpc.class=org.apache.hadoop.metrics.spi.NullContext
# Configuration of the "rpc" context for file
#rpc.class=org.apache.hadoop.metrics.file.FileContext
#rpc.period=10
#rpc.fileName=/tmp/rpcmetrics.log
# Configuration of the "rpc" context for ganglia
# rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext
# rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# rpc.period=10
# rpc.servers=localhost:8649
# Configuration of the "ugi" context for null
ugi.class=org.apache.hadoop.metrics.spi.NullContext
# Configuration of the "ugi" context for file
#ugi.class=org.apache.hadoop.metrics.file.FileContext
#ugi.period=10
#ugi.fileName=/tmp/ugimetrics.log
# Configuration of the "ugi" context for ganglia
# ugi.class=org.apache.hadoop.metrics.ganglia.GangliaContext
# ugi.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# ugi.period=10
# ugi.servers=localhost:8649

@ -1,44 +0,0 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# syntax: [prefix].[source|sink].[instance].[options]
# See javadoc of package-info.java for org.apache.hadoop.metrics2 for details
*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
# default sampling period, in seconds
*.period=10
# The namenode-metrics.out will contain metrics from all context
#namenode.sink.file.filename=namenode-metrics.out
# Specifying a special sampling period for namenode:
#namenode.sink.*.period=8
#datanode.sink.file.filename=datanode-metrics.out
# the following example split metrics of different
# context to different sinks (in this case files)
#jobtracker.sink.file_jvm.context=jvm
#jobtracker.sink.file_jvm.filename=jobtracker-jvm-metrics.out
#jobtracker.sink.file_mapred.context=mapred
#jobtracker.sink.file_mapred.filename=jobtracker-mapred-metrics.out
#tasktracker.sink.file.filename=tasktracker-metrics.out
#maptask.sink.file.filename=maptask-metrics.out
#reducetask.sink.file.filename=reducetask-metrics.out

@ -1,57 +0,0 @@
<?xml version="1.0"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.blocksize</name>
<value>{{ hadoop['dfs_blocksize'] }}</value>
</property>
<property>
<name>dfs.permissions.superusergroup</name>
<value>{{ hadoop['dfs_permissions_superusergroup'] }}</value>
</property>
<property>
<name>dfs.namenode.http.address</name>
<value>0.0.0.0:{{ hadoop['dfs_namenode_http_address_port'] }}</value>
</property>
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:{{ hadoop['dfs_datanode_address_port'] }}</value>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:{{ hadoop['dfs_datanode_http_address_port'] }}</value>
</property>
<property>
<name>dfs.datanode.ipc.address</name>
<value>0.0.0.0:{{ hadoop['dfs_datanode_ipc_address_port'] }}</value>
</property>
<property>
<name>dfs.replication</name>
<value>{{ hadoop['dfs_replication'] }}</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>{{ hadoop['dfs_namenode_name_dir'] | join(',') }}</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>{{ hadoop['dfs_datanode_data_dir'] | join(',') }}</value>
</property>
</configuration>

@ -1,219 +0,0 @@
# Copyright 2011 The Apache Software Foundation
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Define some default values that can be overridden by system properties
hadoop.root.logger=INFO,console
hadoop.log.dir=.
hadoop.log.file=hadoop.log
# Define the root logger to the system property "hadoop.root.logger".
log4j.rootLogger=${hadoop.root.logger}, EventCounter
# Logging Threshold
log4j.threshold=ALL
# Null Appender
log4j.appender.NullAppender=org.apache.log4j.varia.NullAppender
#
# Rolling File Appender - cap space usage at 5gb.
#
hadoop.log.maxfilesize=256MB
hadoop.log.maxbackupindex=20
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.MaxFileSize=${hadoop.log.maxfilesize}
log4j.appender.RFA.MaxBackupIndex=${hadoop.log.maxbackupindex}
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
# Pattern format: Date LogLevel LoggerName LogMessage
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
# Debugging Pattern format
#log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
#
# Daily Rolling File Appender
#
log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
# Rollver at midnight
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
# 30-day backup
#log4j.appender.DRFA.MaxBackupIndex=30
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
# Pattern format: Date LogLevel LoggerName LogMessage
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
# Debugging Pattern format
#log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
#
# console
# Add "console" to rootlogger above if you want to use this
#
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
#
# TaskLog Appender
#
#Default values
hadoop.tasklog.taskid=null
hadoop.tasklog.iscleanup=false
hadoop.tasklog.noKeepSplits=4
hadoop.tasklog.totalLogFileSize=100
hadoop.tasklog.purgeLogSplits=true
hadoop.tasklog.logsRetainHours=12
log4j.appender.TLA=org.apache.hadoop.mapred.TaskLogAppender
log4j.appender.TLA.taskId=${hadoop.tasklog.taskid}
log4j.appender.TLA.isCleanup=${hadoop.tasklog.iscleanup}
log4j.appender.TLA.totalLogFileSize=${hadoop.tasklog.totalLogFileSize}
log4j.appender.TLA.layout=org.apache.log4j.PatternLayout
log4j.appender.TLA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
#
# HDFS block state change log from block manager
#
# Uncomment the following to suppress normal block state change
# messages from BlockManager in NameNode.
#log4j.logger.BlockStateChange=WARN
#
#Security appender
#
hadoop.security.logger=INFO,NullAppender
hadoop.security.log.maxfilesize=256MB
hadoop.security.log.maxbackupindex=20
log4j.category.SecurityLogger=${hadoop.security.logger}
hadoop.security.log.file=SecurityAuth-${user.name}.audit
log4j.appender.RFAS=org.apache.log4j.RollingFileAppender
log4j.appender.RFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
log4j.appender.RFAS.layout=org.apache.log4j.PatternLayout
log4j.appender.RFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.RFAS.MaxFileSize=${hadoop.security.log.maxfilesize}
log4j.appender.RFAS.MaxBackupIndex=${hadoop.security.log.maxbackupindex}
#
# Daily Rolling Security appender
#
log4j.appender.DRFAS=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
log4j.appender.DRFAS.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.DRFAS.DatePattern=.yyyy-MM-dd
#
# hdfs audit logging
#
hdfs.audit.logger=INFO,NullAppender
hdfs.audit.log.maxfilesize=256MB
hdfs.audit.log.maxbackupindex=20
log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=${hdfs.audit.logger}
log4j.additivity.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=false
log4j.appender.RFAAUDIT=org.apache.log4j.RollingFileAppender
log4j.appender.RFAAUDIT.File=${hadoop.log.dir}/hdfs-audit.log
log4j.appender.RFAAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.RFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
log4j.appender.RFAAUDIT.MaxFileSize=${hdfs.audit.log.maxfilesize}
log4j.appender.RFAAUDIT.MaxBackupIndex=${hdfs.audit.log.maxbackupindex}
#
# mapred audit logging
#
mapred.audit.logger=INFO,NullAppender
mapred.audit.log.maxfilesize=256MB
mapred.audit.log.maxbackupindex=20
log4j.logger.org.apache.hadoop.mapred.AuditLogger=${mapred.audit.logger}
log4j.additivity.org.apache.hadoop.mapred.AuditLogger=false
log4j.appender.MRAUDIT=org.apache.log4j.RollingFileAppender
log4j.appender.MRAUDIT.File=${hadoop.log.dir}/mapred-audit.log
log4j.appender.MRAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.MRAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
log4j.appender.MRAUDIT.MaxFileSize=${mapred.audit.log.maxfilesize}
log4j.appender.MRAUDIT.MaxBackupIndex=${mapred.audit.log.maxbackupindex}
# Custom Logging levels
#log4j.logger.org.apache.hadoop.mapred.JobTracker=DEBUG
#log4j.logger.org.apache.hadoop.mapred.TaskTracker=DEBUG
#log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=DEBUG
# Jets3t library
log4j.logger.org.jets3t.service.impl.rest.httpclient.RestS3Service=ERROR
#
# Event Counter Appender
# Sends counts of logging messages at different severity levels to Hadoop Metrics.
#
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
#
# Job Summary Appender
#
# Use following logger to send summary to separate file defined by
# hadoop.mapreduce.jobsummary.log.file :
# hadoop.mapreduce.jobsummary.logger=INFO,JSA
#
hadoop.mapreduce.jobsummary.logger=${hadoop.root.logger}
hadoop.mapreduce.jobsummary.log.file=hadoop-mapreduce.jobsummary.log
hadoop.mapreduce.jobsummary.log.maxfilesize=256MB
hadoop.mapreduce.jobsummary.log.maxbackupindex=20
log4j.appender.JSA=org.apache.log4j.RollingFileAppender
log4j.appender.JSA.File=${hadoop.log.dir}/${hadoop.mapreduce.jobsummary.log.file}
log4j.appender.JSA.MaxFileSize=${hadoop.mapreduce.jobsummary.log.maxfilesize}
log4j.appender.JSA.MaxBackupIndex=${hadoop.mapreduce.jobsummary.log.maxbackupindex}
log4j.appender.JSA.layout=org.apache.log4j.PatternLayout
log4j.appender.JSA.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
log4j.logger.org.apache.hadoop.mapred.JobInProgress$JobSummary=${hadoop.mapreduce.jobsummary.logger}
log4j.additivity.org.apache.hadoop.mapred.JobInProgress$JobSummary=false
#
# Yarn ResourceManager Application Summary Log
#
# Set the ResourceManager summary log filename
#yarn.server.resourcemanager.appsummary.log.file=rm-appsummary.log
# Set the ResourceManager summary log level and appender
#yarn.server.resourcemanager.appsummary.logger=INFO,RMSUMMARY
# Appender for ResourceManager Application Summary Log
# Requires the following properties to be set
# - hadoop.log.dir (Hadoop Log directory)
# - yarn.server.resourcemanager.appsummary.log.file (resource manager app summary log filename)
# - yarn.server.resourcemanager.appsummary.logger (resource manager app summary log level and appender)
#log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.server.resourcemanager.appsummary.logger}
#log4j.additivity.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=false
#log4j.appender.RMSUMMARY=org.apache.log4j.RollingFileAppender
#log4j.appender.RMSUMMARY.File=${hadoop.log.dir}/${yarn.server.resourcemanager.appsummary.log.file}
#log4j.appender.RMSUMMARY.MaxFileSize=256MB
#log4j.appender.RMSUMMARY.MaxBackupIndex=20
#log4j.appender.RMSUMMARY.layout=org.apache.log4j.PatternLayout
#log4j.appender.RMSUMMARY.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n

@ -1,22 +0,0 @@
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>{{ hostvars[groups['hadoop_masters'][0]]['ansible_hostname'] }}:{{ hadoop['mapred_job_tracker_port'] }}</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>{{ hadoop["mapred_local_dir"] | join(',') }}</value>
</property>
<property>
<name>mapred.task.tracker.http.address</name>
<value>0.0.0.0:{{ hadoop['mapred_task_tracker_http_address_port'] }}</value>
</property>
<property>
<name>mapred.job.tracker.http.address</name>
<value>0.0.0.0:{{ hadoop['mapred_job_tracker_http_address_port'] }}</value>
</property>
</configuration>

@ -1,3 +0,0 @@
{% for host in groups['hadoop_slaves'] %}
{{ host }}
{% endfor %}

@ -1,80 +0,0 @@
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<configuration>
<property>
<name>ssl.client.truststore.location</name>
<value></value>
<description>Truststore to be used by clients like distcp. Must be
specified.
</description>
</property>
<property>
<name>ssl.client.truststore.password</name>
<value></value>
<description>Optional. Default value is "".
</description>
</property>
<property>
<name>ssl.client.truststore.type</name>
<value>jks</value>
<description>Optional. The keystore file format, default value is "jks".
</description>
</property>
<property>
<name>ssl.client.truststore.reload.interval</name>
<value>10000</value>
<description>Truststore reload check interval, in milliseconds.
Default value is 10000 (10 seconds).
</description>
</property>
<property>
<name>ssl.client.keystore.location</name>
<value></value>
<description>Keystore to be used by clients like distcp. Must be
specified.
</description>
</property>
<property>
<name>ssl.client.keystore.password</name>
<value></value>
<description>Optional. Default value is "".
</description>
</property>
<property>
<name>ssl.client.keystore.keypassword</name>
<value></value>
<description>Optional. Default value is "".
</description>
</property>
<property>
<name>ssl.client.keystore.type</name>
<value>jks</value>
<description>Optional. The keystore file format, default value is "jks".
</description>
</property>
</configuration>

@ -1,77 +0,0 @@
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<configuration>
<property>
<name>ssl.server.truststore.location</name>
<value></value>
<description>Truststore to be used by NN and DN. Must be specified.
</description>
</property>
<property>
<name>ssl.server.truststore.password</name>
<value></value>
<description>Optional. Default value is "".
</description>
</property>
<property>
<name>ssl.server.truststore.type</name>
<value>jks</value>
<description>Optional. The keystore file format, default value is "jks".
</description>
</property>
<property>
<name>ssl.server.truststore.reload.interval</name>
<value>10000</value>
<description>Truststore reload check interval, in milliseconds.
Default value is 10000 (10 seconds).
</description>
</property>
<property>
<name>ssl.server.keystore.location</name>
<value></value>
<description>Keystore to be used by NN and DN. Must be specified.
</description>
</property>
<property>
<name>ssl.server.keystore.password</name>
<value></value>
<description>Must be specified.
</description>
</property>
<property>
<name>ssl.server.keystore.keypassword</name>
<value></value>
<description>Must be specified.
</description>
</property>
<property>
<name>ssl.server.keystore.type</name>
<value>jks</value>
<description>Optional. The keystore file format, default value is "jks".
</description>
</property>
</configuration>

@ -1,25 +0,0 @@
<?xml version="1.0"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://{{ hadoop['nameservice_id'] }}/</value>
</property>
</configuration>

@ -1,75 +0,0 @@
# Configuration of the "dfs" context for null
dfs.class=org.apache.hadoop.metrics.spi.NullContext
# Configuration of the "dfs" context for file
#dfs.class=org.apache.hadoop.metrics.file.FileContext
#dfs.period=10
#dfs.fileName=/tmp/dfsmetrics.log
# Configuration of the "dfs" context for ganglia
# Pick one: Ganglia 3.0 (former) or Ganglia 3.1 (latter)
# dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
# dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# dfs.period=10
# dfs.servers=localhost:8649
# Configuration of the "mapred" context for null
mapred.class=org.apache.hadoop.metrics.spi.NullContext
# Configuration of the "mapred" context for file
#mapred.class=org.apache.hadoop.metrics.file.FileContext
#mapred.period=10
#mapred.fileName=/tmp/mrmetrics.log
# Configuration of the "mapred" context for ganglia
# Pick one: Ganglia 3.0 (former) or Ganglia 3.1 (latter)
# mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
# mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# mapred.period=10
# mapred.servers=localhost:8649
# Configuration of the "jvm" context for null
#jvm.class=org.apache.hadoop.metrics.spi.NullContext
# Configuration of the "jvm" context for file
#jvm.class=org.apache.hadoop.metrics.file.FileContext
#jvm.period=10
#jvm.fileName=/tmp/jvmmetrics.log
# Configuration of the "jvm" context for ganglia
# jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
# jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# jvm.period=10
# jvm.servers=localhost:8649
# Configuration of the "rpc" context for null
rpc.class=org.apache.hadoop.metrics.spi.NullContext
# Configuration of the "rpc" context for file
#rpc.class=org.apache.hadoop.metrics.file.FileContext
#rpc.period=10
#rpc.fileName=/tmp/rpcmetrics.log
# Configuration of the "rpc" context for ganglia
# rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext
# rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# rpc.period=10
# rpc.servers=localhost:8649
# Configuration of the "ugi" context for null
ugi.class=org.apache.hadoop.metrics.spi.NullContext
# Configuration of the "ugi" context for file
#ugi.class=org.apache.hadoop.metrics.file.FileContext
#ugi.period=10
#ugi.fileName=/tmp/ugimetrics.log
# Configuration of the "ugi" context for ganglia
# ugi.class=org.apache.hadoop.metrics.ganglia.GangliaContext
# ugi.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# ugi.period=10
# ugi.servers=localhost:8649

@ -1,44 +0,0 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# syntax: [prefix].[source|sink].[instance].[options]
# See javadoc of package-info.java for org.apache.hadoop.metrics2 for details
*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
# default sampling period, in seconds
*.period=10
# The namenode-metrics.out will contain metrics from all context
#namenode.sink.file.filename=namenode-metrics.out
# Specifying a special sampling period for namenode:
#namenode.sink.*.period=8
#datanode.sink.file.filename=datanode-metrics.out
# the following example split metrics of different
# context to different sinks (in this case files)
#jobtracker.sink.file_jvm.context=jvm
#jobtracker.sink.file_jvm.filename=jobtracker-jvm-metrics.out
#jobtracker.sink.file_mapred.context=mapred
#jobtracker.sink.file_mapred.filename=jobtracker-mapred-metrics.out
#tasktracker.sink.file.filename=tasktracker-metrics.out
#maptask.sink.file.filename=maptask-metrics.out
#reducetask.sink.file.filename=reducetask-metrics.out

@ -1,103 +0,0 @@
<?xml version="1.0"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.nameservices</name>
<value>{{ hadoop['nameservice_id'] }}</value>
</property>
<property>
<name>dfs.ha.namenodes.{{ hadoop['nameservice_id'] }}</name>
<value>{{ groups.hadoop_masters | join(',') }}</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>{{ hadoop['dfs_blocksize'] }}</value>
</property>
<property>
<name>dfs.permissions.superusergroup</name>
<value>{{ hadoop['dfs_permissions_superusergroup'] }}</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>{{ groups.zookeeper_servers | join(':' ~ hadoop['zookeeper_clientport'] + ',') }}:{{ hadoop['zookeeper_clientport'] }}</value>
</property>
{% for host in groups['hadoop_masters'] %}
<property>
<name>dfs.namenode.rpc-address.{{ hadoop['nameservice_id'] }}.{{ host }}</name>
<value>{{ host }}:{{ hadoop['fs_default_FS_port'] }}</value>
</property>
{% endfor %}
{% for host in groups['hadoop_masters'] %}
<property>
<name>dfs.namenode.http-address.{{ hadoop['nameservice_id'] }}.{{ host }}</name>
<value>{{ host }}:{{ hadoop['dfs_namenode_http_address_port'] }}</value>
</property>
{% endfor %}
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://{{ groups.qjournal_servers | join(':' ~ hadoop['qjournal_port'] + ';') }}:{{ hadoop['qjournal_port'] }}/{{ hadoop['nameservice_id'] }}</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>{{ hadoop['dfs_journalnode_edits_dir'] }}</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.{{ hadoop['nameservice_id'] }}</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>shell(/bin/true )</value>
</property>
<property>
<name>dfs.ha.zkfc.port</name>
<value>{{ hadoop['dfs_ha_zkfc_port'] }}</value>
</property>
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:{{ hadoop['dfs_datanode_address_port'] }}</value>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:{{ hadoop['dfs_datanode_http_address_port'] }}</value>
</property>
<property>
<name>dfs.datanode.ipc.address</name>
<value>0.0.0.0:{{ hadoop['dfs_datanode_ipc_address_port'] }}</value>
</property>
<property>
<name>dfs.replication</name>
<value>{{ hadoop['dfs_replication'] }}</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>{{ hadoop['dfs_namenode_name_dir'] | join(',') }}</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>{{ hadoop['dfs_datanode_data_dir'] | join(',') }}</value>
</property>
</configuration>

@ -1,219 +0,0 @@
# Copyright 2011 The Apache Software Foundation
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Define some default values that can be overridden by system properties
hadoop.root.logger=INFO,console
hadoop.log.dir=.
hadoop.log.file=hadoop.log
# Define the root logger to the system property "hadoop.root.logger".
log4j.rootLogger=${hadoop.root.logger}, EventCounter
# Logging Threshold
log4j.threshold=ALL
# Null Appender
log4j.appender.NullAppender=org.apache.log4j.varia.NullAppender
#
# Rolling File Appender - cap space usage at 5gb.
#
hadoop.log.maxfilesize=256MB
hadoop.log.maxbackupindex=20
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.MaxFileSize=${hadoop.log.maxfilesize}
log4j.appender.RFA.MaxBackupIndex=${hadoop.log.maxbackupindex}
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
# Pattern format: Date LogLevel LoggerName LogMessage
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
# Debugging Pattern format
#log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
#
# Daily Rolling File Appender
#
log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
# Rollver at midnight
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
# 30-day backup
#log4j.appender.DRFA.MaxBackupIndex=30
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
# Pattern format: Date LogLevel LoggerName LogMessage
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
# Debugging Pattern format
#log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
#
# console
# Add "console" to rootlogger above if you want to use this
#
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
#
# TaskLog Appender
#
#Default values
hadoop.tasklog.taskid=null
hadoop.tasklog.iscleanup=false
hadoop.tasklog.noKeepSplits=4
hadoop.tasklog.totalLogFileSize=100
hadoop.tasklog.purgeLogSplits=true
hadoop.tasklog.logsRetainHours=12
log4j.appender.TLA=org.apache.hadoop.mapred.TaskLogAppender
log4j.appender.TLA.taskId=${hadoop.tasklog.taskid}
log4j.appender.TLA.isCleanup=${hadoop.tasklog.iscleanup}
log4j.appender.TLA.totalLogFileSize=${hadoop.tasklog.totalLogFileSize}
log4j.appender.TLA.layout=org.apache.log4j.PatternLayout
log4j.appender.TLA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
#
# HDFS block state change log from block manager
#
# Uncomment the following to suppress normal block state change
# messages from BlockManager in NameNode.
#log4j.logger.BlockStateChange=WARN
#
#Security appender
#
hadoop.security.logger=INFO,NullAppender
hadoop.security.log.maxfilesize=256MB
hadoop.security.log.maxbackupindex=20
log4j.category.SecurityLogger=${hadoop.security.logger}
hadoop.security.log.file=SecurityAuth-${user.name}.audit
log4j.appender.RFAS=org.apache.log4j.RollingFileAppender
log4j.appender.RFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
log4j.appender.RFAS.layout=org.apache.log4j.PatternLayout
log4j.appender.RFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.RFAS.MaxFileSize=${hadoop.security.log.maxfilesize}
log4j.appender.RFAS.MaxBackupIndex=${hadoop.security.log.maxbackupindex}
#
# Daily Rolling Security appender
#
log4j.appender.DRFAS=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
log4j.appender.DRFAS.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.DRFAS.DatePattern=.yyyy-MM-dd
#
# hdfs audit logging
#
hdfs.audit.logger=INFO,NullAppender
hdfs.audit.log.maxfilesize=256MB
hdfs.audit.log.maxbackupindex=20
log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=${hdfs.audit.logger}
log4j.additivity.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=false
log4j.appender.RFAAUDIT=org.apache.log4j.RollingFileAppender
log4j.appender.RFAAUDIT.File=${hadoop.log.dir}/hdfs-audit.log
log4j.appender.RFAAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.RFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
log4j.appender.RFAAUDIT.MaxFileSize=${hdfs.audit.log.maxfilesize}
log4j.appender.RFAAUDIT.MaxBackupIndex=${hdfs.audit.log.maxbackupindex}
#
# mapred audit logging
#
mapred.audit.logger=INFO,NullAppender
mapred.audit.log.maxfilesize=256MB
mapred.audit.log.maxbackupindex=20
log4j.logger.org.apache.hadoop.mapred.AuditLogger=${mapred.audit.logger}
log4j.additivity.org.apache.hadoop.mapred.AuditLogger=false
log4j.appender.MRAUDIT=org.apache.log4j.RollingFileAppender
log4j.appender.MRAUDIT.File=${hadoop.log.dir}/mapred-audit.log
log4j.appender.MRAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.MRAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
log4j.appender.MRAUDIT.MaxFileSize=${mapred.audit.log.maxfilesize}
log4j.appender.MRAUDIT.MaxBackupIndex=${mapred.audit.log.maxbackupindex}
# Custom Logging levels
#log4j.logger.org.apache.hadoop.mapred.JobTracker=DEBUG
#log4j.logger.org.apache.hadoop.mapred.TaskTracker=DEBUG
#log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=DEBUG
# Jets3t library
log4j.logger.org.jets3t.service.impl.rest.httpclient.RestS3Service=ERROR
#
# Event Counter Appender
# Sends counts of logging messages at different severity levels to Hadoop Metrics.
#
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
#
# Job Summary Appender
#
# Use following logger to send summary to separate file defined by
# hadoop.mapreduce.jobsummary.log.file :
# hadoop.mapreduce.jobsummary.logger=INFO,JSA
#
hadoop.mapreduce.jobsummary.logger=${hadoop.root.logger}
hadoop.mapreduce.jobsummary.log.file=hadoop-mapreduce.jobsummary.log
hadoop.mapreduce.jobsummary.log.maxfilesize=256MB
hadoop.mapreduce.jobsummary.log.maxbackupindex=20
log4j.appender.JSA=org.apache.log4j.RollingFileAppender
log4j.appender.JSA.File=${hadoop.log.dir}/${hadoop.mapreduce.jobsummary.log.file}
log4j.appender.JSA.MaxFileSize=${hadoop.mapreduce.jobsummary.log.maxfilesize}
log4j.appender.JSA.MaxBackupIndex=${hadoop.mapreduce.jobsummary.log.maxbackupindex}
log4j.appender.JSA.layout=org.apache.log4j.PatternLayout
log4j.appender.JSA.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
log4j.logger.org.apache.hadoop.mapred.JobInProgress$JobSummary=${hadoop.mapreduce.jobsummary.logger}
log4j.additivity.org.apache.hadoop.mapred.JobInProgress$JobSummary=false
#
# Yarn ResourceManager Application Summary Log
#
# Set the ResourceManager summary log filename
#yarn.server.resourcemanager.appsummary.log.file=rm-appsummary.log
# Set the ResourceManager summary log level and appender
#yarn.server.resourcemanager.appsummary.logger=INFO,RMSUMMARY
# Appender for ResourceManager Application Summary Log
# Requires the following properties to be set
# - hadoop.log.dir (Hadoop Log directory)
# - yarn.server.resourcemanager.appsummary.log.file (resource manager app summary log filename)
# - yarn.server.resourcemanager.appsummary.logger (resource manager app summary log level and appender)
#log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.server.resourcemanager.appsummary.logger}
#log4j.additivity.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=false
#log4j.appender.RMSUMMARY=org.apache.log4j.RollingFileAppender
#log4j.appender.RMSUMMARY.File=${hadoop.log.dir}/${yarn.server.resourcemanager.appsummary.log.file}
#log4j.appender.RMSUMMARY.MaxFileSize=256MB
#log4j.appender.RMSUMMARY.MaxBackupIndex=20
#log4j.appender.RMSUMMARY.layout=org.apache.log4j.PatternLayout
#log4j.appender.RMSUMMARY.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n

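In this template HDFS audit events are discarded by default (hdfs.audit.logger points at the NullAppender). Anyone resurrecting the example and wanting the audit trail on disk would typically repoint the logger at the RFAAUDIT appender already defined above, for example:

    hdfs.audit.logger=INFO,RFAAUDIT

The same pattern applies to hadoop.security.logger and mapred.audit.logger with the RFAS and MRAUDIT appenders.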
@@ -1,120 +0,0 @@
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>{{ hadoop['mapred_job_tracker_ha_servicename'] }}</value>
</property>
<property>
<name>mapred.jobtrackers.{{ hadoop['mapred_job_tracker_ha_servicename'] }}</name>
<value>{{ groups['hadoop_masters'] | join(',') }}</value>
<description>Comma-separated list of JobTracker IDs.</description>
</property>
<property>
<name>mapred.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>mapred.ha.zkfc.port</name>
<value>{{ hadoop['mapred_ha_zkfc_port'] }}</value>
</property>
<property>
<name>mapred.ha.fencing.methods</name>
<value>shell(/bin/true)</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>{{ groups.zookeeper_servers | join(':' ~ hadoop['zookeeper_clientport'] ~ ',') }}:{{ hadoop['zookeeper_clientport'] }}</value>
</property>
{% for host in groups['hadoop_masters'] %}
<property>
<name>mapred.jobtracker.rpc-address.{{ hadoop['mapred_job_tracker_ha_servicename'] }}.{{ host }}</name>
<value>{{ host }}:{{ hadoop['mapred_job_tracker_port'] }}</value>
</property>
{% endfor %}
{% for host in groups['hadoop_masters'] %}
<property>
<name>mapred.job.tracker.http.address.{{ hadoop['mapred_job_tracker_ha_servicename'] }}.{{ host }}</name>
<value>0.0.0.0:{{ hadoop['mapred_job_tracker_http_address_port'] }}</value>
</property>
{% endfor %}
{% for host in groups['hadoop_masters'] %}
<property>
<name>mapred.ha.jobtracker.rpc-address.{{ hadoop['mapred_job_tracker_ha_servicename'] }}.{{ host }}</name>
<value>{{ host }}:{{ hadoop['mapred_ha_jobtracker_rpc-address_port'] }}</value>
</property>
{% endfor %}
{% for host in groups['hadoop_masters'] %}
<property>
<name>mapred.ha.jobtracker.http-redirect-address.{{ hadoop['mapred_job_tracker_ha_servicename'] }}.{{ host }}</name>
<value>{{ host }}:{{ hadoop['mapred_job_tracker_http_address_port'] }}</value>
</property>
{% endfor %}
<property>
<name>mapred.jobtracker.restart.recover</name>
<value>true</value>
</property>
<property>
<name>mapred.job.tracker.persist.jobstatus.active</name>
<value>true</value>
</property>
<property>
<name>mapred.job.tracker.persist.jobstatus.hours</name>
<value>1</value>
</property>
<property>
<name>mapred.job.tracker.persist.jobstatus.dir</name>
<value>{{ hadoop['mapred_job_tracker_persist_jobstatus_dir'] }}</value>
</property>
<property>
<name>mapred.client.failover.proxy.provider.{{ hadoop['mapred_job_tracker_ha_servicename'] }}</name>
<value>org.apache.hadoop.mapred.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>mapred.client.failover.max.attempts</name>
<value>15</value>
</property>
<property>
<name>mapred.client.failover.sleep.base.millis</name>
<value>500</value>
</property>
<property>
<name>mapred.client.failover.sleep.max.millis</name>
<value>1500</value>
</property>
<property>
<name>mapred.client.failover.connection.retries</name>
<value>0</value>
</property>
<property>
<name>mapred.client.failover.connection.retries.on.timeouts</name>
<value>0</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>{{ hadoop["mapred_local_dir"] | join(',') }}</value>
</property>
<property>
<name>mapred.task.tracker.http.address</name>
<value>0.0.0.0:{{ hadoop['mapred_task_tracker_http_address_port'] }}</value>
</property>
</configuration>

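As a quick sanity check on the ha.zookeeper.quorum expression in the template above: with a hypothetical inventory of three hosts zk1, zk2 and zk3 in the zookeeper_servers group and zookeeper_clientport set to 2181 (both values are assumptions, since group_vars/all is not reproduced here), the join renders to:

    <property>
    <name>ha.zookeeper.quorum</name>
    <value>zk1:2181,zk2:2181,zk3:2181</value>
    </property>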
@@ -1,3 +0,0 @@
{% for host in groups['hadoop_slaves'] %}
{{ host }}
{% endfor %}

@@ -1,80 +0,0 @@
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<configuration>
<property>
<name>ssl.client.truststore.location</name>
<value></value>
<description>Truststore to be used by clients like distcp. Must be
specified.
</description>
</property>
<property>
<name>ssl.client.truststore.password</name>
<value></value>
<description>Optional. Default value is "".
</description>
</property>
<property>
<name>ssl.client.truststore.type</name>
<value>jks</value>
<description>Optional. The keystore file format, default value is "jks".
</description>
</property>
<property>
<name>ssl.client.truststore.reload.interval</name>
<value>10000</value>
<description>Truststore reload check interval, in milliseconds.
Default value is 10000 (10 seconds).
</description>
</property>
<property>
<name>ssl.client.keystore.location</name>
<value></value>
<description>Keystore to be used by clients like distcp. Must be
specified.
</description>
</property>
<property>
<name>ssl.client.keystore.password</name>
<value></value>
<description>Optional. Default value is "".
</description>
</property>
<property>
<name>ssl.client.keystore.keypassword</name>
<value></value>
<description>Optional. Default value is "".
</description>
</property>
<property>
<name>ssl.client.keystore.type</name>
<value>jks</value>
<description>Optional. The keystore file format, default value is "jks".
</description>
</property>
</configuration>

@@ -1,77 +0,0 @@
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<configuration>
<property>
<name>ssl.server.truststore.location</name>
<value></value>
<description>Truststore to be used by NN and DN. Must be specified.
</description>
</property>
<property>
<name>ssl.server.truststore.password</name>
<value></value>
<description>Optional. Default value is "".
</description>
</property>
<property>
<name>ssl.server.truststore.type</name>
<value>jks</value>
<description>Optional. The keystore file format, default value is "jks".
</description>
</property>
<property>
<name>ssl.server.truststore.reload.interval</name>
<value>10000</value>
<description>Truststore reload check interval, in milliseconds.
Default value is 10000 (10 seconds).
</description>
</property>
<property>
<name>ssl.server.keystore.location</name>
<value></value>
<description>Keystore to be used by NN and DN. Must be specified.
</description>
</property>
<property>
<name>ssl.server.keystore.password</name>
<value></value>
<description>Must be specified.
</description>
</property>
<property>
<name>ssl.server.keystore.keypassword</name>
<value></value>
<description>Must be specified.
</description>
</property>
<property>
<name>ssl.server.keystore.type</name>
<value>jks</value>
<description>Optional. The keystore file format, default value is "jks".
</description>
</property>
</configuration>

@@ -1,40 +0,0 @@
# Firewall configuration written by system-config-firewall
# Manual customization of this file is not recommended.
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
{% if 'hadoop_masters' in group_names %}
-A INPUT -p tcp --dport {{ hadoop['fs_default_FS_port'] }} -j ACCEPT
-A INPUT -p tcp --dport {{ hadoop['dfs_namenode_http_address_port'] }} -j ACCEPT
-A INPUT -p tcp --dport {{ hadoop['mapred_job_tracker_port'] }} -j ACCEPT
-A INPUT -p tcp --dport {{ hadoop['mapred_job_tracker_http_address_port'] }} -j ACCEPT
-A INPUT -p tcp --dport {{ hadoop['mapred_ha_jobtracker_rpc-address_port'] }} -j ACCEPT
-A INPUT -p tcp --dport {{ hadoop['mapred_ha_zkfc_port'] }} -j ACCEPT
-A INPUT -p tcp --dport {{ hadoop['dfs_ha_zkfc_port'] }} -j ACCEPT
{% endif %}
{% if 'hadoop_slaves' in group_names %}
-A INPUT -p tcp --dport {{ hadoop['dfs_datanode_address_port'] }} -j ACCEPT
-A INPUT -p tcp --dport {{ hadoop['dfs_datanode_http_address_port'] }} -j ACCEPT
-A INPUT -p tcp --dport {{ hadoop['dfs_datanode_ipc_address_port'] }} -j ACCEPT
-A INPUT -p tcp --dport {{ hadoop['mapred_task_tracker_http_address_port'] }} -j ACCEPT
{% endif %}
{% if 'qjournal_servers' in group_names %}
-A INPUT -p tcp --dport {{ hadoop['qjournal_port'] }} -j ACCEPT
-A INPUT -p tcp --dport {{ hadoop['qjournal_http_port'] }} -j ACCEPT
{% endif %}
{% if 'zookeeper_servers' in group_names %}
-A INPUT -p tcp --dport {{ hadoop['zookeeper_clientport'] }} -j ACCEPT
-A INPUT -p tcp --dport {{ hadoop['zookeeper_leader_port'] }} -j ACCEPT
-A INPUT -p tcp --dport {{ hadoop['zookeeper_election_port'] }} -j ACCEPT
{% endif %}
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT

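For a host that sits only in hadoop_slaves, and assuming the stock CDH4/MRv1 port numbers (50010, 50075, 50020 and 50060; the real values come from group_vars/all, which is not reproduced here), the slave block of this template expands to four ACCEPT rules along the lines of:

    -A INPUT -p tcp --dport 50010 -j ACCEPT
    -A INPUT -p tcp --dport 50075 -j ACCEPT
    -A INPUT -p tcp --dport 50020 -j ACCEPT
    -A INPUT -p tcp --dport 50060 -j ACCEPT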
@@ -1,14 +0,0 @@
---
# Handlers for the hadoop master services
- name: restart hadoop master services
service: name=${item} state=restarted
with_items:
- hadoop-0.20-mapreduce-jobtracker
- hadoop-hdfs-namenode
- name: restart hadoopha master services
service: name=${item} state=restarted
with_items:
- hadoop-0.20-mapreduce-jobtrackerha
- hadoop-hdfs-namenode

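These handlers, like the other ${item} references further down, use the legacy Ansible 1.x variable syntax. On a current Ansible release the first handler would be written roughly as:

    - name: restart hadoop master services
      service:
        name: "{{ item }}"
        state: restarted
      loop:
        - hadoop-0.20-mapreduce-jobtracker
        - hadoop-hdfs-namenode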
@@ -1,38 +0,0 @@
---
# Playbook for Hadoop master servers
- name: Install the namenode and jobtracker packages
yum: name={{ item }} state=installed
with_items:
- hadoop-0.20-mapreduce-jobtrackerha
- hadoop-hdfs-namenode
- hadoop-hdfs-zkfc
- hadoop-0.20-mapreduce-zkfc
- name: Copy the hadoop configuration files
template: src=roles/common/templates/hadoop_ha_conf/{{ item }}.j2 dest=/etc/hadoop/conf/{{ item }}
with_items:
- core-site.xml
- hadoop-metrics.properties
- hadoop-metrics2.properties
- hdfs-site.xml
- log4j.properties
- mapred-site.xml
- slaves
- ssl-client.xml.example
- ssl-server.xml.example
notify: restart hadoopha master services
- name: Create the data directory for the namenode metadata
file: path={{ item }} owner=hdfs group=hdfs state=directory
with_items: hadoop.dfs_namenode_name_dir
- name: Create the data directory for the jobtracker ha
file: path={{ item }} owner=mapred group=mapred state=directory
with_items: hadoop.mapred_job_tracker_persist_jobstatus_dir
- name: Format the namenode
shell: creates=/usr/lib/hadoop/namenode.formatted su - hdfs -c "hadoop namenode -format" && touch /usr/lib/hadoop/namenode.formatted
- name: start hadoop namenode services
service: name=hadoop-hdfs-namenode state=started

@@ -1,38 +0,0 @@
---
# Playbook for Hadoop master servers
- name: Install the namenode and jobtracker packages
yum: name={{ item }} state=installed
with_items:
- hadoop-0.20-mapreduce-jobtracker
- hadoop-hdfs-namenode
- name: Copy the hadoop configuration files for no ha
template: src=roles/common/templates/hadoop_conf/{{ item }}.j2 dest=/etc/hadoop/conf/{{ item }}
with_items:
- core-site.xml
- hadoop-metrics.properties
- hadoop-metrics2.properties
- hdfs-site.xml
- log4j.properties
- mapred-site.xml
- slaves
- ssl-client.xml.example
- ssl-server.xml.example
notify: restart hadoop master services
- name: Create the data directory for the namenode metadata
file: path={{ item }} owner=hdfs group=hdfs state=directory
with_items: hadoop.dfs_namenode_name_dir
- name: Format the namenode
shell: creates=/usr/lib/hadoop/namenode.formatted su - hdfs -c "hadoop namenode -format" && touch /usr/lib/hadoop/namenode.formatted
- name: start hadoop namenode services
service: name=hadoop-hdfs-namenode state=started
- name: Give permissions for mapred users
shell: creates=/usr/lib/hadoop/namenode.initialized su - hdfs -c "hadoop fs -chown hdfs:hadoop /"; su - hdfs -c "hadoop fs -chmod 0775 /" && touch /usr/lib/hadoop/namenode.initialized
- name: start hadoop jobtracker services
service: name=hadoop-0.20-mapreduce-jobtracker state=started

@@ -1,9 +0,0 @@
---
# Playbook for Hadoop master primary servers
- include: hadoop_master.yml
when: ha_enabled
- include: hadoop_master_no_ha.yml
when: not ha_enabled

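The ha_enabled flag that chooses between the two include files is expected to come from the inventory or from group_vars/all (neither is shown in this part of the diff); a minimal sketch is a single boolean there, e.g.

    ha_enabled: true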
@@ -1,14 +0,0 @@
---
# Handlers for the hadoop master services
- name: restart hadoop master services
service: name=${item} state=restarted
with_items:
- hadoop-0.20-mapreduce-jobtracker
- hadoop-hdfs-namenode
- name: restart hadoopha master services
service: name=${item} state=restarted
with_items:
- hadoop-0.20-mapreduce-jobtrackerha
- hadoop-hdfs-namenode

@@ -1,64 +0,0 @@
---
# Playbook for Hadoop master secondary server
- name: Install the namenode and jobtracker packages
yum: name=${item} state=installed
with_items:
- hadoop-0.20-mapreduce-jobtrackerha
- hadoop-hdfs-namenode
- hadoop-hdfs-zkfc
- hadoop-0.20-mapreduce-zkfc
- name: Copy the hadoop configuration files
template: src=roles/common/templates/hadoop_ha_conf/{{ item }}.j2 dest=/etc/hadoop/conf/{{ item }}
with_items:
- core-site.xml
- hadoop-metrics.properties
- hadoop-metrics2.properties
- hdfs-site.xml
- log4j.properties
- mapred-site.xml
- slaves
- ssl-client.xml.example
- ssl-server.xml.example
notify: restart hadoopha master services
- name: Create the data directory for the namenode metadata
file: path={{ item }} owner=hdfs group=hdfs state=directory
with_items: hadoop.dfs_namenode_name_dir
- name: Create the data directory for the jobtracker ha
file: path={{ item }} owner=mapred group=mapred state=directory
with_items: hadoop.mapred_job_tracker_persist_jobstatus_dir
- name: Initialize the secondary namenode
shell: creates=/usr/lib/hadoop/namenode.formatted su - hdfs -c "hadoop namenode -bootstrapStandby" && touch /usr/lib/hadoop/namenode.formatted
- name: start hadoop namenode services
service: name=hadoop-hdfs-namenode state=started
- name: Initialize the zkfc for namenode
shell: creates=/usr/lib/hadoop/zkfc.formatted su - hdfs -c "hdfs zkfc -formatZK" && touch /usr/lib/hadoop/zkfc.formatted
- name: start zkfc for namenodes
service: name=hadoop-hdfs-zkfc state=started
delegate_to: ${item}
with_items: groups.hadoop_masters
- name: Give permissions for mapred users
shell: creates=/usr/lib/hadoop/fs.initialized su - hdfs -c "hadoop fs -chown hdfs:hadoop /"; su - hdfs -c "hadoop fs -chmod 0774 /" && touch /usr/lib/hadoop/fs.initialized
- name: Initialize the zkfc for jobtracker
shell: creates=/usr/lib/hadoop/zkfcjob.formatted su - mapred -c "hadoop mrzkfc -formatZK" && touch /usr/lib/hadoop/zkfcjob.formatted
- name: start zkfc for jobtracker
service: name=hadoop-0.20-mapreduce-zkfc state=started
delegate_to: '{{ item }}'
with_items: groups.hadoop_masters
- name: start hadoop Jobtracker services
service: name=hadoop-0.20-mapreduce-jobtrackerha state=started
delegate_to: '{{ item }}'
with_items: groups.hadoop_masters

@@ -1,8 +0,0 @@
---
# Handlers for the hadoop slave services
- name: restart hadoop slave services
service: name=${item} state=restarted
with_items:
- hadoop-0.20-mapreduce-tasktracker
- hadoop-hdfs-datanode

@@ -1,4 +0,0 @@
---
# Playbook for Hadoop slave servers
- include: slaves.yml tags=slaves

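Because the include is tagged, the slave nodes could be reconfigured on their own without touching the masters or the zookeeper and journal nodes, e.g.

    ansible-playbook -i hosts site.yml --tags slaves

where hosts is the inventory file shipped alongside this example.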
@@ -1,53 +0,0 @@
---
# Playbook for Hadoop slave servers
- name: Install the datanode and tasktracker packages
yum: name=${item} state=installed
with_items:
- hadoop-0.20-mapreduce-tasktracker
- hadoop-hdfs-datanode
- name: Copy the hadoop configuration files
template: src=roles/common/templates/hadoop_ha_conf/${item}.j2 dest=/etc/hadoop/conf/${item}
with_items:
- core-site.xml
- hadoop-metrics.properties
- hadoop-metrics2.properties
- hdfs-site.xml
- log4j.properties
- mapred-site.xml
- slaves
- ssl-client.xml.example
- ssl-server.xml.example
when: ha_enabled
notify: restart hadoop slave services
- name: Copy the hadoop configuration files for non ha
template: src=roles/common/templates/hadoop_conf/${item}.j2 dest=/etc/hadoop/conf/${item}
with_items:
- core-site.xml
- hadoop-metrics.properties
- hadoop-metrics2.properties
- hdfs-site.xml
- log4j.properties
- mapred-site.xml
- slaves
- ssl-client.xml.example
- ssl-server.xml.example
when: not ha_enabled
notify: restart hadoop slave services
- name: Create the data directory for the slave nodes to store the data
file: path={{ item }} owner=hdfs group=hdfs state=directory
with_items: hadoop.dfs_datanode_data_dir
- name: Create the data directory for the slave nodes for mapreduce
file: path={{ item }} owner=mapred group=mapred state=directory
with_items: hadoop.mapred_local_dir
- name: start hadoop slave services
service: name={{ item }} state=started
with_items:
- hadoop-0.20-mapreduce-tasktracker
- hadoop-hdfs-datanode

@@ -1,5 +0,0 @@
---
# The journal node handlers
- name: restart qjournal services
service: name=hadoop-hdfs-journalnode state=restarted

@@ -1,22 +0,0 @@
---
# Playbook for the qjournal nodes
- name: Install the qjournal package
yum: name=hadoop-hdfs-journalnode state=installed
- name: Create folder for Journaling
file: path={{ hadoop.dfs_journalnode_edits_dir }} state=directory owner=hdfs group=hdfs
- name: Copy the hadoop configuration files
template: src=roles/common/templates/hadoop_ha_conf/{{ item }}.j2 dest=/etc/hadoop/conf/{{ item }}
with_items:
- core-site.xml
- hadoop-metrics.properties
- hadoop-metrics2.properties
- hdfs-site.xml
- log4j.properties
- mapred-site.xml
- slaves
- ssl-client.xml.example
- ssl-server.xml.example
notify: restart qjournal services

@@ -1,5 +0,0 @@
---
# Handler for the zookeeper services
- name: restart zookeeper
service: name=zookeeper-server state=restarted

@@ -1,13 +0,0 @@
---
# The plays for zookeeper daemons
- name: Install the zookeeper files
yum: name=zookeeper-server state=installed
- name: Copy the configuration file for zookeeper
template: src=zoo.cfg.j2 dest=/etc/zookeeper/conf/zoo.cfg
notify: restart zookeeper
- name: Initialize the zookeeper
shell: creates=/var/lib/zookeeper/myid service zookeeper-server init --myid=${zoo_id}

@@ -1,9 +0,0 @@
tickTime=2000
dataDir=/var/lib/zookeeper/
clientPort={{ hadoop['zookeeper_clientport'] }}
initLimit=5
syncLimit=2
{% for host in groups['zookeeper_servers'] %}
server.{{ hostvars[host].zoo_id }}={{ host }}:{{ hadoop['zookeeper_leader_port'] }}:{{ hadoop['zookeeper_election_port'] }}
{% endfor %}

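Rendered against a hypothetical three-node zookeeper_servers group (zk1, zk2 and zk3 with zoo_id 1 to 3) and the stock ZooKeeper ports 2181, 2888 and 3888 (the actual values live in group_vars/all), the template above produces something like:

    tickTime=2000
    dataDir=/var/lib/zookeeper/
    clientPort=2181
    initLimit=5
    syncLimit=2
    server.1=zk1:2888:3888
    server.2=zk2:2888:3888
    server.3=zk3:2888:3888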
@@ -1,28 +0,0 @@
---
# The main playbook to deploy the site
- hosts: hadoop_all
roles:
- common
- hosts: zookeeper_servers
roles:
- { role: zookeeper_servers, when: ha_enabled }
- hosts: qjournal_servers
roles:
- { role: qjournal_servers, when: ha_enabled }
- hosts: hadoop_master_primary
roles:
- { role: hadoop_primary }
- hosts: hadoop_master_secondary
roles:
- { role: hadoop_secondary, when: ha_enabled }
- hosts: hadoop_slaves
roles:
- { role: hadoop_slaves }
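The whole stack was driven from this single playbook; a typical run from the hadoop/ directory would have looked like

    ansible-playbook -i hosts site.yml

with ha_enabled and the various port and directory variables defined in group_vars/all, and the host groups (hadoop_all, hadoop_master_primary, hadoop_master_secondary, hadoop_slaves, zookeeper_servers, qjournal_servers) laid out in the hosts inventory file.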