Category: DevOps


Real World Vagrant – Build an Apache Spark Development Env!

By Toyin Akin,


With a single command, build an IDE, Scala and Spark (1.6.2 or 2.0.1) Development Environment! Run in under 3 minutes!!

Course Access


You can access all the Big Data / Spark courses for one low monthly fee. Currently the membership site houses courses that cover deploying Hadoop with Cloudera and Hortonworks as well as installing and working with Spark 2.0.

This course can be purchased from


Note: This course is built on top of the “Real World Vagrant For Distributed Computing – Toyin Akin” course.

This course enables you to package a complete Spark development environment into your own custom 2.3 GB Vagrant box.

Once built, you no longer need to manipulate your Windows machine in order to get a fully fledged Spark environment to work. With the final solution, you can boot up a complete Apache Spark environment in under 3 minutes!!

Install any version of Spark you prefer. We have codified the build for 1.6.2 and 2.0.1, but it’s pretty easy to extend this for a new version.
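Extending the build to another Spark version mostly comes down to pointing the provisioning step at a different download. Here is a minimal sketch in Python, assuming the binaries come from the Apache archive and that each version is paired with the Hadoop profile shown. The helper name, the archive layout and the profiles are assumptions for illustration, not the course's actual provisioning script:

```python
# Hypothetical helper: maps a Spark version to a binary tarball URL.
# The archive root and the Hadoop profile per version are assumptions.
SPARK_BUILDS = {
    "1.6.2": "hadoop2.6",
    "2.0.1": "hadoop2.7",
}

ARCHIVE_ROOT = "https://archive.apache.org/dist/spark"


def spark_tarball_url(version):
    """Return the download URL for a Spark version listed in SPARK_BUILDS."""
    try:
        hadoop_profile = SPARK_BUILDS[version]
    except KeyError:
        raise ValueError("Unsupported Spark version: {0}".format(version))
    tarball = "spark-{0}-bin-{1}.tgz".format(version, hadoop_profile)
    return "{0}/spark-{1}/{2}".format(ARCHIVE_ROOT, version, tarball)


if __name__ == "__main__":
    for spark_version in sorted(SPARK_BUILDS):
        print(spark_tarball_url(spark_version))
```

Supporting a new version would then be a one-line addition to SPARK_BUILDS, provided a matching tarball exists at the assumed location.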

Why Apache Spark …

Apache Spark runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.
Apache Spark has an advanced DAG execution engine that supports acyclic data flow and in-memory computing.
Apache Spark offers over 80 high-level operators that make it easy to build parallel apps. And you can use it interactively from the Scala, Python and R shells.
Apache Spark can combine SQL, streaming, and complex analytics.
Apache Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.


Recommended Spark course path. If you already have Spark installed, you do not need to access the first three courses.

Spark.Courses

Real World Vagrant – Hortonworks Data Platform 2.5

By Toyin Akin,


Build a Distributed Cluster of Hortonworks 2.5 Manager and Agent nodes with a single command! Includes Spark 2.0!



Note : This course is built on top of the “Real World Vagrant For Distributed Computing – Toyin Akin” course

“NoSQL”, “Big Data”, “DevOps” and “In Memory Database” technologies are hot and highly valuable skills to have, and this course will teach you how to quickly create a distributed environment for you to deploy these technologies on.

A combination of VirtualBox and Vagrant will transform your desktop machine into a virtual cluster. However, this needs to be configured correctly. Simply enabling multi-node within Vagrant is not good enough; it needs to be tuned. Developers and operators within large enterprises, including investment banks, use Vagrant to simulate production environments.
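To give a feel for what tuning a multi-node setup can involve, the sketch below uses Python to render a Vagrantfile in which every node gets its own hostname, private IP, memory and CPU allocation. The box name, node names, addresses and sizes are hypothetical placeholders, and the layout is only one plausible arrangement, not the course's actual Vagrantfile:

```python
# Sketch: render a multi-node Vagrantfile with explicit per-node resources.
# Box name, node names, IPs and sizes are hypothetical placeholders.
NODE_TEMPLATE = """  config.vm.define "{name}" do |node|
    node.vm.hostname = "{name}"
    node.vm.network "private_network", ip: "{ip}"
    node.vm.provider "virtualbox" do |vb|
      vb.memory = {memory_mb}
      vb.cpus = {cpus}
    end
  end
"""


def render_vagrantfile(box, nodes):
    """Build Vagrantfile text from a box name and a list of node dicts."""
    parts = [
        'Vagrant.configure("2") do |config|\n',
        '  config.vm.box = "{0}"\n'.format(box),
    ]
    for node in nodes:
        parts.append(NODE_TEMPLATE.format(**node))
    parts.append("end\n")
    return "".join(parts)


if __name__ == "__main__":
    example_nodes = [
        {"name": "manager", "ip": "192.168.56.10", "memory_mb": 4096, "cpus": 2},
        {"name": "agent1", "ip": "192.168.56.11", "memory_mb": 2048, "cpus": 1},
    ]
    print(render_vagrantfile("centos/7", example_nodes))
```

Keeping the node list as data makes it easy to give the manager node more memory than the agents, which is the kind of adjustment a default multi-node configuration does not make for you.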

After all, if you are developing against or operating a distributed environment, it needs to be tested. Tested in terms of code deployed and the deployment code itself.

You’ll learn the same techniques these enterprise guys use on your own Microsoft Windows computer/laptop.

Vagrant provides easy to configure, reproducible, and portable work environments built on top of industry-standard technology and controlled by a single consistent workflow to help maximize the productivity and flexibility of you and your team.

This course will use VirtualBox to carve out your virtual environment. However the same skills learned with Vagrant can be used to provision virtual machines on VMware, AWS, or any other provider.

If you are a developer, this course will help you isolate dependencies and their configuration within a single disposable, consistent environment, without sacrificing any of the tools you are used to working with (editors, browsers, debuggers, etc.). Once you or someone else creates a single Vagrantfile, you just need to run vagrant up and everything is installed and configured for you to work. Other members of your team create their development environments from the same configuration. Say goodbye to “works on my machine” bugs.

If you are an operations engineer, this course will help you build a disposable environment and consistent workflow for developing and testing infrastructure management scripts. You can quickly test your deployment scripts and more using local virtualization such as VirtualBox or VMware (VirtualBox for this course). Ditch your custom scripts to recycle EC2 instances, stop juggling SSH prompts to various machines, and start using Vagrant to bring sanity to your life.

If you are a designer, this course will help you with distributed installation of software in order for you to focus on doing what you do best: design. Once a developer configures Vagrant, you do not need to worry about how to get that software running ever again. No more bothering other developers to help you fix your environment so you can test designs. Just check out the code, vagrant up, and start designing.


Recommended Hortonworks curriculum path.

HDP.Courses

 

Real World Vagrant – Automate a Cloudera Manager Build

By Toyin Akin,


Build a Distributed Cluster of Cloudera Manager and any number of Cloudera Manager Agent nodes with a single command!



Note : This course is built on top of the “Real World Vagrant For Distributed Computing – Toyin Akin” course

“NoSQL”, “Big Data”, “DevOps” and “In Memory Database” technologies are hot and highly valuable skills to have, and this course will teach you how to quickly create a distributed environment for you to deploy these technologies on.

A combination of VirtualBox and Vagrant will transform your desktop machine into a virtual cluster. However, this needs to be configured correctly. Simply enabling multi-node within Vagrant is not good enough; it needs to be tuned. Developers and operators within large enterprises, including investment banks, use Vagrant to simulate production environments.
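One part of configuring such a cluster correctly is making sure every node can resolve every other node by name. As a rough illustration, this Python sketch generates hosts-file entries for one manager node plus any number of agent nodes. The node names, starting IP address and domain are hypothetical, and the course may handle name resolution differently:

```python
import ipaddress


def cluster_hosts(agent_count, first_ip="192.168.56.10", domain="example.local"):
    """Return (ip, fqdn, short_name) tuples: one manager, then N agents.

    Addresses are allocated sequentially from first_ip (an assumed,
    illustrative private range).
    """
    base = ipaddress.ip_address(first_ip)
    names = ["manager"] + ["agent{0}".format(i) for i in range(1, agent_count + 1)]
    return [
        (str(base + offset), "{0}.{1}".format(name, domain), name)
        for offset, name in enumerate(names)
    ]


def hosts_file_lines(entries):
    """Format the tuples as lines suitable for appending to /etc/hosts."""
    return ["{0} {1} {2}".format(ip, fqdn, short) for ip, fqdn, short in entries]


if __name__ == "__main__":
    for line in hosts_file_lines(cluster_hosts(agent_count=3)):
        print(line)
```

Because the agent count is a parameter, the same function covers a two-node test cluster and a larger one without any edits.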

After all, if you are developing against or operating a distributed environment, it needs to be tested. Tested in terms of code deployed and the deployment code itself.

You’ll learn the same techniques these enterprise guys use on your own Microsoft Windows computer/laptop.

Vagrant provides easy to configure, reproducible, and portable work environments built on top of industry-standard technology and controlled by a single consistent workflow to help maximize the productivity and flexibility of you and your team.

This course will use VirtualBox to carve out your virtual environment. However the same skills learned with Vagrant can be used to provision virtual machines on VMware, AWS, or any other provider.

If you are a developer, this course will help you isolate dependencies and their configuration within a single disposable, consistent environment, without sacrificing any of the tools you are used to working with (editors, browsers, debuggers, etc.). Once you or someone else creates a single Vagrantfile, you just need to run vagrant up and everything is installed and configured for you to work. Other members of your team create their development environments from the same configuration. Say goodbye to “works on my machine” bugs.

If you are an operations engineer, this course will help you build a disposable environment and consistent workflow for developing and testing infrastructure management scripts. You can quickly test your deployment scripts and more using local virtualization such as VirtualBox or VMware (VirtualBox for this course). Ditch your custom scripts to recycle EC2 instances, stop juggling SSH prompts to various machines, and start using Vagrant to bring sanity to your life.

If you are a designer, this course will help you with distributed installation of software in order for you to focus on doing what you do best: design. Once a developer configures Vagrant, you do not need to worry about how to get that software running ever again. No more bothering other developers to help you fix your environment so you can test designs. Just check out the code, vagrant up, and start designing.


Recommended Cloudera Manager curriculum path. If you already have Cloudera Manager installed, you do not need to access the first three courses.

Cloudera.Courses

 

Real World Hadoop – Deploying Hadoop with Cloudera Manager

By Toyin Akin,


Move to the next step from the Cloudera QuickStart VM. Install a DEV Hadoop Environment with Enterprise Tooling



“Big Data” technology is a hot and highly valuable skill to have, and this course will teach you how to quickly deploy a Hadoop cluster using the Cloudera stack.

Cloudera allows you to download a QuickStart Virtual machine which is great for developers, but this is of no use for the Operations team to start the planning and the building out of DEV / UAT and PROD environments within their organizations. What assumptions were made when the QuickStart VM was put together?

In addition, hosting all of Cloudera’s processes as well as Hadoop’s processes on one VM is not a model that any large organization can or should follow. The Hadoop services need to be split out across multiple VMs/servers. In fact, that’s the whole point of Hadoop!

Distributed Data and Distributed Compute.

After all, if you are developing against or operating a distributed environment, it needs to be tested. Tested in terms of forcing various failure modes within the cluster and ensuring that the cluster can still respond to user requests. Killing the QuickStart VM destroys the entire cluster!

You’ll learn the same techniques these large enterprise guys use to move to the next step in building out an enterprise grade Hadoop cluster.

If you are a developer, the operations team can build out that centralized cluster so that you are truly testing against a distributed cluster. Testing code against the QuickStart VM may work, but as any experienced distributed developer knows, verifying code against a pseudo cluster on a single machine is different from verifying code against a truly distributed cluster.

For example, bottlenecks in the network or in CPU cycles will come to light. In addition, this will also assist in capacity planning for the UAT / PROD cluster, as initial metrics can be acquired.

If you are in operations, this gives your team an environment in which to start learning how to jointly operate the cluster. Here the team can start to understand cluster metrics, add and remove cluster nodes, manage the various Hadoop services (Zookeeper, HDFS, YARN and Spark) and a lot more. We also look at managing Cloudera Hadoop Parcels as well as changing Hadoop versions once a cluster is deployed.

The operations team can start to develop procedures and change management documentation ready for production operation of a Hadoop cluster.


Recommended Cloudera Manager curriculum path. If you already have Cloudera Manager installed, you do not need to access the first three courses.

Cloudera.Courses

 

Real World Hadoop – Automating Hadoop install with Python!

By Toyin Akin,


Deploy a Hadoop cluster (Zookeeper, HDFS, YARN, Spark) with Cloudera Manager’s Python API. Hands on.



Note : This course is built on top of the “Real World Vagrant – Automate a Cloudera Manager Build – Toyin Akin” course

Deploy a Hadoop cluster (Zookeeper, HDFS, YARN, Spark) with Python! Instruct Cloudera Manager to do the work! Hands on. Here we use Python to instruct an already installed Cloudera Manager to deploy your Hadoop Services.

The Cloudera Manager API provides configuration and service lifecycle management, service health information and metrics, and allows you to configure Cloudera Manager itself. The API is served on the same host and port as the Cloudera Manager Admin Console, and does not require an extra process or extra configuration. The API supports HTTP Basic Authentication, accepting the same users and credentials as the Cloudera Manager Admin Console.
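Since the API shares the Admin Console's host and port and accepts HTTP Basic Authentication, a request can be prepared with nothing more than the Python standard library. The sketch below is illustrative only: the host name is hypothetical, port 7180 and the admin/admin credentials are assumed defaults that your installation may have changed, and the course itself may use Cloudera's own Python client rather than raw HTTP:

```python
import base64
import urllib.request


def build_cm_request(host, path, username, password, port=7180):
    """Prepare (but do not send) an authenticated Cloudera Manager API request.

    The port default and the credentials passed in are assumptions.
    """
    url = "http://{0}:{1}{2}".format(host, port, path)
    credentials = "{0}:{1}".format(username, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token)
    return request


if __name__ == "__main__":
    # "/api/version" asks the server which API version it supports.
    req = build_cm_request("cm-host.example.local", "/api/version", "admin", "admin")
    print(req.full_url)
    # To actually call a running Cloudera Manager, uncomment:
    # with urllib.request.urlopen(req) as response:
    #     print(response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the example runnable on a machine with no Cloudera Manager installed.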

Here are some of the cool things you can do with Cloudera Manager via the API:

Deploy an entire Hadoop cluster programmatically. Cloudera Manager supports HDFS, MapReduce, YARN, ZooKeeper, HBase, Hive, Oozie, Hue, Flume, Impala, Solr, Sqoop, Spark and Accumulo.
Configure various Hadoop services and get config validation.
Take admin actions on services and roles, such as start, stop, restart, failover, etc. Also available are the more advanced workflows, such as setting up high availability and decommissioning.
Monitor your services and hosts, with intelligent service health checks and metrics.
Monitor user jobs and other cluster activities.
Retrieve timeseries metric data.
Search for events in the Hadoop system.
Administer Cloudera Manager itself.
Download the entire deployment description of your Hadoop cluster in a JSON file.
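Once a deployment description has been downloaded as JSON, it can be inspected with a few lines of Python. The sample document below is hand-written and heavily trimmed. Its field names ("clusters", "services", "name", "type") are assumptions about the export's shape, so check them against a real export from your own Cloudera Manager before relying on this sketch:

```python
import json

# Hand-written, trimmed sample; a real export contains far more detail.
SAMPLE_DEPLOYMENT = """
{
  "clusters": [
    {
      "name": "cluster1",
      "services": [
        {"name": "zookeeper", "type": "ZOOKEEPER"},
        {"name": "hdfs", "type": "HDFS"},
        {"name": "yarn", "type": "YARN"}
      ]
    }
  ]
}
"""


def services_by_cluster(deployment_json):
    """Map each cluster name to a sorted list of its service types."""
    deployment = json.loads(deployment_json)
    return {
        cluster["name"]: sorted(
            service["type"] for service in cluster.get("services", [])
        )
        for cluster in deployment.get("clusters", [])
    }


if __name__ == "__main__":
    for cluster_name, service_types in services_by_cluster(SAMPLE_DEPLOYMENT).items():
        print(cluster_name, service_types)
```

A summary like this is a quick way to compare what two environments have deployed without reading the full export.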

Additionally, with the appropriate licenses, the API lets you:

Perform rolling restart and rolling upgrade.
Audit user activities and accesses in Hadoop.
Perform backup and cross data-center replication for HDFS and Hive.
Retrieve per-user HDFS usage reports and per-user MapReduce resource usage reports.


Recommended Cloudera Manager curriculum path. If you already have Cloudera Manager installed, you do not need to access the first three courses.

Cloudera.Courses