Category: Development


Real World C# Quant Developer – Bond Risks / Reports

By Toyin Akin

Note: This course is built on top of the “Real World Spark 2 – ScalaIDE Spark Core 2 Developer – Toyin Akin” course.

Scala IDE provides advanced editing and debugging support for the development of pure Scala and mixed Scala-Java applications.

Now with a shiny Scala debugger, semantic highlighting, a more reliable JUnit test finder, an ecosystem of related plugins, and much more.

The Scala debugger supports stepping through closures and Scala-aware display of debugging information.

Spark Monitoring and Instrumentation

While creating RDDs, performing transformations and executing actions, you will work heavily with the monitoring view of the web UI.

Every SparkContext launches a web UI, by default on port 4040, that displays useful information about the application. This includes:

A list of scheduler stages and tasks
A summary of RDD sizes and memory usage
Environmental information
Information about the running executors
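
As a minimal sketch of where this UI comes from (assuming a local Spark 2.x installation with pyspark available), the snippet below starts a SparkContext and runs one job; while it runs, the monitoring view is served at http://localhost:4040.

```python
from pyspark import SparkConf, SparkContext

# Starting a SparkContext also starts the application web UI on port 4040
# (or the next free port if 4040 is already taken).
conf = SparkConf().setAppName("monitoring-demo").setMaster("local[*]")
sc = SparkContext(conf=conf)

# Create an RDD, apply a transformation, and execute an action. Each action
# appears in the UI as a job, broken down into stages and tasks.
rdd = sc.parallelize(range(1, 1000001))
total = rdd.map(lambda x: x * 2).reduce(lambda a, b: a + b)
print(total)

input("Browse http://localhost:4040, then press Enter to finish...")
sc.stop()  # the UI disappears once the SparkContext is stopped
```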

Why Apache Spark …

Apache Spark runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. It has an advanced DAG execution engine that supports cyclic data flow and in-memory computing, and it offers over 80 high-level operators that make it easy to build parallel apps. You can use it interactively from the Scala, Python and R shells, and it can combine SQL, streaming, and complex analytics.

Apache Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.
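
As an illustrative sketch of that mix-and-match (the “quotes” data and view name are invented for the example), a Spark 2.x session can move between the DataFrame API and plain SQL over the same data:

```python
from pyspark.sql import SparkSession

# A single SparkSession is the entry point to SQL, DataFrames and the
# rest of the library stack.
spark = SparkSession.builder.appName("stack-demo").getOrCreate()

# Build a small DataFrame and expose it to SQL as a temporary view.
df = spark.createDataFrame(
    [("AAPL", 150.0), ("MSFT", 300.0), ("IBM", 120.0)],
    ["ticker", "price"],
)
df.createOrReplaceTempView("quotes")

# The same query, once in SQL and once with DataFrame operations.
spark.sql("SELECT ticker FROM quotes WHERE price > 140").show()
df.filter(df.price > 140).select("ticker").show()

spark.stop()
```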

Real World C# Quant Developer – Bond Pricing

By Toyin Akin

Note: This course is built on top of the “Real World Spark 2 – Jupyter Python Spark Core – Toyin Akin” course.

Jupyter Notebook is a system, similar to Mathematica, that allows you to create “executable documents”. Notebooks integrate formatted text (Markdown), executable code (Python), mathematical formulas (LaTeX), and graphics and visualizations (matplotlib) into a single document that captures the flow of an exploration and can be exported as a formatted report or an executable script.

The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.
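
As a hedged illustration of such a document (the discounting example is invented here, in keeping with the bond-pricing theme), a single notebook code cell might combine Python, a formula and a plot, with Markdown and LaTeX living in the surrounding text cells:

```python
import numpy as np
import matplotlib.pyplot as plt

r = 0.05                      # illustrative flat rate of 5%
t = np.linspace(0, 10, 200)   # maturities from 0 to 10 years
pv = 100 * np.exp(-r * t)     # PV = 100 * e^(-r*t), continuous discounting

plt.plot(t, pv)
plt.xlabel("maturity (years)")
plt.ylabel("present value")
plt.title("Value today of 100 received at maturity")
plt.show()
```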

Big data integration

Leverage big data tools, such as Apache Spark, from Python

The Jupyter Notebook is based on a set of open standards for interactive computing. Think HTML and CSS for interactive computing on the web. These open standards can be leveraged by third party developers to build customized applications with embedded interactive computing.

Spark Monitoring and Instrumentation

While creating RDDs, performing transformations and executing actions, you will work heavily with the monitoring view of the web UI.

Every SparkContext launches a web UI, by default on port 4040, that displays useful information about the application. This includes:

A list of scheduler stages and tasks
A summary of RDD sizes and memory usage
Environmental information
Information about the running executors

Why Apache Spark …

Apache Spark runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. It has an advanced DAG execution engine that supports cyclic data flow and in-memory computing, and it offers over 80 high-level operators that make it easy to build parallel apps. You can use it interactively from the Scala, Python and R shells, and it can combine SQL, streaming, and complex analytics.

Apache Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.

Real World C# Quant Developer – Bond Product

By Toyin Akin

Note: This course is built on top of the “Real World Spark 2 – Jupyter Scala Spark Core – Toyin Akin” course.

Jupyter Notebook is a system, similar to Mathematica, that allows you to create “executable documents”. Notebooks integrate formatted text (Markdown), executable code (Scala), mathematical formulas (LaTeX), and graphics and visualizations into a single document that captures the flow of an exploration and can be exported as a formatted report or an executable script.

The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.

Big data integration

Leverage big data tools, such as Apache Spark, from Scala

The Jupyter Notebook is based on a set of open standards for interactive computing. Think HTML and CSS for interactive computing on the web. These open standards can be leveraged by third party developers to build customized applications with embedded interactive computing.

Spark Monitoring and Instrumentation

While creating RDDs, performing transformations and executing actions, you will work heavily with the monitoring view of the web UI.

Every SparkContext launches a web UI, by default on port 4040, that displays useful information about the application. This includes:

A list of scheduler stages and tasks
A summary of RDD sizes and memory usage
Environmental information
Information about the running executors

Why Apache Spark …

Apache Spark runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. It has an advanced DAG execution engine that supports cyclic data flow and in-memory computing, and it offers over 80 high-level operators that make it easy to build parallel apps. You can use it interactively from the Scala, Python and R shells, and it can combine SQL, streaming, and complex analytics.

Apache Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.

Real World Hadoop – Automating Hadoop install with Python!

By Toyin Akin

Deploy a Hadoop cluster (ZooKeeper, HDFS, YARN, Spark) with Cloudera Manager’s Python API. Hands on.

Course Access


You can access all the Big Data / Spark courses for one low monthly fee. Currently the membership site houses courses that cover deploying Hadoop with Cloudera and Hortonworks as well as installing and working with Spark 2.0.



Note: This course is built on top of the “Real World Vagrant – Automate a Cloudera Manager Build – Toyin Akin” course.

Deploy a Hadoop cluster (ZooKeeper, HDFS, YARN, Spark) with Python! Instruct Cloudera Manager to do the work! Hands on. Here we use Python to instruct an already installed Cloudera Manager to deploy your Hadoop services.

The Cloudera Manager API provides configuration and service lifecycle management, service health information and metrics, and allows you to configure Cloudera Manager itself. The API is served on the same host and port as the Cloudera Manager Admin Console, and does not require an extra process or extra configuration. The API supports HTTP Basic Authentication, accepting the same users and credentials as the Cloudera Manager Admin Console.
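
As a minimal sketch of talking to this API from Python using the cm-api client library, where the host name and the admin/admin credentials are placeholders for your own deployment:

```python
from cm_api.api_client import ApiResource

# Connect over HTTP Basic Authentication, on the same host and port as
# the Cloudera Manager Admin Console. Host and credentials are placeholders.
api = ApiResource("cm-host.example.com", username="admin", password="admin")

# Walk the deployment: every cluster, and every service in each cluster.
for cluster in api.get_all_clusters():
    print("Cluster: %s (%s)" % (cluster.name, cluster.version))
    for service in cluster.get_all_services():
        print("  %s: health=%s" % (service.name, service.healthSummary))
```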


Here are some of the cool things you can do with Cloudera Manager via the API:

Deploy an entire Hadoop cluster programmatically. Cloudera Manager supports HDFS, MapReduce, YARN, ZooKeeper, HBase, Hive, Oozie, Hue, Flume, Impala, Solr, Sqoop, Spark and Accumulo.
Configure various Hadoop services and get config validation.
Take admin actions on services and roles, such as start, stop, restart, and failover (see the sketch after this list). Also available are more advanced workflows, such as setting up high availability and decommissioning.
Monitor your services and hosts, with intelligent service health checks and metrics.
Monitor user jobs and other cluster activities.
Retrieve timeseries metric data.
Search for events in the Hadoop system.
Administer Cloudera Manager itself.
Download the entire deployment description of your Hadoop cluster as a JSON file.
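
As a sketch of the service lifecycle actions mentioned above (with placeholder cluster and service names; use the names shown in your own Admin Console):

```python
from cm_api.api_client import ApiResource

api = ApiResource("cm-host.example.com", username="admin", password="admin")

# Look up a cluster and one of its services by name (placeholders here).
cluster = api.get_cluster("cluster")
hdfs = cluster.get_service("hdfs")

# Lifecycle actions return an ApiCommand; wait() blocks until it completes.
cmd = hdfs.restart().wait()
print("Restart of %s succeeded: %s" % (hdfs.name, cmd.success))
```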

Additionally, with the appropriate licenses, the API lets you:

Perform rolling restarts and rolling upgrades.
Audit user activities and accesses in Hadoop.
Perform backup and cross-data-center replication for HDFS and Hive.
Retrieve per-user HDFS usage reports and per-user MapReduce resource usage reports.


Recommended Cloudera Manager curriculum path: if you already have Cloudera Manager installed, you do not need to take the first three courses.
