Tuesday, June 12, 2018

Big Data Hadoop, part 1



First exercise: pull a file out of the Hadoop filesystem (HDFS) and then process it with Spark; see the sketch below.
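
A minimal PySpark sketch of that flow (the HDFS path, app name, and word-count transformation below are placeholders of mine, not from the course):

from pyspark.sql import SparkSession

# Hypothetical HDFS path -- replace with a file that actually exists in your cluster.
HDFS_PATH = "hdfs:///user/training/input/sample.txt"

spark = SparkSession.builder.appName("HdfsToSpark").getOrCreate()

# Read the HDFS file as an RDD of lines and count word frequencies.
lines = spark.sparkContext.textFile(HDFS_PATH)
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

for word, count in counts.take(10):
    print(word, count)

spark.stop()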

https://spark.apache.org/examples.html

https://www.cloudera.com/more/training/courses/developer-training-for-spark-and-hadoop.html



Cloudera Hadoop on DigitalOcean:
https://www.youtube.com/watch?v=Q2F_2tCFIMw


Spark SQL
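
Spark SQL lets you register a DataFrame as a view and query it with plain SQL. A minimal sketch (the CSV path, table name, and columns are made up for illustration):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("SparkSQLDemo").getOrCreate()

# Hypothetical input: a CSV of employees with a header row.
df = spark.read.csv("hdfs:///user/training/employees.csv",
                    header=True, inferSchema=True)

# Register the DataFrame as a temporary view so plain SQL works against it.
df.createOrReplaceTempView("employees")

result = spark.sql("""
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
    ORDER BY avg_salary DESC
""")
result.show()

spark.stop()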

Spark Streaming to process a live data stream
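A minimal sketch of that idea, using a TCP socket as the live source (host and port are placeholders; feed it with something like nc -lk 9999):

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="StreamingWordCount")
ssc = StreamingContext(sc, 5)  # 5-second micro-batches

# Listen on a local TCP socket; each record is one line of text.
lines = ssc.socketTextStream("localhost", 9999)
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()

ssc.start()
ssc.awaitTermination()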

Python MapReduce program:
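
(Filling this in with a sketch using Hadoop Streaming, which runs any executable as the mapper and reducer; the file names are mine.) mapper.py emits word<TAB>1 pairs, and reducer.py sums the counts, relying on Hadoop to sort map output by key in between:

#!/usr/bin/env python
# mapper.py -- read lines from stdin, emit "word<TAB>1" pairs
import sys

for line in sys.stdin:
    for word in line.split():
        print("%s\t%d" % (word, 1))

#!/usr/bin/env python
# reducer.py -- input arrives sorted by key, so counts can be
# summed over each run of equal words
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word, current_count = word, int(count)
if current_word is not None:
    print("%s\t%d" % (current_word, current_count))

Submit with something like: hadoop jar /path/to/hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /user/training/input -output /user/training/wordcount-out (the streaming jar's exact location depends on the distribution).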



Hive create tables

Impala; partitioning in Hive; re-doing partitioning for local (managed) and external tables; file formats (tab-delimited text). See the sketch below.
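
A sketch of the DDL for these topics, issued here through PySpark with Hive support so the example stays in Python. Table names, columns, and the HDFS location are made up; the same statements work in the hive or beeline shell:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("HiveTables")
         .enableHiveSupport()
         .getOrCreate())

# Managed table, tab-delimited text file format.
spark.sql("""
    CREATE TABLE IF NOT EXISTS employees (
        id INT, name STRING, salary DOUBLE
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
    STORED AS TEXTFILE
""")

# External, partitioned table: dropping it leaves the HDFS files in place.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS logs (
        ts STRING, msg STRING
    )
    PARTITIONED BY (dt STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
    LOCATION '/user/training/logs'
""")

# Register one partition and list what exists.
spark.sql("ALTER TABLE logs ADD IF NOT EXISTS PARTITION (dt='2018-06-12')")
spark.sql("SHOW PARTITIONS logs").show()

spark.stop()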

Sqoop and Hive



The Spark Streaming API can consume from sources such as Kafka, Flume, and Twitter, to name a few. It can then apply transformations to the data to get the desired result, which can be pushed further downstream.

Connecting Kafka with the Spark Streaming API:
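
A minimal sketch using the direct (receiver-less) connector from the spark-streaming-kafka-0-8 integration, which is what the Python API shipped with around this time; the broker address and topic name are placeholders:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="KafkaSparkStreaming")
ssc = StreamingContext(sc, 10)  # 10-second micro-batches

# Direct stream: Spark tracks Kafka offsets itself, no receiver needed.
stream = KafkaUtils.createDirectStream(
    ssc,
    topics=["weblogs"],
    kafkaParams={"metadata.broker.list": "localhost:9092"})

# Each record is a (key, value) pair; count messages per batch.
values = stream.map(lambda kv: kv[1])
values.count().pprint()

ssc.start()
ssc.awaitTermination()

Submit with the matching Kafka package on the classpath, e.g. spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.3.0 your_app.py (the version must match your Spark build).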


http://www.science.smith.edu/dftwiki/index.php/Hadoop_Tutorial_1_--_Running_WordCount

Mapper for the classic NCDC maximum-temperature MapReduce example:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final int MISSING = 9999; // sentinel for "temperature not recorded"

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {

    // Each input value is one fixed-width NCDC weather record.
    String line = value.toString();
    String year = line.substring(15, 19);

    int airTemperature;
    if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
      airTemperature = Integer.parseInt(line.substring(88, 92));
    } else {
      airTemperature = Integer.parseInt(line.substring(87, 92));
    }

    // Only emit readings that are present and pass the quality-code check.
    String quality = line.substring(92, 93);
    if (airTemperature != MISSING && quality.matches("[01459]")) {
      context.write(new Text(year), new IntWritable(airTemperature));
    }
  }
}
