Bring data in from HDFS (hadoop fs) and then process it with Spark (see the sketch after the links below)
https://spark.apache.org/examples.html
https://www.cloudera.com/more/training/courses/developer-training-for-spark-and-hadoop.html
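A minimal PySpark sketch of that HDFS-to-Spark flow, assuming a file was already uploaded with hadoop fs -put; the HDFS path and app name here are made-up placeholders:

from pyspark import SparkContext

sc = SparkContext(appName="HdfsToSpark")

# Read a text file that was placed in HDFS beforehand,
# e.g. with: hadoop fs -put access.log /user/training/access.log
lines = sc.textFile("hdfs:///user/training/access.log")

# A trivial Spark action on the HDFS data: count the lines
print(lines.count())

sc.stop()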
Cloudera Hadoop on DigitalOcean:
https://www.youtube.com/watch?v=Q2F_2tCFIMw
Spark SQL
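A minimal Spark SQL sketch (PySpark, SparkSession API); the readings view and its columns are invented for illustration:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("SparkSQLDemo").getOrCreate()

# Build a small DataFrame in memory (values are made up)
df = spark.createDataFrame(
    [("1950", 22), ("1950", 111), ("1949", 78)],
    ["year", "temperature"],
)

# Register it as a temporary view and query it with plain SQL
df.createOrReplaceTempView("readings")
spark.sql("SELECT year, MAX(temperature) AS max_temp "
          "FROM readings GROUP BY year").show()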
Spark Streaming to process a live data stream
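A sketch of a live stream processed with the classic DStream API, assuming a test source on a local TCP socket (start one with: nc -lk 9999):

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="LiveStreamWordCount")
ssc = StreamingContext(sc, 5)  # 5-second micro-batches

# Consume a live text stream from a TCP socket
lines = ssc.socketTextStream("localhost", 9999)

counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()  # print each batch's counts to the driver log

ssc.start()
ssc.awaitTermination()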
Python MapReduce program:
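A hedged sketch of a Hadoop Streaming word count in Python; file names, the streaming jar path, and HDFS paths are placeholders:

# mapper.py -- reads raw text on stdin, emits one "word<TAB>1" line per word
import sys

for line in sys.stdin:
    for word in line.split():
        print("%s\t%d" % (word, 1))

# reducer.py -- Hadoop sorts by key, so identical words arrive in a run
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word, current_count = word, int(count)
if current_word is not None:
    print("%s\t%d" % (current_word, current_count))

Run it through the streaming jar, something like:

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
  -files mapper.py,reducer.py \
  -mapper "python mapper.py" -reducer "python reducer.py" \
  -input /user/training/input -output /user/training/wordcount_out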
Creating tables in Hive
Impala, partitioning in Hive, re-doing partitioning for managed (internal) and external tables, file formats (e.g. tab-delimited text)
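A sketch of that Hive DDL issued through Spark's Hive support (the same statements work in the hive/beeline shell); table names, columns, and the HDFS location are hypothetical:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("HiveDDL")
         .enableHiveSupport()
         .getOrCreate())

# Managed (internal) table: Hive owns the data, DROP TABLE deletes it
spark.sql("""
    CREATE TABLE IF NOT EXISTS readings (station STRING, temp INT)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
    STORED AS TEXTFILE
""")

# External, partitioned, tab-delimited table over data already in HDFS;
# DROP TABLE removes only the metadata, the files stay
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS readings_ext (station STRING, temp INT)
    PARTITIONED BY (year STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
    LOCATION '/user/training/readings'
""")

# Register one partition (directory /user/training/readings/year=1950)
spark.sql("ALTER TABLE readings_ext ADD PARTITION (year='1950')")

Impala shares the Hive metastore, so after DDL like this an INVALIDATE METADATA in impala-shell makes the new table and partition visible there too.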
Sqoop and Hive
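A hedged sketch of a Sqoop import straight into a Hive table; the JDBC URL, credentials, and table names are placeholders:

sqoop import \
  --connect jdbc:mysql://dbhost/retail \
  --username training --password training \
  --table customers \
  --hive-import --hive-table customers \
  -m 1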
The Spark Streaming API can consume from sources such as Kafka, Flume, and Twitter, to name a few. It can then apply transformations to the data to produce the desired result, which can be pushed further downstream.
Connecting Kafka with the Spark Streaming API
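A sketch using the legacy direct (receiver-less) Kafka DStream from the spark-streaming-kafka-0-8 integration (removed in Spark 3, where Structured Streaming's kafka source replaces it); the broker address and topic name are placeholders:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="KafkaWordCount")
ssc = StreamingContext(sc, 10)

# Direct Kafka stream; each element is a (key, value) pair
stream = KafkaUtils.createDirectStream(
    ssc, ["clicks"], {"metadata.broker.list": "localhost:9092"})

counts = (stream.map(lambda kv: kv[1])            # keep the message value
                .flatMap(lambda msg: msg.split())
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))
counts.pprint()

ssc.start()
ssc.awaitTermination()

Submitting needs the Kafka integration package on the classpath, e.g. spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.4.0 (coordinates depend on your Spark build).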
http://www.science.smith.edu/dftwiki/index.php/Hadoop_Tutorial_1_--_Running_WordCount
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Mapper for the classic NCDC max-temperature example: each input value is
// one fixed-width weather record; the output is a (year, temperature) pair.
public class MaxTemperatureMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final int MISSING = 9999;

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = value.toString();
    String year = line.substring(15, 19);
    int airTemperature;
    if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
      airTemperature = Integer.parseInt(line.substring(88, 92));
    } else {
      airTemperature = Integer.parseInt(line.substring(87, 92));
    }
    String quality = line.substring(92, 93);
    // Skip missing readings and records whose quality code is suspect
    if (airTemperature != MISSING && quality.matches("[01459]")) {
      context.write(new Text(year), new IntWritable(airTemperature));
    }
  }
}