Thursday, December 14, 2017

Big Data in Data Science

Tools and plays



Kafka, Elastic MapReduce (EMR), Avro, Parquet, Storm, HBase


Node.js or Java
- Either Kafka, Storm, Neo4j, or HBase
- Mongoose
- Solr/Lucene

Cassandra, Spark



Deep working experience applying machine learning and statistics to real-world problems
Solid understanding of a wide range of data mining / machine learning software packages (e.g., Spark ML, scikit-learn, H2O, Weka, Keras); a minimal scikit-learn sketch follows this list
Experience with version control systems (git) and comfortable using command-line tools
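
As a quick illustration of the kind of package named above, here is a minimal scikit-learn sketch; the toy dataset and the model choice are placeholders for illustration only:

# Minimal scikit-learn sketch: train and evaluate a classifier on a toy dataset.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))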


Preferred:
Knowledge of semantic web technology (e.g., RDF, OWL, SPARQL)
Knowledge of search technologies (e.g., Solr, ElasticSearch)
A link to a portfolio and/or code samples demonstrating your work experience (GitHub, Kaggle, KDD contributions earn major props)



Data Analyst – BI – Training:

Coding data extraction, transformation and loading (ETL) routines.
Using APIs and databases to pull data together, as sketched below.
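
To make the ETL and API points concrete, here is a minimal Python sketch that pulls records from an HTTP API and loads them into a relational database; the endpoint URL, table, and column names are hypothetical, and psycopg2/Postgres is just one of the relational options listed further below:

# Minimal ETL sketch: extract from an HTTP API, transform, load into PostgreSQL.
# The API URL, table, and column names are hypothetical placeholders.
import requests
import psycopg2

API_URL = "https://api.example.com/v1/orders"

def extract():
    resp = requests.get(API_URL, timeout=30)
    resp.raise_for_status()
    return resp.json()  # assumes the API returns a JSON list of records

def transform(records):
    # Keep only the fields needed downstream and normalize types.
    return [(r["id"], r["customer"], float(r["amount"])) for r in records]

def load(rows):
    conn = psycopg2.connect("dbname=analytics user=etl")
    with conn, conn.cursor() as cur:  # commits the transaction on success
        cur.executemany(
            "INSERT INTO orders (id, customer, amount) VALUES (%s, %s, %s)",
            rows,
        )
    conn.close()

if __name__ == "__main__":
    load(transform(extract()))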

Hadoop, SQL, and NoSQL technologies are required, as well as basic scripting experience in a dynamic language such as Python or R.
Tools like Jethro, Kyvos, Dremio, AtScale, etc.
BI tools like Tableau, Domo, QlikView, etc.
Data visualization
Relational Databases (e.g., Postgres, SQL Server, Oracle, MySQL)
Distributed Databases (e.g., Hive, Redshift, Greenplum)
NoSQL Data Frameworks (e.g., Spark, MongoDB, Cassandra, HBase)
Data Analysis and Transformation (e.g., R, MATLAB, Python, etc.)

Big Data providers: Cloudera CDH, Hortonworks HDP, and Amazon EC2/EMR for deploying and developing large-scale solutions.
Hadoop/Spark big data environment clusters using Foreman, Puppet, and Vagrant; deploying big data platforms (including Hadoop and Spark) to multiple clusters using Cloudera CDH, on both CDH4 and CDH5.
Hadoop MapReduce, YARN, HBase, and Spark performance for large-scale data analysis.
Spark performance based on Cloudera and Hortonworks HDP cluster setups in the production server.
Machine learning data models on terabytes of data using the Spark ML and MLlib libraries.
ETL systems using Python, Hive, and the Apache Spark SQL framework, storing all result files in Apache Parquet and mapping them to Hive for enterprise data warehousing (see the PySpark ETL sketch after this list).
Real-time data pipelines using Kafka and Python consumers to ingest data through the Adobe real-time Firehose API into Elasticsearch, with real-time dashboards built using Kibana (see the Kafka-to-Elasticsearch sketch after this list).
Airbnb's Airflow tool, used to run the machine learning scripts as a DAG (see the Airflow sketch after this list).
Test cases using the Python nose framework.
Scikit-learn Python scripts ported to Spark ML/MLlib scripts, resulting in a scalable pipeline framework (see the Spark ML sketch after this list).
PySpark.
Data pipelines using Spark and Scala on the AWS EMR framework and S3.
Real-time data pipelines using Spark Streaming and Apache Kafka in Python (see the Spark Structured Streaming sketch after this list).
Real-time data pipelines using the Apache Storm Java API for processing live streams of data and ingesting into HBase.
Data pipelines on the Cloudera/Hortonworks Hadoop platform using Apache Pig, with workflows automated using Apache Oozie.
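
A minimal PySpark sketch of the Spark SQL / Parquet / Hive ETL pattern described above, assuming a Hive-enabled Spark build; the S3 paths, database, table, and column names are hypothetical:

# PySpark ETL sketch: read raw CSV, aggregate with Spark SQL, write Parquet,
# and register the result as a Hive table. Paths and names are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("etl-parquet-hive")
    .enableHiveSupport()  # requires Spark built/configured with Hive support
    .getOrCreate()
)

raw = spark.read.csv("s3://my-bucket/raw/orders/", header=True, inferSchema=True)
raw.createOrReplaceTempView("raw_orders")

daily = spark.sql("""
    SELECT order_date, SUM(amount) AS total_amount
    FROM raw_orders
    GROUP BY order_date
""")

# Store the result as Parquet and expose it to Hive for the warehouse layer.
daily.write.mode("overwrite").parquet("s3://my-bucket/curated/daily_orders/")
daily.write.mode("overwrite").saveAsTable("dw.daily_orders")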
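
A sketch of the Kafka-plus-Python-consumer ingestion into Elasticsearch, using the kafka-python and elasticsearch client libraries; the topic, index, and connection details are placeholders, the Adobe Firehose source itself is omitted, and older Elasticsearch clients take body= instead of document= in index():

# Kafka -> Elasticsearch ingestion sketch (kafka-python + elasticsearch client).
# Topic, index, and server addresses are hypothetical.
import json

from elasticsearch import Elasticsearch
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "clickstream",  # placeholder topic name
    bootstrap_servers=["localhost:9092"],
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
es = Elasticsearch("http://localhost:9200")

for message in consumer:
    # Index each event so it is immediately queryable from Kibana dashboards.
    es.index(index="clickstream-events", document=message.value)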
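
A minimal Airflow DAG sketch for running machine learning scripts in a DAG manner; the script paths and schedule are assumptions, and the BashOperator import path shown is the Airflow 2.x one:

# Minimal Airflow DAG sketch: feature build followed by model training.
# Script paths and the schedule are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator  # Airflow 2.x import path

with DAG(
    dag_id="ml_training_pipeline",
    start_date=datetime(2017, 12, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    build_features = BashOperator(
        task_id="build_features",
        bash_command="python /opt/ml/build_features.py",
    )
    train_model = BashOperator(
        task_id="train_model",
        bash_command="python /opt/ml/train_model.py",
    )
    build_features >> train_model  # training runs only after features are built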
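
To illustrate porting scikit-learn scripts to Spark ML, here is a sketch of a logistic regression expressed with Spark ML's DataFrame-based Pipeline API; the input table and feature column names are hypothetical:

# Spark ML sketch: DataFrame-based equivalent of a scikit-learn LogisticRegression fit.
# Table and column names are hypothetical.
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sklearn-to-sparkml").getOrCreate()
df = spark.table("dw.training_data")  # expected columns: f1, f2, f3, label

assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")

model = Pipeline(stages=[assembler, lr]).fit(df)
model.transform(df).select("label", "prediction").show(5)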
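
A sketch of a real-time pipeline reading from Kafka with Spark Structured Streaming in Python (the older DStream-based Spark Streaming API is another option); the topic, servers, and S3 paths are placeholders, and the spark-sql-kafka connector package must be available to the job:

# Spark Structured Streaming sketch: consume a Kafka topic and persist the
# parsed events to Parquet. Topic, servers, and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-streaming").getOrCreate()

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "clickstream")
    .load()
    .select(col("value").cast("string").alias("event_json"))
)

query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3://my-bucket/streaming/events/")
    .option("checkpointLocation", "s3://my-bucket/streaming/checkpoints/")
    .start()
)
query.awaitTermination()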

Technology: Hadoop Ecosystem / Spring Boot / Microservices / AWS / J2SE / J2EE / Oracle
DBMS/Databases: DB2, MySQL, SQL, PL/SQL
Big Data Ecosystem: HDFS, MapReduce, Oozie, Hive/Impala, Pig, Sqoop, ZooKeeper, HBase, Spark, Scala
NoSQL Databases: MongoDB, HBase
Version Control Tools: SVN, CVS, VSS, PVCS
