Thursday, June 14, 2018

Advanced sQL

https://staff.brighton.ac.uk/is/Published%20Documents/Excel%20Formulae%20and%20Functions%20(PC)%20QRC.pdf

Takeaways
The workshop will include the following topics:
Build SQL Database in the Cloud (using Amazon Web Services)
Import and Export of Data
Create and Manage Users
Advanced Querying Techniques (e.g., case statements, extract, union)

Python advanced course

Meta Classes

Abstract Class

Unit Testing Python

Global setting why python is bad for multi threading

global optimization for spark

Decorators

Inheritance in Python




Object oriented techniques.
- Iterators, Generators and Closures.
- Working with Threads.
- Exception handling.


__new__ vs __init__
dunder

https://docs.python.org/3/reference/datamodel.html

https://www.learnpython.org/en/Closures

decorators and functional programming

james powell python

david beazley generators

Below are all the questions that one should study for the pedantic interviews



Functional Programming




Link:

Notice this, you can create functions on runtime, more or less like lambdas in c++. So basically you are iterating over a list, making n take values 1,2 and 3

for n in [1, 2, 3]:
    def func(x):
        return n*x
so, by each iteration you are building a function named func, with takes a value and multiplies it for n. By appending it to the functions list you will have this functions stored, so you can iterate over the list to call the functions.

[function(2) for function in functions]



For decorators:
 higher order function in Python

Callback:

Pass-by reference vs Pass by value:


Q3. What is the difference between deep and shallow copy?
Ans: Shallow copy is used when a new instance type gets created and it keeps the values that are copied in the new instance. Shallow copy is used to copy the reference pointers just like it copies the values. These references point to the original objects and the changes made in any member of the class will also affect the original copy of it. Shallow copy allows faster execution of the program and it depends on the size of the data that is used.

Deep copy is used to store the values that are already copied. Deep copy doesn’t copy the reference pointers to the objects. It makes the reference to an object and the new object that is pointed by some other object gets stored. The changes made in the original copy won’t affect any other copy that uses the object. Deep copy makes execution of the program slower due to making certain copies for each object that is been called.



 How is Multithreading achieved in Python?
Ans: 

Python has a multi-threading package but if you want to multi-thread to speed your code up, then it’s usually not a good idea to use it.
Python has a construct called the Global Interpreter Lock (GIL). The GIL makes sure that only one of your ‘threads’ can execute at any one time. A thread acquires the GIL, does a little work, then passes the GIL onto the next thread.
This happens very quickly so to the human eye it may seem like your threads are executing in parallel, but they are really just taking turns using the same CPU core.
All this GIL passing adds overhead to execution. This means that if you want to make your code run faster then using the threading package often isn’t a good idea.



 How is memory managed in Python?
Ans: 

Memory management in python is managed by Python private heap space. All Python objects and data structures are located in a private heap. The programmer does not have access to this private heap. The python interpreter takes care of this instead.
The allocation of heap space for Python objects is done by Python’s memory manager. The core API gives access to some tools for the programmer to code.
Python also has an inbuilt garbage collector, which recycles all the unused memory and so that it can be made available to the heap space.


 Explain Inheritance in Python with an example.
Ans: Inheritance allows One class to gain all the members(say attributes and methods) of another class. Inheritance provides code reusability, makes it easier to create and maintain an application. The class from which we are inheriting is called super-class and the class that is inherited is called a derived / child class.

They are different types of inheritance supported by Python:

Single Inheritance – where a derived class acquires the members of a single super class.
Multi-level inheritance – a derived class d1 in inherited from base class base1, and d2 are inherited from base2.
Hierarchical inheritance – from one base class you can inherit any number of child classes
Multiple inheritance – a derived class is inherited from more than one base class.


List out the inheritance styles in Django.
Ans: In Django, there is three possible inheritance styles:

Abstract Base Classes: This style is used when you only wants parent’s class to hold information that you don’t want to type out for each child model.
Multi-table Inheritance: This style is used If you are sub-classing an existing model and need each model to have its own database table.
Proxy models: You can use this model, If you only want to modify the Python level behavior of the model, without changing the model’s fields.


Explain the use of decorators.
Ans: Decorators in Python are used to modify or inject code in functions or classes. Using decorators, you can wrap a class or function method call so that a piece of code can be executed before or after the execution of the original code. Decorators can be used to check for permissions, modify or track the arguments passed to a method, logging the calls to a specific method, etc.




Question 4
Python and multi-threading. Is it a good idea? List some ways to get some Python code to run in a parallel way.

Answer
Python doesn't allow multi-threading in the truest sense of the word. It has a multi-threading package but if you want to multi-thread to speed your code up, then it's usually not a good idea to use it. Python has a construct called the Global Interpreter Lock (GIL). The GIL makes sure that only one of your 'threads' can execute at any one time. A thread acquires the GIL, does a little work, then passes the GIL onto the next thread. This happens very quickly so to the human eye it may seem like your threads are executing in parallel, but they are really just taking turns using the same CPU core. All this GIL passing adds overhead to execution. This means that if you want to make your code run faster then using the threading package often isn't a good idea.

There are reasons to use Python's threading package. If you want to run some things simultaneously, and efficiency is not a concern, then it's totally fine and convenient. Or if you are running code that needs to wait for something (like some IO) then it could make a lot of sense. But the threading library won't let you use extra CPU cores.

Multi-threading can be outsourced to the operating system (by doing multi-processing), some external application that calls your Python code (eg, Spark or Hadoop), or some code that your Python code calls (eg: you could have your Python code call a C function that does the expensive multi-threaded stuff).


Question 10

Consider the following code, what will it output?
class A(object):
    def go(self):
        print("go A go!")
    def stop(self):
        print("stop A stop!")
    def pause(self):
        raise Exception("Not Implemented")

class B(A):
    def go(self):
        super(B, self).go()
        print("go B go!")

class C(A):
    def go(self):
        super(C, self).go()
        print("go C go!")
    def stop(self):
        super(C, self).stop()
        print("stop C stop!")

class D(B,C):
    def go(self):
        super(D, self).go()
        print("go D go!")
    def stop(self):
        super(D, self).stop()
        print("stop D stop!")
    def pause(self):
        print("wait D wait!")

class E(B,C): pass

a = A()
b = B()
c = C()
d = D()
e = E()

# specify output from here onwards

a.go()
b.go()
c.go()
d.go()
e.go()

a.stop()
b.stop()
c.stop()
d.stop()
e.stop()

a.pause()
b.pause()
c.pause()
d.pause()
e.pause()

Answer

The output is specified in the comments in the segment below:
a.go()
# go A go!

b.go()
# go A go!
# go B go!

c.go()
# go A go!
# go C go!
 
d.go()
# go A go!
# go C go!
# go B go!
# go D go!

e.go()
# go A go!
# go C go!
# go B go!

a.stop()
# stop A stop!

b.stop()
# stop A stop!

c.stop()
# stop A stop!
# stop C stop!

d.stop()
# stop A stop!
# stop C stop!
# stop D stop!

e.stop()
# stop A stop!
 
a.pause()
# ... Exception: Not Implemented

b.pause()
# ... Exception: Not Implemented

c.pause()
# ... Exception: Not Implemented

d.pause()
# wait D wait!

e.pause()
# ...Exception: Not Implemented

Why do we care?

Because OO programming is really, really important. Really. Answering this question shows your understanding of inheritance and the use of Python's super function. Most of the time the order of resolution doesn't matter. Sometimes it does, it depends on your application.

Question 11

Consider the following code, what will it output?

class Node(object):
    def __init__(self,sName):
        self._lChildren = []
        self.sName = sName
    def __repr__(self):
        return "<Node '{}'>".format(self.sName)
    def append(self,*args,**kwargs):
        self._lChildren.append(*args,**kwargs)
    def print_all_1(self):
        print(self)
        for oChild in self._lChildren:
            oChild.print_all_1()
    def print_all_2(self):
        def gen(o):
            lAll = [o,]
            while lAll:
                oNext = lAll.pop(0)
                lAll.extend(oNext._lChildren)
                yield oNext
        for oNode in gen(self):
            print(oNode)

oRoot = Node("root")
oChild1 = Node("child1")
oChild2 = Node("child2")
oChild3 = Node("child3")
oChild4 = Node("child4")
oChild5 = Node("child5")
oChild6 = Node("child6")
oChild7 = Node("child7")
oChild8 = Node("child8")
oChild9 = Node("child9")
oChild10 = Node("child10")

oRoot.append(oChild1)
oRoot.append(oChild2)
oRoot.append(oChild3)
oChild1.append(oChild4)
oChild1.append(oChild5)
oChild2.append(oChild6)
oChild4.append(oChild7)
oChild3.append(oChild8)
oChild3.append(oChild9)
oChild6.append(oChild10)

# specify output from here onwards

oRoot.print_all_1()
oRoot.print_all_2()

Answer

oRoot.print_all_1() prints:
<Node 'root'>
<Node 'child1'>
<Node 'child4'>
<Node 'child7'>
<Node 'child5'>
<Node 'child2'>
<Node 'child6'>
<Node 'child10'>
<Node 'child3'>
<Node 'child8'>
<Node 'child9'>

oRoot.print_all_2() prints:
<Node 'root'>
<Node 'child1'>
<Node 'child2'>
<Node 'child3'>
<Node 'child4'>
<Node 'child5'>
<Node 'child6'>
<Node 'child8'>
<Node 'child9'>
<Node 'child7'>
<Node 'child10'>

Why do we care?

Because composition and object construction is what objects are all about. Objects are composed of stuff and they need to be initialised somehow. This also ties up some stuff about recursion and use of generators.
Generators are great. You could have achieved similar functionality to print_all_2by just constructing a big long list and then printing it's contents. One of the nice things about generators is that they don't need to take up much space in memory.
It is also worth pointing out that print_all_1 traverses the tree in a depth-first manner, while print_all_2 is width-first. Make sure you understand those terms. Sometimes one kind of traversal is more appropriate than the other. But that depends very much on your application.

Tuesday, June 12, 2018

Big Data hadoop part 1



Bring something from hadoop fs and then do spark

https://spark.apache.org/examples.html

https://www.cloudera.com/more/training/courses/developer-training-for-spark-and-hadoop.html



Cloudera hadoop on digital ocean:
https://www.youtube.com/watch?v=Q2F_2tCFIMw


Spark SQL

Spark Streaming to process a live data stream

Python Map reduce program:



Hive create tables

Impala, Partitioning in Hive, re-doing partitioning local and external, file formats - tab

Scoop and Hive



Spark Streaming API can consume from sources like Kafka ,Flume, Twitter source to name a few. It can then apply transformations on the data to get the desired result which can be pushed further downstream.

Connecting Kafka with Streaming Spark API


 http://www.science.smith.edu/dftwiki/index.php/Hadoop_Tutorial_1_--_Running_WordCount

import java.io.IOException;
import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Mapper;
public class MaxTemperatureMapper  extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final int MISSING = 9999;    @Override  public void map(LongWritable key, Text value, Context context)      throws IOException, InterruptedException {        String line = value.toString();    String year = line.substring(15, 19);    int airTemperature;    if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs      airTemperature = Integer.parseInt(line.substring(88, 92));    } else {      airTemperature = Integer.parseInt(line.substring(87, 92));    }    String quality = line.substring(92, 93);    if (airTemperature != MISSING && quality.matches("[01459]")) {      context.write(new Text(year), new IntWritable(airTemperature));    }  } }