
Feed Forward Neural Networks

with Theano

Note: The code is just for demonstration purposes. Some Python functions are only there to produce plots for the slides (IPython reveal). So don't blame me for the code quality.

In [2]:
from IPython.display import Image
Image(filename='./pics/Deeplearning 2.png') 
Out[2]:

For each layer:

Input: $\vec x^T = (x_1, x_2, \dots, x_n)$

Output of the j-th neuron: $$ h_j = \sigma(\sum_{i=1}^n w_{ij} x_i + b_j) $$

In matrix form ($W$ is a matrix):

$$ \vec h = \sigma(\vec x \cdot W + \vec b) $$
  • element-wise application of the activation function $\sigma$.
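A minimal NumPy sketch of this computation for a single input vector (the sizes, the weight values, and the choice of the logistic function as $\sigma$ are made up for illustration):

import numpy as np

def sigma(a):
    return 1. / (1. + np.exp(-a))   # logistic activation, applied element-wise

x = np.array([0.5, -1.0, 2.0])      # n = 3 inputs
W = 0.1 * np.ones((3, 2))           # weight matrix: 3 inputs x 2 neurons
b = np.zeros(2)                     # one bias per neuron

h = sigma(np.dot(x, W) + b)         # layer output, shape (2,)
print(h)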

Training data - not linearly separable

In [8]:
plot_train_data(X_train[t_train==0], X_train[t_train==1])

Feature Transformation

$$ \phi_1(\vec x) = x_1^2 $$

$$ \phi_2(\vec x) = x_2^2 $$

In the new feature space the data is linearly separable:

In [10]:
phi_train = X_train**2
plot_train_transformed(phi_train[t_train==0], phi_train[t_train==1])

Learning the feature transformation with neural networks

  • The activity vector of the hidden layers can be interpreted as a transformation of the input vector.
In [14]:
def logistic_function(x):
    return 1./(1. + T.exp(-x))
In [15]:
def relu(x):
    return T.switch(x<0, 0, x)
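A quick, optional sanity check that these element-wise activations behave as expected (a sketch; it assumes theano and numpy are installed):

import numpy as np
import theano
import theano.tensor as T

x = T.vector('x')
f_logistic = theano.function([x], logistic_function(x))
f_relu = theano.function([x], relu(x))

a = np.array([-2., 0., 2.], dtype=theano.config.floatX)
print(f_logistic(a))   # approx. [0.12, 0.5, 0.88]
print(f_relu(a))       # [0., 0., 2.]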

Feed Forward Neural Network with a hidden layer

Activity vector of the first hidden layer:

$$ \vec h^{(1)} = \sigma_1 \left(\vec x \cdot W^{(1)} + \vec b^{(1)} \right) $$

Activity of the output $\vec o$ (with only one output neuron, $o$ is a scalar):

$$ \vec o = \vec h^{(2)}= \sigma_2 \left( \vec h^{(1)} \cdot W^{(2)} + \vec b^{(2)} \right) $$
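The cells below assume that the symbolic input X, the target vector, and the parameters W_h, b_h, W_o, b_o (collected in a list params for the gradient step later on) were created in earlier, unshown cells. A minimal sketch of such a setup; the hidden-layer size and the random initialization are assumptions:

import numpy as np
import theano
import theano.tensor as T

rng = np.random.RandomState(1234)
n_in, n_hidden = 2, 4                      # 2 input features; hidden size is an assumption
floatX = theano.config.floatX

X = T.matrix('X')                          # one training example per row
target = T.vector('target')                # 0/1 class labels

W_h = theano.shared(0.1 * rng.randn(n_in, n_hidden).astype(floatX), name='W_h')
b_h = theano.shared(np.zeros(n_hidden, dtype=floatX), name='b_h')
W_o = theano.shared(0.1 * rng.randn(n_hidden, 1).astype(floatX), name='W_o')   # one output column
b_o = theano.shared(np.zeros(1, dtype=floatX), name='b_o')

params = [W_h, b_h, W_o, b_o]              # list of shared variables, used for the gradient updates below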
In [16]:
# (first) hidden layer
a = T.dot(X, W_h) + b_h

# activation function: rectified linear units (ReLU)
h = relu(a)

# output neuron:
y = logistic_function(T.dot(h, W_o) + b_o)
In [17]:
fn_predict = theano.function(inputs = [X], outputs = y)
In [18]:
#fn_predict(X_train)

Cost function and L2 regularization

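Written out, the cost minimized in the next cell is the binary cross-entropy over all training examples plus an L2 penalty on the weights (here the mean of the squared entries, matching the code), where $t_i$ is the target of example $i$ and $y_i$ the corresponding network output:

$$ E = -\sum_i \left( t_i \log y_i + (1 - t_i) \log(1 - y_i) \right) + \lambda \left( \overline{\left(W^{(1)}\right)^2} + \overline{\left(W^{(2)}\right)^2} \right) $$

The bar denotes the mean over all entries of a weight matrix, and $\lambda = 0.02$.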
In [49]:
#TODO sum is only used for casting a vector with one element to a scalar!
cross_entropy = T.sum(-(T.dot(target, T.log(y)) + T.dot((1.-target), T.log(1.-y))))

l2_reg = T.mean(T.sqr(W_h)) + T.mean(T.sqr(W_o))

lambda_ = 0.02
cost = cross_entropy + lambda_ * l2_reg
In [50]:
#theano.printing.pprint(cost) 
In [51]:
#theano.printing.debugprint(cost)
In [52]:
cost_func = theano.function(inputs=[X, target], outputs=[cost])
In [53]:
#theano.printing.debugprint(cost_func)
In [54]:
cost_func(X_train, t_train)
Out[54]:
[array(44.90793435302534)]
In [55]:
def get_train_functions(cost, v, target, learning_rate=0.01):
    # params is the list of shared variables (e.g. [W_h, b_h, W_o, b_o])
    # defined in an earlier cell.
    # Symbolic gradients of the cost w.r.t. all parameters:
    gparams = []
    for param in params:
        gparam = T.grad(cost, param)
        gparams.append(gparam)

    # plain gradient descent: param <- param - learning_rate * gradient
    updates = []
    for param, gparam in zip(params, gparams):
        updates.append((param, param - gparam * learning_rate))

    learn_fn = theano.function(inputs=[v, target],
                               outputs=cost,
                               updates=updates)
    return learn_fn


learn_fn = get_train_functions(cost, X, target)
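Each call of learn_fn performs one plain gradient descent step on all parameters in params:

$$ \theta \leftarrow \theta - \eta \, \frac{\partial E}{\partial \theta} \quad \text{for } \theta \in \{W^{(1)}, \vec b^{(1)}, W^{(2)}, \vec b^{(2)}\}, \qquad \eta = 0.01 $$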
In [56]:
nb_epochs=5000
train_errors = np.ndarray(nb_epochs)


for epoch in range(nb_epochs):
    train_errors[epoch] = learn_fn(X_train, t_train)
    
train_errors
Out[56]:
array([ 44.90793435,  44.81909323,  44.74636265, ...,   2.94757535,
         2.9474241 ,   2.94727294])
In [57]:
plt.plot(range(nb_epochs), train_errors, '-b')
plt.xlabel('Iterations')
plt.ylabel('Cost')
Out[57]:
<matplotlib.text.Text at 0x10bcf7c90>
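After training, fn_predict can be used to turn the network outputs into class predictions, e.g. to compute the training accuracy. A small sketch; the 0.5 threshold is an assumption:

y_pred = fn_predict(X_train).ravel() > 0.5    # threshold the logistic output
train_accuracy = np.mean(y_pred == t_train)   # fraction of correctly classified points
print(train_accuracy)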

Decision Boundary

In [62]:
plot_contour(X_train[t_train==0], X_train[t_train==1], 'train data')
In [63]:
plot_contour(X_test[t_test==0], X_test[t_test==1], 'test data')

Curse of dimensionality

In [64]:
pics_path="../../pythonFuerDieDatenanalyse/tensorflow/notebook/pics/"
In [65]:
from IPython.display import Image
Image(filename=pics_path+'Deeplearning 7.png') 
Out[65]:
  • "Real world data" für "KI Tasks" ist hochdimensional.
  • Volumen wächst exponentiell $r^n$ mit Anzahl der Dimensionen $n$
  • Warum sind hier nicht (exponentiell mit n) viele Trainingsdaten notwendig?
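A small illustration of this growth: the number of cells of a grid that splits each dimension into 10 bins (the resolution of 10 is an arbitrary choice).

# grid cells needed to cover an n-dimensional space with 10 bins per dimension
for n in (1, 2, 3, 10, 100):
    print(n, 10**n)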

Manifold Hypothesis

  • The training data lies only in a few regions of the space, on non-linear "manifolds".
  • Reason: strong statistical dependence between the variables.
  • The manifolds are strongly curved within the embedding space.
    • e.g. translating the same image by one pixel moves it into a completely different region of the embedding data space.
In [66]:
from IPython.display import Image
Image(filename=pics_path+'./manifold-bengio.png')