# Diving into Keras
One of the things that I find really helps me to understand an API or technology
is diving into its documentation. Keras is no different! Its documentation is
pretty well written, and I think we can all benefit from getting more acquainted
with it. That's what inspired this blog (and more to come) where we step through
the various, documented layers and other fun things that Keras has to offer us
and see if we can't learn a new thing or two about this awesome API!
Today, we will dive into the most basic layer, the Dense layer (something I have
no doubt that you all are at least a bit familiar with!). This post is also
available in a video form, which you can check out [here](https://www.youtube.com/watch?v=ohgONsuoxVs)!
# The Dense Layer
So, if you don't know where the documentation is for the Dense layer on Keras'
site, you can check it out here as a part of its [core layers](https://keras.io/layers/core/)
section. Here, we will find it as the first layer. Indeed, it is that important.
Right away, we can look at the default parameters of the layer, all of which we
will explore today. Here it is, if you don't want to click the link:
~~~python
keras.layers.Dense(units, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)
~~~
But before we get into the parameters, let's just take a brief look at the basic
description Keras gives us of this layer and unpack that a bit.
`Just your regular densely-connected NN layer.`
That seems simple enough! Furthermore, it tells us that a dense layer is the
implementation of the equation `output = activation(dot(input, kernel) + bias)`.
This means that we are taking the dot product between our input tensor and
whatever the weight kernel matrix is featured in our dense layer. Then, we add
a bias vector (if we want to have a bias) and take an element-wise activation of
the output values (some sort of function, linear or, more often, non-linear!).
Another interesting note is that if you give a Dense layer an input with a rank
greater than 2, it will be flattened before taking the dot product. This is good
to know, and something I wasn't directly aware of before reading the documentation.
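To make that equation concrete, here's a rough sketch in plain Python (not Keras itself, and with made-up example numbers) of what a dense layer computes for a single input vector:

~~~python
import math

def dense_forward(inputs, kernel, bias, activation):
    """Sketch of output = activation(dot(input, kernel) + bias).

    inputs: one example's input vector, length input_dim
    kernel: weight matrix as a list of rows, shape (input_dim, units)
    bias:   vector of length units
    """
    units = len(kernel[0])
    outputs = []
    for j in range(units):
        # Dot product between the input vector and column j of the kernel
        z = sum(inputs[i] * kernel[i][j] for i in range(len(inputs)))
        # Add the bias, then apply the element-wise activation
        outputs.append(activation(z + bias[j]))
    return outputs

# A 2-input, 3-unit layer with a tanh activation
x = [1.0, 2.0]
W = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6]]
b = [0.0, 0.0, 0.0]
y = dense_forward(x, W, b, math.tanh)
~~~

Keras does all of this (batched, and on tensors) for us, of course; this is just the math spelled out.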
With all that introduction taken care of, let's start diving into the parameters.
## Units
Units is the most basic parameter to understand. It's a positive integer that
denotes the output size of the layer, and it's the most important parameter we
can set for this layer. The units parameter actually dictates the size of the
weight matrix and bias vector: the bias vector will have `units` entries, while
the weight matrix's shape depends on both `units` and the size of the input
data, so that the dot product produces an output of size `units`.
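As a quick illustration (plain Python, not the actual Keras internals), here's how the kernel and bias shapes follow from `units` and the input size:

~~~python
def dense_shapes(input_dim, units):
    """Shapes a Dense layer builds for a given input size and unit count.

    The kernel is (input_dim, units) so that dot(input, kernel) has
    length units; the bias is a vector of length units.
    """
    kernel_shape = (input_dim, units)
    bias_shape = (units,)
    # Total trainable parameters: one weight per input-output pair, plus biases
    param_count = input_dim * units + units
    return kernel_shape, bias_shape, param_count

# A layer with 64 units receiving 128-dimensional input
kernel_shape, bias_shape, n_params = dense_shapes(128, 64)
~~~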
## Activation
This parameter sets the element-wise activation function to be used in the
dense layer. By default, we can see that it is set to None, which means the
layer uses a linear activation. This may work for your use-case! However,
linearity is limited, and thus Keras does give us a bunch of built-in
[activation functions](https://keras.io/activations/). This is where we might
choose an activation function to use for our layer.
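To see what "element-wise" means in practice, here's a tiny sketch in plain Python comparing the default linear activation to ReLU (one of those built-ins), with made-up example values:

~~~python
def linear(x):
    # The default (activation=None): values pass through unchanged
    return x

def relu(x):
    # A common non-linear choice, applied to each value independently
    return max(0.0, x)

pre_activations = [-2.0, -0.5, 0.0, 1.5]
linear_out = [linear(v) for v in pre_activations]
relu_out = [relu(v) for v in pre_activations]
~~~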
## Use Bias
This parameter is very simple! It's just whether or not we wish to use a bias
vector in our calculation for our layer. There may be cases in which we do not.
By default, this is set to `True`: Keras assumes that we will want to use a bias
vector and learn its values.
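Here's a tiny sketch (plain Python, with made-up numbers) of what toggling the bias changes for a single unit:

~~~python
def dense_unit(inputs, weights, bias, use_bias=True):
    """One unit of a dense layer, with the bias optionally disabled."""
    z = sum(i * w for i, w in zip(inputs, weights))
    if use_bias:
        z += bias
    return z

x = [1.0, 2.0]
w = [0.5, 0.25]
with_bias = dense_unit(x, w, bias=1.0)                      # dot product + bias
without_bias = dense_unit(x, w, bias=1.0, use_bias=False)   # dot product only
~~~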
## Initializers
The initializer parameters tell Keras how to initialize the values of our layer.
For the Dense layer, we need to initialize our weight matrix and our bias vector
(if we are using it). Like with activations, there are a bunch of different [initializers](https://keras.io/initializers/)
to explore!
Specifically, by default Keras uses the Zero initializer for the bias and the
Glorot Uniform initializer for the kernel weight matrix. As you might assume,
the Zero initializer simply will set our bias vector to all zeros.
The Glorot Uniform initializer is the interesting one here. It pulls values from
a uniform distribution, but its limits scale with the size of the Dense layer!
It uses the following equation to calculate those limits:
~~~python
limit = sqrt(6 / (fan_in + fan_out))
# Values are drawn uniformly from [-limit, limit]
~~~
Fan in is simply the number of units in the input tensor and fan out is the
number of units in the output tensor. Why this range? Well, it's written about in [this paper](http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf).
We won't get into this paper in this post, but I encourage you to read it if
you're interested!
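Just to make the formula concrete, here's a small sketch in plain Python (not the Keras implementation) that computes the limit for an example layer and draws one sample from that range:

~~~python
import math
import random

def glorot_uniform_limit(fan_in, fan_out):
    # limit = sqrt(6 / (fan_in + fan_out)), per the Keras documentation
    return math.sqrt(6.0 / (fan_in + fan_out))

def glorot_uniform_sample(fan_in, fan_out):
    # Draw one weight value uniformly from [-limit, limit]
    limit = glorot_uniform_limit(fan_in, fan_out)
    return random.uniform(-limit, limit)

# A kernel with 100 input units and 50 output units: limit = sqrt(6 / 150) = 0.2
limit = glorot_uniform_limit(100, 50)
sample = glorot_uniform_sample(100, 50)
~~~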
## Regularizers
The next three parameters are regularization (or penalty) parameters. By default,
these aren't used, but they can be useful in helping with the generalization of
your model in some situations, so it's important to know that they exist!
You can check it out in more detail [here](https://keras.io/regularizers/) if
you want to get your hands on some regularizers (L1, L2, and L1_L2 are ready
to use out of the box, and there's also information on how to write your own
regularizer). The point is, we can apply a regularizer to three components
of our layer: the weight matrix, the bias vector, or the layer's output (the
activity, computed after the activation). These techniques
will have various effects such as keeping things sparse or keeping weights close
to zero. It's another hyperparameter to explore, and perhaps one that helps your
model get that last percentage of generalization before you deploy it for public
use!
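As a rough sketch of what these penalties actually compute (plain Python, with a made-up penalty factor of 0.01), here's L1 and L2 applied to a small weight list — the penalty gets added to the model's loss during training:

~~~python
def l1_penalty(weights, l1=0.01):
    # L1 sums absolute values: it tends to push weights toward exact zeros (sparsity)
    return l1 * sum(abs(w) for w in weights)

def l2_penalty(weights, l2=0.01):
    # L2 sums squares: it keeps weights small, but rarely exactly zero
    return l2 * sum(w * w for w in weights)

weights = [0.5, -1.0, 0.0, 2.0]
loss_l1 = l1_penalty(weights)
loss_l2 = l2_penalty(weights)
~~~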
## Constraints
Finally, the last parameters we will discuss are the two constraint parameters.
Simply put, these can constrain the values that our weight matrix or our bias
vector can take on. By default, these aren't used, but you can view the options
available on the [constraints](https://keras.io/constraints/) page. One of them
is fairly easy to understand: the NonNeg constraint, which forces the values of
the weight/bias to be greater than or equal to 0.
These can be useful if you're trying to do any sort of weight clipping. For
example, the W-GAN uses weight clipping. Perhaps, if you were to re-write this
model yourself in Keras, you'd wish to use a Constraint to enforce this idea!
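Here's a small sketch (plain Python, not the Keras source) of the idea behind NonNeg and that WGAN-style weight clipping, with made-up weight values:

~~~python
def non_neg(weights):
    # Like Keras' NonNeg constraint: clamp negative values up to zero
    return [max(0.0, w) for w in weights]

def clip_weights(weights, c=0.01):
    # WGAN-style clipping: keep every weight within [-c, c]
    return [max(-c, min(c, w)) for w in weights]

w = [-0.5, 0.005, 0.2]
constrained = non_neg(w)
clipped = clip_weights(w)
~~~

In Keras, a constraint like this gets applied to the weights after each training update, rather than being something you call by hand.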
# Wrapping-Up
So there you have it, the Dense layer! I hope you found this post helpful and
learned something about the Dense layer that you didn't know before. Feel
free to let me know if you'd like to see more of these articles (or [videos](https://www.youtube.com/watch?v=ohgONsuoxVs))
and I'd love to have a conversation with you about it here or on [Twitter](https://twitter.com/hunter_heiden).
If you want to read more of what I've written, why not check out some of my other
posts like:
- [NEAT: An Awesome Approach to NeuroEvolution](/blog/neat-an-awesome-approach-to-neuroevolution/)
- [Stemming? Lemmatization? What?](/blog/stemming-lemmatization-what/)
- [CoQA: A Conversational Question Answering Challenge](/blog/coqa-conversation-question-answering/)
- [Introduction to Word Embeddings](/blog/intro-to-word-embeddings/)