Non-Linearity in Machine Learning (and how to make it quantum)


This week, I worked on implementing non-linearity in quantum neural networks. Simply put, non-linearity is one of the most important ingredients in machine learning. Classical models implement it using special operations called activation functions, which transform the activations (roughly, the intermediate values a model computes on its way to a prediction) in ways that no linear function can reproduce, thus making the model non-linear. But why do we want non-linearity in the first place?
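As a quick illustration (a sketch, not from the paper), here are two standard activation functions and a check that they really are non-linear, i.e. that applying them does not commute with addition:

```python
import numpy as np

def relu(x):
    # Zeroes out negative activations; piecewise-linear, but not linear.
    return np.maximum(0.0, x)

def sigmoid(x):
    # Squashes activations into the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

a = np.array([-2.0, 1.0])
b = np.array([1.5, -3.0])

# For a linear map f we would have f(a + b) == f(a) + f(b);
# for ReLU this fails, which is exactly the point.
print(relu(a + b))           # [0.  0. ]
print(relu(a) + relu(b))     # [1.5 1. ]
```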

 

Activation functions aside, almost every ML layer is built around an adjustable weight matrix that is applied to your data. Multiplying the incoming activations by a matrix is a matrix transformation and, by definition, a linear transformation. This means that every such layer is linear, which has an important consequence: a product of matrices is itself just another matrix, so any stack of purely linear layers, no matter how many, collapses into a single equivalent layer (so much for “deep” learning). It also means that the only thing these layers can do is form linear combinations of the activations (assign a weight to each activation and sum them up).
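The collapse of stacked linear layers is easy to verify numerically. A small sketch with toy layer sizes chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Three "layers", each just a weight matrix (no activation in between).
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal((4, 4))
W3 = rng.standard_normal((2, 4))

x = rng.standard_normal(3)

# Applying the layers one after another...
deep = W3 @ (W2 @ (W1 @ x))

# ...is identical to applying the single collapsed matrix W3 W2 W1.
W_single = W3 @ W2 @ W1
shallow = W_single @ x

print(np.allclose(deep, shallow))  # True
```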

This could work in very simple cases, but what if one of the inputs were quadratically related to the target? A purely linear model couldn’t represent that relationship at all, and that is why non-linearity is important.
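To make the quadratic case concrete, here is a hypothetical toy dataset with y = x² fit by least squares. A straight line leaves a large residual, while adding the non-linear feature x² makes the fit exact:

```python
import numpy as np

# Toy data with a purely quadratic relationship (an illustrative assumption).
x = np.linspace(-1, 1, 21)
y = x**2

# Best linear fit on features [x, 1]: a line cannot capture the curvature.
A_lin = np.stack([x, np.ones_like(x)], axis=1)
coef_lin, res_lin, *_ = np.linalg.lstsq(A_lin, y, rcond=None)

# Add the non-linear feature x**2 and the least-squares fit becomes exact.
A_quad = np.stack([x**2, x, np.ones_like(x)], axis=1)
coef_quad, res_quad, *_ = np.linalg.lstsq(A_quad, y, rcond=None)

print(res_lin)                             # large sum-of-squares residual
print(np.allclose(A_quad @ coef_quad, y))  # True
```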

 

So, how do we go about implementing non-linearity in quantum neural networks? This is a question I don’t fully understand the answer to yet, so I will just summarize what I learned from a paper I read.

 

All quantum gates can be represented as unitary matrices, which are also, by definition, linear (and any quantum neural network is composed entirely of quantum gates). The paper I read made the system non-linear through post-selection: it only keeps measurement outcomes where a designated qubit (called the ancilla qubit) is 0. For example, of the four possible states of a two-qubit system, |00>, |01>, |10>, and |11>, the paper would only consider the |00> and |01> states (the ancilla being the first qubit). I don’t get why they are able to mess with the state probabilities like this (since those probabilities also encode information), so understanding that will be my goal for this week. Thanks for reading!
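A minimal sketch of why post-selection breaks linearity, assuming a random two-qubit unitary as a stand-in for the network’s gates (the specific U is my assumption, not the paper’s circuit). The unitary itself is linear; the renormalization after discarding the ancilla-1 amplitudes is the non-linear step:

```python
import numpy as np

rng = np.random.default_rng(1)

# A random 2-qubit unitary via QR decomposition (illustrative stand-in).
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, _ = np.linalg.qr(M)

def postselect_ancilla_zero(psi):
    """Apply U, keep only the amplitudes where the first (ancilla) qubit
    is 0 (basis states |00> and |01>), then renormalize."""
    out = U @ psi
    kept = out[:2]                      # amplitudes of |00> and |01>
    return kept / np.linalg.norm(kept)  # renormalization is non-linear

psi1 = np.array([1, 0, 0, 0], dtype=complex)   # |00>
psi2 = np.array([0, 0, 1, 0], dtype=complex)   # |10>
superpos = (psi1 + psi2) / np.sqrt(2)

lhs = postselect_ancilla_zero(superpos)
rhs = (postselect_ancilla_zero(psi1) + postselect_ancilla_zero(psi2)) / np.sqrt(2)

# A linear map would give the same answer both ways; here they differ
# (generically, for a random U).
print(np.allclose(lhs, rhs))
```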
