|
357 | 357 | "\n",
|
358 | 358 | "<center width=\"100%\"><img src=\"../../tutorial11/uniform_flow.png\" width=\"300px\"></center>\n",
|
359 | 359 | "\n",
|
360 |
| - "You can see that the height of $p(y)$ should be lower than $p(x)$ after scaling. This change in volume represents $\\left|\\frac{df(x)}{dx}\\right|$ in our equation above, and ensures that even after scaling, we still have a valid probability distribution. We can go on with making our function $f$ more complex. However, the more complex $f$ becomes, the harder it will be to find the inverse $f^{-1}$ of it, and to calculate the log-determinant of the Jacobian $\\log{} \\left|\\det \\frac{df(\\mathbf{x})}{d\\mathbf{x}}\\right|$. An easier trick to stack multiple invertible functions $f_{1,...,K}$ after each other, as all together, they still represent a single, invertible function. Using multiple, learnable invertible functions, a normalizing flow attempts to transform $p_z(z)$ slowly into a more complex distribution which should finally be $p_x(x)$. We visualize the idea below\n", |
| 360 | + "You can see that the height of $p(y)$ should be lower than $p(x)$ after scaling. This change in volume represents $\\left|\\frac{df(x)}{dx}\\right|$ in our equation above, and ensures that even after scaling, we still have a valid probability distribution. We can go on with making our function $f$ more complex. However, the more complex $f$ becomes, the harder it will be to find the inverse $f^{-1}$ of it, and to calculate the log-determinant of the Jacobian $\\log{} \\left|\\det \\frac{df(\\mathbf{x})}{d\\mathbf{x}}\\right|$ (often abbreviated as *LDJ*). An easier trick to stack multiple invertible functions $f_{1,...,K}$ after each other, as all together, they still represent a single, invertible function. Using multiple, learnable invertible functions, a normalizing flow attempts to transform $p_z(z)$ slowly into a more complex distribution which should finally be $p_x(x)$. We visualize the idea below\n", |
361 | 361 | "(figure credit - [Lilian Weng](https://lilianweng.github.io/lil-log/2018/10/13/flow-based-deep-generative-models.html)):\n",
|
362 | 362 | "\n",
|
363 | 363 | "<center width=\"100%\"><img src=\"../../tutorial11/normalizing_flow_layout.png\" width=\"700px\"></center>\n",
|
|
414 | 414 | " return bpd, rng\n",
|
415 | 415 | "\n",
|
416 | 416 | " def encode(self, imgs, rng):\n",
|
417 |
| - " # Given a batch of images, return the latent representation z and ldj of the transformations\n", |
| 417 | + " # Given a batch of images, return the latent representation z and \n", |
| 418 | + " # log-determinant jacobian (ldj) of the transformations\n", |
418 | 419 | " z, ldj = imgs, jnp.zeros(imgs.shape[0])\n",
|
419 | 420 | " for flow in self.flows:\n",
|
420 | 421 | " z, ldj, rng = flow(z, ldj, rng, reverse=False)\n",
|
|
446 | 447 | " z = z_init\n",
|
447 | 448 | " \n",
|
448 | 449 | " # Transform z to x by inverting the flows\n",
|
| 450 | + " # The log-determinant jacobian (ldj) is usually not of interest during sampling\n", |
449 | 451 | " ldj = jnp.zeros(img_shape[0])\n",
|
450 | 452 | " for flow in reversed(self.flows):\n",
|
451 | 453 | " z, ldj, rng = flow(z, ldj, rng, reverse=True)\n",
|
|
6712 | 6714 | "\n",
|
6713 | 6715 | "$$z'_{j+1:d} = \\mu_{\\theta}(z_{1:j}) + \\sigma_{\\theta}(z_{1:j}) \\odot z_{j+1:d}$$\n",
|
6714 | 6716 | "\n",
|
6715 |
| - "The functions $\\mu$ and $\\sigma$ are implemented as a shared neural network, and the sum and multiplication are performed element-wise. The LDJ is thereby the sum of the logs of the scaling factors: $\\sum_i \\left[\\log \\sigma_{\\theta}(z_{1:j})\\right]_i$. Inverting the layer can as simply be done as subtracting the bias and dividing by the scale: \n", |
| 6717 | + "The functions $\\mu$ and $\\sigma$ are implemented as a shared neural network, and the sum and multiplication are performed element-wise. The log-determinant Jacobian (LDJ) is thereby the sum of the logs of the scaling factors: $\\sum_i \\left[\\log \\sigma_{\\theta}(z_{1:j})\\right]_i$. Inverting the layer can as simply be done as subtracting the bias and dividing by the scale: \n", |
6716 | 6718 | "\n",
|
6717 | 6719 | "$$z_{j+1:d} = \\left(z'_{j+1:d} - \\mu_{\\theta}(z_{1:j})\\right) / \\sigma_{\\theta}(z_{1:j})$$\n",
|
6718 | 6720 | "\n",
|
|
8786 | 8788 | " return self.nn(x)"
|
8787 | 8789 | ]
|
8788 | 8790 | },
|
8789 |
| - { |
8790 |
| - "cell_type": "code", |
8791 |
| - "execution_count": 16, |
8792 |
| - "metadata": {}, |
8793 |
| - "outputs": [ |
8794 |
| - { |
8795 |
| - "name": "stdout", |
8796 |
| - "output_type": "stream", |
8797 |
| - "text": [ |
8798 |
| - "Out (3, 32, 32, 18)\n" |
8799 |
| - ] |
8800 |
| - } |
8801 |
| - ], |
8802 |
| - "source": [ |
8803 |
| - "## Test MultiheadAttention implementation\n", |
8804 |
| - "# Example features as input\n", |
8805 |
| - "main_rng, x_rng = random.split(main_rng)\n", |
8806 |
| - "x = random.normal(x_rng, (3, 32, 32, 16))\n", |
8807 |
| - "# Create attention\n", |
8808 |
| - "mh_attn = GatedConvNet(c_hidden=32, c_out=18, num_layers=3)\n", |
8809 |
| - "# Initialize parameters of attention with random key and inputs\n", |
8810 |
| - "main_rng, init_rng = random.split(main_rng)\n", |
8811 |
| - "params = mh_attn.init(init_rng, x)['params']\n", |
8812 |
| - "# Apply attention with parameters on the inputs\n", |
8813 |
| - "out = mh_attn.apply({'params': params}, x)\n", |
8814 |
| - "print('Out', out.shape)\n", |
8815 |
| - "\n", |
8816 |
| - "del mh_attn, params" |
8817 |
| - ] |
8818 |
| - }, |
8819 | 8791 | {
|
8820 | 8792 | "cell_type": "markdown",
|
8821 | 8793 | "metadata": {},
|
|