Commit fa80c4d

Tutorial 11 (JAX): Explaining abbrev LDJ
1 parent 5f7828e commit fa80c4d

File tree

1 file changed (+5, -33 lines)

docs/tutorial_notebooks/JAX/tutorial11/NF_image_modeling.ipynb (+5, -33)
@@ -357,7 +357,7 @@
     "\n",
     "<center width=\"100%\"><img src=\"../../tutorial11/uniform_flow.png\" width=\"300px\"></center>\n",
     "\n",
-    "You can see that the height of $p(y)$ should be lower than $p(x)$ after scaling. This change in volume represents $\\left|\\frac{df(x)}{dx}\\right|$ in our equation above, and ensures that even after scaling, we still have a valid probability distribution. We can go on with making our function $f$ more complex. However, the more complex $f$ becomes, the harder it will be to find the inverse $f^{-1}$ of it, and to calculate the log-determinant of the Jacobian $\\log{} \\left|\\det \\frac{df(\\mathbf{x})}{d\\mathbf{x}}\\right|$. An easier trick is to stack multiple invertible functions $f_{1,...,K}$ after each other, as all together they still represent a single, invertible function. Using multiple, learnable invertible functions, a normalizing flow attempts to transform $p_z(z)$ slowly into a more complex distribution which should finally be $p_x(x)$. We visualize the idea below\n",
+    "You can see that the height of $p(y)$ should be lower than $p(x)$ after scaling. This change in volume represents $\\left|\\frac{df(x)}{dx}\\right|$ in our equation above, and ensures that even after scaling, we still have a valid probability distribution. We can go on with making our function $f$ more complex. However, the more complex $f$ becomes, the harder it will be to find the inverse $f^{-1}$ of it, and to calculate the log-determinant of the Jacobian $\\log{} \\left|\\det \\frac{df(\\mathbf{x})}{d\\mathbf{x}}\\right|$ (often abbreviated as *LDJ*). An easier trick is to stack multiple invertible functions $f_{1,...,K}$ after each other, as all together they still represent a single, invertible function. Using multiple, learnable invertible functions, a normalizing flow attempts to transform $p_z(z)$ slowly into a more complex distribution which should finally be $p_x(x)$. We visualize the idea below\n",
     "(figure credit - [Lilian Weng](https://lilianweng.github.io/lil-log/2018/10/13/flow-based-deep-generative-models.html)):\n",
     "\n",
     "<center width=\"100%\"><img src=\"../../tutorial11/normalizing_flow_layout.png\" width=\"700px\"></center>\n",
@@ -414,7 +414,8 @@
     "        return bpd, rng\n",
     "\n",
     "    def encode(self, imgs, rng):\n",
-    "        # Given a batch of images, return the latent representation z and ldj of the transformations\n",
+    "        # Given a batch of images, return the latent representation z and \n",
+    "        # log-determinant jacobian (ldj) of the transformations\n",
     "        z, ldj = imgs, jnp.zeros(imgs.shape[0])\n",
     "        for flow in self.flows:\n",
     "            z, ldj, rng = flow(z, ldj, rng, reverse=False)\n",
@@ -446,6 +447,7 @@
     "            z = z_init\n",
     "        \n",
     "        # Transform z to x by inverting the flows\n",
+    "        # The log-determinant jacobian (ldj) is usually not of interest during sampling\n",
     "        ldj = jnp.zeros(img_shape[0])\n",
     "        for flow in reversed(self.flows):\n",
     "            z, ldj, rng = flow(z, ldj, rng, reverse=True)\n",
@@ -6712,7 +6714,7 @@
     "\n",
     "$$z'_{j+1:d} = \\mu_{\\theta}(z_{1:j}) + \\sigma_{\\theta}(z_{1:j}) \\odot z_{j+1:d}$$\n",
     "\n",
-    "The functions $\\mu$ and $\\sigma$ are implemented as a shared neural network, and the sum and multiplication are performed element-wise. The LDJ is thereby the sum of the logs of the scaling factors: $\\sum_i \\left[\\log \\sigma_{\\theta}(z_{1:j})\\right]_i$. Inverting the layer can be done as simply as subtracting the bias and dividing by the scale: \n",
+    "The functions $\\mu$ and $\\sigma$ are implemented as a shared neural network, and the sum and multiplication are performed element-wise. The log-determinant Jacobian (LDJ) is thereby the sum of the logs of the scaling factors: $\\sum_i \\left[\\log \\sigma_{\\theta}(z_{1:j})\\right]_i$. Inverting the layer can be done as simply as subtracting the bias and dividing by the scale: \n",
     "\n",
     "$$z_{j+1:d} = \\left(z'_{j+1:d} - \\mu_{\\theta}(z_{1:j})\\right) / \\sigma_{\\theta}(z_{1:j})$$\n",
     "\n",
@@ -8786,36 +8788,6 @@
     "        return self.nn(x)"
    ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": 16,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Out (3, 32, 32, 18)\n"
-     ]
-    }
-   ],
-   "source": [
-    "## Test MultiheadAttention implementation\n",
-    "# Example features as input\n",
-    "main_rng, x_rng = random.split(main_rng)\n",
-    "x = random.normal(x_rng, (3, 32, 32, 16))\n",
-    "# Create attention\n",
-    "mh_attn = GatedConvNet(c_hidden=32, c_out=18, num_layers=3)\n",
-    "# Initialize parameters of attention with random key and inputs\n",
-    "main_rng, init_rng = random.split(main_rng)\n",
-    "params = mh_attn.init(init_rng, x)['params']\n",
-    "# Apply attention with parameters on the inputs\n",
-    "out = mh_attn.apply({'params': params}, x)\n",
-    "print('Out', out.shape)\n",
-    "\n",
-    "del mh_attn, params"
-   ]
-  },
   {
    "cell_type": "markdown",
    "metadata": {},
