This code should reach 83-84% val accuracy on CIFAR10.
By default, the kernel ignores the initial state (fusing `b` and `c`) and trains only the `a` parameters (leaving `theta` fixed at its initialization).
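For concreteness, here is a minimal PyTorch sketch of how these defaults might be wired; the class name, shapes, and initializers below are our placeholders, not the repo's actual API (in particular, the Chebyshev initialization scheme is not reproduced):

```python
import torch
import torch.nn as nn

class DiagonalSSMKernel(nn.Module):
    """Hypothetical sketch of the kernel's trainability options.

    Defaults mirror the description above: `b` and `c` are fused into a
    single vector, `theta` is frozen at its initialization, and only `a`
    (plus the fused `bc`) is trained. The forward pass is omitted; this
    only shows the parameter wiring.
    """
    def __init__(self, d_state, learn_a=True, learn_theta=False, use_initial=False):
        super().__init__()
        # Placeholder initializers; the repo's Chebyshev-based init for `a`
        # is not reproduced here.
        self.a = nn.Parameter(torch.rand(d_state), requires_grad=learn_a)
        self.theta = nn.Parameter(torch.rand(d_state), requires_grad=learn_theta)
        if use_initial:
            # Learn `b` and `c` separately, plus a learnable initial state.
            self.b = nn.Parameter(torch.randn(d_state))
            self.c = nn.Parameter(torch.randn(d_state))
            self.x0 = nn.Parameter(torch.zeros(d_state))
        else:
            # Default: ignore the initial state and fuse `b` and `c`
            # into one learnable vector.
            self.bc = nn.Parameter(torch.randn(d_state))
```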
You can play with these parameters in the training run (a combined usage sketch follows this list):
* Adding `use_initial=true` will add a learnable initial state, and learn the `b` and `c` parameters separately.
* Setting `learn_theta=true` will make the `theta` parameters learnable (we usually see a decrease in performance of about 3 points from this).
* Setting `learn_a=false` will freeze the `a` parameters. We don't see much performance degradation on CIFAR in this case, which speaks to the utility of the Chebyshev initialization!
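Putting these together, the hypothetical kernel sketched above could be configured as follows (the flag names match the options described, but the constructor and `d_state` are our placeholders, not the repo's API):

```python
# Default: fused b·c, frozen theta, learnable a.
kernel = DiagonalSSMKernel(d_state=64)

# Learnable initial state, with b and c learned separately.
kernel = DiagonalSSMKernel(d_state=64, use_initial=True)

# Also learn theta (we usually see about a 3-point accuracy drop).
kernel = DiagonalSSMKernel(d_state=64, learn_theta=True)

# Freeze a, relying entirely on the Chebyshev initialization.
kernel = DiagonalSSMKernel(d_state=64, learn_a=False)
```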