```matlab
demo()           % run the accompanying demo script
for i = 1:10
    disp(i)      % display the loop counter
end
```
Case 1: successful update
Case 2: failed update
```matlab
% Load and display the three cartoon panes of the figure.
cartoon_pane_A = imread('figure_cartoon_pane_A.jpg');
cartoon_pane_B = imread('figure_cartoon_pane_B.jpg');
cartoon_pane_C = imread('figure_cartoon_pane_C.jpg');
figure, imagesc(cartoon_pane_A)
figure, imagesc(cartoon_pane_B)
figure, imagesc(cartoon_pane_C)
```
![](test_01.png)
![](test_02.png)
![](test_03.png)
- Finally, perform the Newton step according to Equation (5) (reproduced below). The step length is adjusted to avoid numerical errors caused by very small numbers, and to avoid bouncing around the optimum when the step size is too large.
- The active subfunction set is expanded at the end.
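For reference, Equation (5) has the form of a damped Newton step. The following is a reconstruction in the SFO paper's notation, where $G^{t-1}$ is the approximation of the full objective built at the previous iteration, $H^{t-1}$ its Hessian, and $\eta$ the adjusted step length:

$$
x^t = x^{t-1} - \eta \left(H^{t-1}\right)^{-1} \frac{\partial G^{t-1}(x^{t-1})}{\partial x}
$$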
8. Analysis and conclusions
Analysis of the experimental results:
- Comparing the convergence traces produced by SFO and lbfgs, we can see that both optimizers converge to a similar optimal function value, around 15. This can also be seen in the reconstruction results: the figures reconstructed by the autoencoders trained with SFO and lbfgs, respectively, look very similar.
- Another obvious characteristic is that the convergence trace produced by SFO is less smooth than the trace produced by lbfgs. This is because SFO evaluates the objective on subfunctions, i.e. minibatches, rather than on the whole dataset as lbfgs does (see the sketch after this list). Although its convergence trace is less smooth, SFO needs much less runtime memory than lbfgs, and this advantage becomes more important in tasks with very large training datasets.
- The way SFO works on subfunctions of the objective also puts it at a disadvantage in tasks with small datasets.
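To illustrate the minibatch point above, here is a minimal sketch of how a training set might be split into subfunctions for SFO. It assumes the interface of the authors' reference `sfo.m`; `X`, `theta_init`, and `ae_f_df` are illustrative names, not part of this report's code:

```matlab
% Split the data matrix X into N minibatches (subfunctions) and run SFO.
% Assumes ae_f_df(theta, minibatch) returns [f, dfdtheta] for one minibatch.
N = 20;                              % number of minibatches
sub_refs = cell(1, N);
for i = 1:N
    sub_refs{i} = X(:, i:N:end);     % i-th minibatch of the data matrix X
end
% SFO evaluates f and df on one minibatch per call, which explains the
% noisier trace and the small memory footprint noted above.
optimizer = sfo(@ae_f_df, theta_init, sub_refs);
theta = optimizer.optimize(20);      % 20 passes through the data
```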
Overall conclusion:
In summary, at each iteration SFO chooses the new parameter vector by minimizing the objective function approximated at the previous iteration, via a Newton step.
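As a concrete illustration of that step, here is a toy sketch of one damped Newton update on a quadratic model; all numbers and names are illustrative, not taken from the experiments:

```matlab
% One damped Newton step on a quadratic model
% G(x) = f0 + g'*(x - x0) + 0.5*(x - x0)'*H*(x - x0).
x0  = [1; -2];              % previous iterate
g   = [0.4; -1.0];          % gradient of G at x0
H   = [2.0 0.1; 0.1 1.5];   % Hessian of G (positive definite)
eta = 0.5;                  % adjusted step length, as in Equation (5)
x1  = x0 - eta * (H \ g);   % new parameter vector
```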