The outline is as follows:
1. Bounded linear operators, and the definition of continuity for a linear operator. For a linear operator, continuity is equivalent to boundedness, and also to continuity at the origin.
2. The dual space [latex]X^*[/latex], which is defined as the space of all real-valued bounded linear functionals on X.  The adjoint operator definition:
Let [latex]T:X \to Y[/latex] be a bounded linear operator between Banach spaces X and Y.  The adjoint operator [latex]T^{*} : Y^{*} \to X^{*}[/latex] is defined by [latex]T^{*} (F)(x)=F(T(x))[/latex] for all [latex]x\in X, F\in Y^*[/latex].
In a Hilbert space H, the Riesz representation theorem says that every bounded linear functional F can be written as an inner product with a fixed element y:
[latex]F(x)=\langle x,y \rangle[/latex] for all [latex]x\in H[/latex], and [latex]||F||=||y||[/latex]. Here y is uniquely determined by F.
So it is a comparatively simple step to define the inner product form of the adjoint, an element [latex]y^*[/latex], instead of the function form [latex]T^*[/latex]. For a bounded linear operator T on H:
[latex]F_T(x):=\langle T(x),y \rangle=\langle x,y^* \rangle[/latex] for all [latex]x\in H[/latex], where [latex]y^*=T^*(y)[/latex].
3. The definitions of differentiability in a Banach space:

Gateaux differentiable at [latex]x_0[/latex] in the direction h:

[latex]DT(x_0;h):=\lim_{\epsilon \to 0} \frac{T(x_0+\epsilon h)-T(x_0)}{\epsilon}[/latex]

Frechet differentiable at [latex]x_0[/latex], with derivative [latex]T'(x_0)[/latex] a bounded linear operator:

[latex]\lim_{||\delta x||\to 0}\frac{||T(x_0+\delta x)-T(x_0)-T'(x_0)\delta x||}{||\delta x||}=0[/latex]

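When both limits exist for all [latex]x_0,\delta x \in X[/latex], the two derivatives agree: [latex]DT(x_0;h)=T'(x_0)h[/latex]. In particular, Frechet differentiability implies Gateaux differentiability.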
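To convince myself of the relation between the two derivatives, here is a minimal numerical sketch in finite dimensions. The map `T`, the point `x0` and the direction `h` are made up for illustration (they are not from the book); the Jacobian plays the role of the Frechet derivative and a difference quotient approximates the Gateaux derivative.

```python
import numpy as np

def T(x):
    """Hypothetical smooth map T: R^2 -> R^2, chosen only for illustration."""
    return np.array([x[0] ** 2 + x[1], np.sin(x[0]) * x[1]])

def frechet_derivative(x):
    """Jacobian of T at x; in finite dimensions this matrix acts as T'(x)."""
    return np.array([
        [2.0 * x[0], 1.0],
        [np.cos(x[0]) * x[1], np.sin(x[0])],
    ])

def gateaux_derivative(x, h, eps=1e-6):
    """Central difference quotient approximating the limit DT(x; h)."""
    return (T(x + eps * h) - T(x - eps * h)) / (2.0 * eps)

x0 = np.array([0.7, -1.2])  # assumed base point
h = np.array([0.3, 0.5])    # assumed direction

directional = gateaux_derivative(x0, h)
linearized = frechet_derivative(x0) @ h

# Largest componentwise difference between DT(x0; h) and T'(x0) h.
gap = np.max(np.abs(directional - linearized))
print("max difference:", gap)
```

The printed difference should be tiny, limited only by the finite step `eps` and floating-point rounding.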

The partial Frechet derivative [latex]D_x(x_0,y)[/latex] of an operator [latex]U(x,y)[/latex] with respect to x, with y held fixed:

[latex]\lim_{||\delta x|| \to 0}\frac{||U(x_0+\delta x,y)-U(x_0,y)-D_x(x_0,y)(\delta x)||}{||\delta x||}=0[/latex]

Because differentiability is defined through limits, the chain rule carries over to operators that are continuously Frechet differentiable: [latex](S \circ T)'(x_0)=S'(T(x_0))T'(x_0)[/latex].
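A quick finite-dimensional check of the chain rule, as a sketch only. The maps `T` and `S` below are hypothetical examples I picked, with their Jacobians written out by hand; the product of the Jacobians is compared against a difference quotient of the composition.

```python
import numpy as np

def T(x):
    """Hypothetical inner map T: R^2 -> R^2."""
    return np.array([x[0] * x[1], x[0] + x[1] ** 2])

def dT(x):
    """Jacobian of T at x."""
    return np.array([[x[1], x[0]], [1.0, 2.0 * x[1]]])

def S(y):
    """Hypothetical outer map S: R^2 -> R^1."""
    return np.array([np.exp(y[0]) + y[1]])

def dS(y):
    """Jacobian of S at y (a 1 x 2 matrix)."""
    return np.array([[np.exp(y[0]), 1.0]])

x0 = np.array([0.4, -0.3])  # assumed base point
h = np.array([1.0, 2.0])    # assumed direction
eps = 1e-6

# Chain rule: (S o T)'(x0) h = S'(T(x0)) T'(x0) h
chain = dS(T(x0)) @ dT(x0) @ h

# Central difference quotient of the composition in direction h.
finite_diff = (S(T(x0 + eps * h)) - S(T(x0 - eps * h))) / (2.0 * eps)

chain_gap = np.max(np.abs(chain - finite_diff))
print("chain rule gap:", chain_gap)
```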

Then comes the optimization problem. I had trouble figuring out the two given functions [latex]J(\rho,\lambda)[/latex] and [latex]E(\rho,\lambda)[/latex].  [latex]J(\rho,\lambda)[/latex] is the cost function, and the constraint [latex]E(\rho,\lambda)=0[/latex] says that [latex]\rho[/latex] solves [latex]\frac{\partial \rho(x,t)}{\partial t}+\frac{\partial}{\partial x}(v(\rho)\rho(x,t))=0[/latex] together with the accompanying conditions.

Because [latex]\rho[/latex] is a function of [latex]\lambda[/latex] through the constraint, [latex]J(\rho,\lambda)[/latex] can be seen as [latex]j(\lambda)=J(\rho(\lambda),\lambda)[/latex].  The book then states 3 assumptions, which require everything to be continuously Frechet differentiable.  I have some problems here: I can't understand how the author arrives at these 3 assumptions.  I hope it will become clear when I read the calculation part.
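To see what "viewing J as j" means in practice, here is a toy sketch. It is NOT the book's conservation-law model: I assume a linear constraint [latex]E(\rho,\lambda)=A\rho-B\lambda=0[/latex] with made-up matrices `A`, `B`, a made-up target `rho_d` and weight `alpha`, and a quadratic cost. The gradient of j follows from the chain rule, and the transpose (the finite-dimensional adjoint) shows up when the derivative of [latex]\rho(\lambda)[/latex] is moved to the other side of the inner product.

```python
import numpy as np

# All data below are assumptions for illustration, not taken from the book.
A = np.array([[2.0, -1.0, 0.0],
              [0.0, 2.0, -1.0],
              [0.0, 0.0, 2.0]])    # invertible, so rho(lam) is well defined
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
rho_d = np.array([1.0, 0.5, -0.2])  # hypothetical target state
alpha = 0.1                         # hypothetical control weight

def solve_state(lam):
    """Solve the toy constraint E(rho, lam) = A rho - B lam = 0 for rho."""
    return np.linalg.solve(A, B @ lam)

def J(rho, lam):
    """Toy cost: tracking term plus control penalty."""
    return 0.5 * np.sum((rho - rho_d) ** 2) + 0.5 * alpha * np.sum(lam ** 2)

def j(lam):
    """Reduced cost j(lam) = J(rho(lam), lam)."""
    return J(solve_state(lam), lam)

def grad_j(lam):
    """Gradient of j via the chain rule, using one solve with A transposed."""
    rho = solve_state(lam)
    p = np.linalg.solve(A.T, rho - rho_d)  # p = A^{-T} (rho - rho_d)
    return B.T @ p + alpha * lam

lam0 = np.array([0.3, -0.4])
eps = 1e-6

# Central differences of j, one coordinate direction at a time.
fd_grad = np.array([
    (j(lam0 + eps * e) - j(lam0 - eps * e)) / (2.0 * eps)
    for e in np.eye(lam0.size)
])

grad_gap = np.max(np.abs(grad_j(lam0) - fd_grad))
print("gradient gap:", grad_gap)
```

In this toy setting the state equation can be solved exactly, so [latex]\rho(\lambda)[/latex] is explicit. For the actual PDE constraint that is presumably not the case, which I guess is where the differentiability assumptions come in.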

I should look up the details of the Lagrangian operator.  The only technique I know for computing a minimum is Lagrange multipliers, which is said not to work for the semiconductor factory problem.  I will update the Algorithm For Solving when I am ready.