Support Vector Machine

Boundary function

\begin{align} f(\boldsymbol{x}) = \boldsymbol{w}^T \boldsymbol{x} + b = 0 \end{align}
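
For intuition, a two-dimensional example (values chosen here only for illustration): \( \boldsymbol{w} = (1, 1)^T \) and \( b = -1 \) give the boundary

\[ f(\boldsymbol{x}) = x_1 + x_2 - 1 = 0, \]

a line separating the points with \( x_1 + x_2 > 1 \) from those with \( x_1 + x_2 < 1 \).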

Each \( \boldsymbol{x}_i \) belongs to either \( C_1 \) or \( C_2 \):

\begin{align} \boldsymbol{x}_i \in C_1 \Leftrightarrow \boldsymbol{w}^T \boldsymbol{x}_i + b < 0 \end{align}

\begin{align} \boldsymbol{x}_i \in C_2 \Leftrightarrow \boldsymbol{w}^T \boldsymbol{x}_i + b > 0 \end{align}

Using the label \( t_i \), the two conditions above can be combined into a single inequality:

\begin{eqnarray} t_{i}= \left\{ \begin{array}{ll} -1 & ( \boldsymbol{x}_i \in C_1 ) \\ 1 & ( \boldsymbol{x}_i \in C_2 ), \ \ \ (i = 1,2,...,n) \\ \end{array} \right. \end{eqnarray}

\[ t_i( \boldsymbol{w}^T \boldsymbol{x}_i + b ) > 0, \ \ \ (i=1,2,...,n) \]
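
A minimal numpy check of this condition (toy values, reusing the hypothetical \( \boldsymbol{w} \) and \( b \) from the illustration above): \( t_i( \boldsymbol{w}^T \boldsymbol{x}_i + b ) > 0 \) holds exactly when \( \boldsymbol{x}_i \) is on its correct side.

```python
import numpy as np

w = np.array([1.0, 1.0])    # hypothetical weight vector
b = -1.0                    # hypothetical bias

X = np.array([[0.0, 0.0],   # x1 + x2 < 1 -> class C1 (t = -1)
              [2.0, 1.0]])  # x1 + x2 > 1 -> class C2 (t = +1)
t = np.array([-1, 1])

print(t * (X @ w + b))      # [1. 2.] -- both positive: correctly classified
```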

The distance \( d \) between a data point \( \boldsymbol{x}_i = (x_{i1}, x_{i2}, ..., x_{ip}) \), whose dimension is \( p \), and the hyperplane \( f(\boldsymbol{x}) = 0 \) is

\[ d = \frac{ | \boldsymbol{w}^T \boldsymbol{x}_i + b | }{ \| \boldsymbol{w} \| } \]
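
As a quick numeric check (values chosen here for illustration): with \( \boldsymbol{w} = (3, 4)^T \), \( b = -5 \) and \( \boldsymbol{x}_i = (1, 1)^T \),

\[ d = \frac{ | 3 \cdot 1 + 4 \cdot 1 - 5 | }{ \sqrt{3^2 + 4^2} } = \frac{2}{5} \]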

Maximize the margin \( M \):

\[ \arg \max_{\boldsymbol{w}, b} M, \ \ \ \frac{ t_i( \boldsymbol{w}^T \boldsymbol{x}_i + b )}{ \| \boldsymbol{w} \|} \geq M, \ \ \ (i=1,2,...,n) \]

Because rescaling \( \boldsymbol{w} \) and \( b \) does not move the hyperplane, the constraint can be normalized so that the right-hand side becomes 1:

\[ t_i( \hat{\boldsymbol{w}}^T \boldsymbol{x}_i + \hat{b} ) \geq 1, \ \ \ \left( \hat{\boldsymbol{w}} = \frac{ \boldsymbol{w} }{M \| \boldsymbol{w} \|}, \ \ \hat{b} = \frac{b}{M \| \boldsymbol{w} \|} \right) \]
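
The normalized form is just the constraint above divided through by \( M \| \boldsymbol{w} \| \):

\[ \frac{ t_i( \boldsymbol{w}^T \boldsymbol{x}_i + b )}{ \| \boldsymbol{w} \|} \geq M \ \Leftrightarrow \ t_i \left( \frac{\boldsymbol{w}^T \boldsymbol{x}_i}{M \| \boldsymbol{w} \|} + \frac{b}{M \| \boldsymbol{w} \|} \right) \geq 1 \]

and since \( \| \hat{\boldsymbol{w}} \| = 1/M \), the margin can be written as \( M = 1 / \| \hat{\boldsymbol{w}} \| \).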

Since \( M = 1 / \| \hat{\boldsymbol{w}} \| \), maximizing the margin is equivalent to minimizing \( \| \boldsymbol{w} \|^2 \) (hats dropped from here on):

\[ \arg \min_{\boldsymbol{w}, b} \frac{1}{2} \| \boldsymbol{w} \|^2 , \ \ \ t_i( \boldsymbol{w}^T \boldsymbol{x}_i + b ) \geq 1, \ \ \ (i=1,2,...,n) \]
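
A minimal scikit-learn sketch (assuming scikit-learn is installed; toy data chosen here for illustration): a linear SVC with a very large C approximates this hard-margin problem.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.0, 1.0],   # class -1 (linearly separable toy set)
              [2.0, 1.0], [2.0, 2.0]])  # class +1
t = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1e10).fit(X, t)  # huge C ~ no slack tolerated
print(clf.coef_, clf.intercept_)              # learned w and b
```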

Soft margin: introduce slack variables \( \xi_i \) so that some points may fall inside the margin or on the wrong side:

\[ t_i ( \boldsymbol{w}^T \boldsymbol{x}_i + b ) \geq 1 - \xi_i, \ \ \ \xi_i \geq 0 \]

Here \( \xi_i = 0 \) means the point satisfies the original margin, \( 0 < \xi_i \leq 1 \) means it lies inside the margin but on the correct side, and \( \xi_i > 1 \) means it is misclassified.

Minimize the objective below:

\[ \min_{\boldsymbol{w}, \ \xi} \left\{ \ \frac{1}{2} \|\boldsymbol{w}\|^2 + C \sum_{i=1}^{n} \xi_i \ \right\} \]

Shrinking the first term \( \frac{1}{2} \|\boldsymbol{w}\|^2 \) widens the margin, but a wider margin lets more points violate it, so the second term \( C \sum_{i=1}^{n} \xi_i \) increases.

Maximizing the margin and keeping the total margin violation small are therefore conflicting goals, and this is an optimization problem under that trade-off; the constant \( C \) controls the balance.
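
A sketch of that trade-off on overlapping toy data (assuming scikit-learn; data generated here only for illustration): a small C tolerates margin violations and gives a softer, wider margin, while a large C penalizes them heavily.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)),
               rng.normal(1.0, 1.0, (50, 2))])
t = np.array([-1] * 50 + [1] * 50)   # classes overlap, so some xi_i > 0

for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, t)
    # more support vectors usually indicates a wider, softer margin
    print(C, len(clf.support_))
```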

Dual representation (not fully understood yet ...)

It can reduce the number of variables to optimize.

A kernel function can be adopted, which makes non-linear separation possible.
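
A sketch of non-linear separation via a kernel (RBF kernel here; toy data generated only for illustration): no straight line separates a circular class boundary, but an RBF SVC handles it.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, (200, 2))
t = np.where(np.linalg.norm(X, axis=1) < 1.0, -1, 1)  # circular boundary

clf = SVC(kernel="rbf", C=1.0).fit(X, t)
print(clf.score(X, t))   # training accuracy, close to 1.0
```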

Lagrange's method of undetermined multipliers

Optimization problem with constraints: minimize \( f(x) \) subject to \( g_i(x) \leq 0, \ i = 1,2,...,n \).

\[ L (x, \alpha) = f(x) + \sum_{i=1}^{n} \alpha_i g_i(x) \]

\begin{align} & 1. \ \ \ \frac{\partial L(x,\alpha)}{ \partial x} = \frac{\partial f(x)}{\partial x} + \sum_{i=1}^{n} \alpha_i \frac{\partial g_i(x)}{\partial x} = 0 \\ & 2. \ \ \ \frac{\partial L(x,\alpha)}{ \partial \alpha_i} = g_i(x) \leq 0, \ \ \ i=1,2,...,n \\ & 3. \ \ \ \alpha_i \geq 0, \ \ \ i=1,2,...,n \\ & 4. \ \ \ \alpha_i g_i(x) = 0, \ \ \ i=1,2,...,n \end{align}
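
As a one-variable worked example (chosen here for illustration, not from the original): minimize \( f(x) = x^2 \) subject to \( g(x) = 1 - x \leq 0 \).

\[ L(x, \alpha) = x^2 + \alpha (1 - x), \ \ \ \frac{\partial L}{\partial x} = 2x - \alpha = 0 \]

Condition 4 requires \( \alpha (1 - x) = 0 \). If \( \alpha = 0 \) then \( x = 0 \), which violates \( g(x) \leq 0 \); so the constraint must be active, giving \( x = 1 \) and \( \alpha = 2 \geq 0 \), which satisfies all four conditions.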

Apply this to the SVM problem:

\[ L( \boldsymbol{w}, b, \xi, \alpha, \beta) = \frac{1}{2} \|\boldsymbol{w}\|^2 + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i \{ t_i ( \boldsymbol{w}^T \boldsymbol{x}_i + b ) - 1 + \xi_i \} - \sum_{i=1}^{n} \beta_i \xi_i \]

\begin{align} & \ \ \ \frac{\partial L}{ \partial \boldsymbol{w}} = \boldsymbol{w} - \sum_{i=1}^{n} \alpha_i t_i \boldsymbol{x}_i = 0 & \Leftrightarrow \ \boldsymbol{w} = \sum_{i=1}^{n} \alpha_i t_i \boldsymbol{x}_i \\ & \ \ \ \frac{\partial L}{ \partial b} = - \sum_{i=1}^{n} \alpha_i t_i = 0 & \Leftrightarrow \ \sum_{i=1}^{n} \alpha_i t_i = 0 \\ & \ \ \ \frac{\partial L}{ \partial \xi_i} = C - \alpha_i - \beta_i = 0 & \Leftrightarrow \ C = \alpha_i + \beta_i \end{align}

Substituting these three conditions back into \( L \) eliminates \( \boldsymbol{w} \), \( b \), \( \xi \) and \( \beta \); maximize the resulting function of \( \alpha \) below (the condition \( C = \alpha_i + \beta_i \) with \( \beta_i \geq 0 \) also gives \( 0 \leq \alpha_i \leq C \)):

\[ L(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j t_i t_j \boldsymbol{x}_i^T \boldsymbol{x}_j, \ \ \ 0 \leq \alpha_i \leq C, \ \ \ \sum_{i=1}^{n} \alpha_i t_i = 0 \]
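
The data enter \( L(\alpha) \) only through the inner products \( \boldsymbol{x}_i^T \boldsymbol{x}_j \), so replacing them with a kernel \( k(\boldsymbol{x}_i, \boldsymbol{x}_j) \) gives the non-linear SVM mentioned above:

\[ L(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j t_i t_j \, k(\boldsymbol{x}_i, \boldsymbol{x}_j) \]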