Thursday, January 26, 2017

Supervised Machine Learning

1) Supervised Machine Learning

Find a mapping function P that minimizes the squared error (P(x)-y)^2 over the training data.
P(x) : predicted result
y : actual measured value

Example P
P(x) = m*x + b
We need to find the unknown parameters m and b that minimize (P(x)-y)^2.
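
As a minimal sketch of the line-fitting example, the closed-form least-squares solution for m and b can be written in a few lines. The function name fit_line and the sample data are hypothetical and only for illustration.

```python
def fit_line(xs, ys):
    """Return (m, b) minimizing sum((m*x + b - y)^2) via the closed-form solution."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviation of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

# Hypothetical data generated from y = 2*x + 1 (no noise)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
m, b = fit_line(xs, ys)
```

On noise-free data like this the fit recovers the generating slope and intercept; on real data it returns the line with the smallest total squared error.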

Example models for P
* Factorization Machine ****
* Gradient Tree Boosting
* Generalized Linear Model
* SVM
* Neural Network

2) Model evaluation and how to improve it

* High Bias ( underfit ) : too many assumptions ( naive model ), high error on both training and test data
training accuracy : low
testing accuracy : low
The gap between the training accuracy and testing accuracy is small, but both are low.
how to improve
> add unknown variables
> change model
Adding training data does not help in the high bias case.

* Just Right
training accuracy : high
testing accuracy : high

* High Variance ( overfit ) : too few assumptions ( complicated model ), too many unknown parameters
training accuracy : high
testing accuracy : low
The gap between the training accuracy (e.g. 0.8) and testing accuracy (e.g. 0.6) is large.
how to improve
> reduce unknown variables
> change model
> add training data
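
A rough sketch of how this diagnosis could be automated from two accuracy numbers: low training accuracy suggests high bias, while a large gap between training and testing accuracy suggests high variance. The function name diagnose and the thresholds min_acc and max_gap are assumptions for illustration; sensible values depend on the problem.

```python
def diagnose(train_acc, test_acc, min_acc=0.75, max_gap=0.1):
    """Classify a model as high bias, high variance, or just right.

    min_acc and max_gap are hypothetical thresholds, not standard values.
    """
    if train_acc < min_acc:
        # The model cannot even fit the training data: underfit
        return "high bias"
    if train_acc - test_acc > max_gap:
        # Fits training data well but does not generalize: overfit
        return "high variance"
    return "just right"

overfit_case = diagnose(0.8, 0.6)
underfit_case = diagnose(0.55, 0.5)
good_case = diagnose(0.9, 0.88)
```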


3) How to optimize model hyper-parameters
* model parameter : a parameter learned by optimization during training
 e.g. weight vector W in a Neural Network
* hyper-parameter : a high-level setting of the model, chosen before training
 e.g. number/size of hidden layers in a Neural Network

Methods to optimize hyper-parameters
* Grid-Search with cross-validation
* Bayesian optimization
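
To make grid search with cross-validation concrete, here is a small self-contained sketch. It assumes a toy model y ~ w*x, where w is the model parameter and the L2 penalty lam is the hyper-parameter. The model, the grid values, and all function names are hypothetical choices for illustration, not taken from the talk.

```python
def fit_ridge(xs, ys, lam):
    """Learn the model parameter w for y ~ w*x with L2 penalty lam."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def cv_error(xs, ys, lam, k=3):
    """Mean squared validation error over k folds (fold = index modulo k)."""
    total, count = 0.0, 0
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        valid = [i for i in range(len(xs)) if i % k == fold]
        w = fit_ridge([xs[i] for i in train], [ys[i] for i in train], lam)
        total += sum((w * xs[i] - ys[i]) ** 2 for i in valid)
        count += len(valid)
    return total / count

def grid_search(xs, ys, grid, k=3):
    """Return the hyper-parameter value with the lowest cross-validated error."""
    return min(grid, key=lambda lam: cv_error(xs, ys, lam, k))

# Hypothetical noise-free data from y = 3*x
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.0 * x for x in xs]
best_lam = grid_search(xs, ys, grid=[0.0, 0.1, 1.0, 10.0])
```

Because the data here has no noise, the search prefers no regularization at all; with noisy data and a more flexible model, a non-zero penalty would usually win.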

4) How to test the model on past data
Split the past data into 2 groups.
Group 1 : training data
Group 2 : testing data
NOTE : In the search prediction case, the data can be split by date ( this will include the #seasonal effect ). We cannot split the data by user in that case.

However, the search prediction case needs care, since the search results (X) in the past data were produced by the previous model (P).
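
A date-based split can be sketched as follows: everything before a cutoff date is training data, everything on or after it is testing data. The record layout (a dict with "date" and "clicked" keys), the cutoff, and the sample rows are hypothetical.

```python
from datetime import date

def split_by_date(records, cutoff):
    """Split records into (train, test): train is strictly before cutoff."""
    train = [r for r in records if r["date"] < cutoff]
    test = [r for r in records if r["date"] >= cutoff]
    return train, test

# Hypothetical search log entries
records = [
    {"date": date(2016, 11, 5), "clicked": True},
    {"date": date(2016, 12, 20), "clicked": False},
    {"date": date(2017, 1, 3), "clicked": True},
    {"date": date(2017, 1, 15), "clicked": False},
]
train, test = split_by_date(records, cutoff=date(2017, 1, 1))
```

Splitting by time keeps the test set strictly in the "future" relative to the training set, which mirrors how the model will be used after deployment.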

5) After that, we run an online experiment
* A/B Testing
Algorithm A, measure commercial rate
Algorithm B, measure commercial rate
* Multivariate testing ( this is used by Facebook and is more complex than A/B Testing )
* Feedback Loop
measure the commercial rate and use it as an input to improve the model
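
One common way to compare the two rates in an A/B test is a two-proportion z-test. The notes do not say which statistical test was used, so treat this as one reasonable option under that assumption; the counts below are hypothetical.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between rates B and A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B perform the same
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: A converts 100 of 1000 users, B converts 130 of 1000
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
```

A small p-value suggests the difference between the two algorithms is unlikely to be noise; the usual 0.05 cutoff is a convention, not a rule from the talk.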

6) "Learning to Rank" Algorithm
let Q : user query condition, e.g. query word, query filter
     X : the search result
     result : + ( user clicked the link, or it can be commercialized, or it can make a profit ) or - ( user did not click the link )
classification ( Factorization Machine ? )
(Q1,X1) -> +  this is unknown variable#1
(Q1,X2) -> -   this is unknown variable#2
...
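
The sketch below is a much simpler stand-in for the classifier, not a Factorization Machine: it keeps one score per (Q, X) pair, estimated as the observed fraction of + outcomes, and ranks candidate results by that score. The function names and the log entries are hypothetical.

```python
from collections import defaultdict

def fit_pair_scores(log):
    """One score per (query, result) pair: the observed fraction of '+' outcomes."""
    pos = defaultdict(int)
    total = defaultdict(int)
    for query, result, clicked in log:
        total[(query, result)] += 1
        pos[(query, result)] += int(clicked)
    return {pair: pos[pair] / total[pair] for pair in total}

def rank(scores, query, results):
    """Order candidate results by descending score (unseen pairs score 0)."""
    return sorted(results, key=lambda x: scores.get((query, x), 0.0), reverse=True)

# Hypothetical click log: (query, result, clicked)
log = [
    ("Q1", "X1", True),
    ("Q1", "X1", True),
    ("Q1", "X2", False),
    ("Q1", "X3", True),
    ("Q1", "X3", False),
]
scores = fit_pair_scores(log)
ranking = rank(scores, "Q1", ["X2", "X3", "X1"])
```

A per-pair table like this cannot generalize to unseen (Q, X) pairs, which is one reason to use a model that shares parameters across pairs.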

7) "Precision at K" : only the top K results are considered
(Q1,X1) -> +        (1)
(Q1,X2) -> -         (2)
(Q1,X3) -> +        (3)
(Q2,X1) -> +        (4)
(Q2,X2) -> -         (5)

Precision at 3 = 2/3
Precision at 4 = 3/4
Precision at 5 = 3/5
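
The three values above can be reproduced with a short function: take the first K outcomes in ranked order and count the fraction that are +. The function name precision_at_k is an assumption; the labels list encodes the five outcomes listed above (+ as True, - as False).

```python
def precision_at_k(labels, k):
    """Fraction of positive outcomes among the top k ranked results."""
    top_k = labels[:k]
    return sum(top_k) / k

# Outcomes in ranked order: +, -, +, +, -
labels = [True, False, True, True, False]
p3 = precision_at_k(labels, 3)
p4 = precision_at_k(labels, 4)
p5 = precision_at_k(labels, 5)
```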

8) User-hotel correlations that affect the conversion rate
* user booking history <-> hotel price
* length of stay <-> star rating
* hotel id <-> user id

Length of stay + star rating will have an effect on the conversion rate,
e.g. length of stay 1 : high conversion on one-star hotels, low conversion on five-star hotels
     length of stay 3 : high conversion on three-star hotels, low conversion on one-star and five-star hotels
     length of stay 5 : high conversion on five-star hotels, low conversion on one-star hotels
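
One simple way to check for such an interaction is to compute the conversion rate for every (length of stay, stars) combination in the booking logs. This is only a sketch: the row layout and the sample rows are hypothetical and do not come from real data.

```python
from collections import defaultdict

def conversion_by_cross(rows):
    """Conversion rate per (length_of_stay, stars) combination."""
    shown = defaultdict(int)
    booked = defaultdict(int)
    for length_of_stay, stars, converted in rows:
        key = (length_of_stay, stars)
        shown[key] += 1
        booked[key] += int(converted)
    return {key: booked[key] / shown[key] for key in shown}

# Hypothetical impressions: (length_of_stay, stars, converted)
rows = [
    (1, 1, True),
    (1, 1, True),
    (1, 5, False),
    (1, 5, False),
    (5, 5, True),
    (5, 1, False),
]
rates = conversion_by_cross(rows)
```

If the rates differ strongly across combinations, the pair (length of stay, stars) is worth feeding to the model as a joint feature rather than as two independent ones.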
