Historical Diary

Data Mining (4)

data mining, polynomial regression + step functions + Natural cubic spline + smoothing spline + local regression + GAM in R

Polynomial regression
- y_i = β_0 + β_1 x_i + β_2 x_i^2 + β_3 x_i^3 + … + β_d x_i^d + ϵ_i, where ϵ_i is the error term
- Generally, d is not greater than 3 or 4 (with a larger d the fitted curve becomes excessively nonlinear)

Step functions
- Split the range of X into several bins and fit a different constant in each bin
- This converts a continuous variable into an ordered categorical variable

Regression splines
- Piecewise polynomial regression with a single knot ..

2021. 12. 18.
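The step-function idea in the excerpt above can be sketched in a few lines. This is a minimal illustration in plain Python rather than the R used in the post; the function name `fit_step_function` and the cutpoints are hypothetical. It relies on the fact that the least squares constant for a bin is the mean of the responses that fall in that bin.

```python
from bisect import bisect_right


def fit_step_function(x, y, cutpoints):
    """Fit one constant per bin; bins are (-inf, c1), [c1, c2), ..., [cK, inf)."""
    cutpoints = sorted(cutpoints)
    sums = [0.0] * (len(cutpoints) + 1)
    counts = [0] * (len(cutpoints) + 1)
    for xi, yi in zip(x, y):
        b = bisect_right(cutpoints, xi)  # index of the bin that contains xi
        sums[b] += yi
        counts[b] += 1
    # The least squares constant in a bin is the bin mean (None for an empty bin).
    means = [s / c if c else None for s, c in zip(sums, counts)]

    def predict(x_new):
        return means[bisect_right(cutpoints, x_new)]

    return means, predict


# Hypothetical toy data with cutpoints at 3 and 5.
means, predict = fit_step_function(
    [1, 2, 3, 4, 5, 6], [1.0, 3.0, 10.0, 12.0, 20.0, 22.0], [3, 5]
)
print(means)         # [2.0, 11.0, 21.0]
print(predict(4.5))  # 11.0
```

The bin index plays the role of the ordered categorical variable: every x in the same bin receives the same fitted value.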

data mining, forward + backward + ridge + lasso + pcr + pls

Linear model
- Y = β_0 + β_1 X_1 + … + β_p X_p + ϵ (least squares methods)

Forward Stepwise Selection
- Best subset selection has to consider 2^p models, so it is hard to use when p is large
- Start from the null model and add one explanatory variable at a time

Backward Stepwise Selection
- Start from the full model and remove one explanatory variable at a time

To choose a model with a low test error
1. estimate test error indirectly by making an adjustment to the tra..

2021. 12. 17.
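Forward stepwise selection as described in the excerpt above can be sketched as follows. This is a dependency-free Python illustration, not the post's own code: `ols_rss` and `forward_stepwise` are hypothetical helper names, the least squares fit solves the normal equations directly, and the predictors are assumed not to be collinear. At each step the variable that lowers the training RSS the most is added.

```python
def ols_rss(columns, y):
    """Training RSS of a least squares fit of y on an intercept plus `columns`."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations: (X^T X | X^T y).
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum(
        (y[r] - sum(b * v for b, v in zip(beta, X[r]))) ** 2 for r in range(n)
    )


def forward_stepwise(predictors, y):
    """Return [(selected variable names, RSS)], from the null model to the full model."""
    selected, remaining = [], list(predictors)
    path = [([], ols_rss([], y))]  # null model: intercept only
    while remaining:
        # Add the single variable that gives the lowest training RSS.
        best = min(
            remaining,
            key=lambda name: ols_rss([predictors[v] for v in selected + [name]], y),
        )
        selected.append(best)
        remaining.remove(best)
        path.append((list(selected), ols_rss([predictors[v] for v in selected], y)))
    return path


# Hypothetical toy data: y is roughly 2 + 3 * x1.
data = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [1, -1, 1, -1, 1, -1],
    "x3": [0, 1, 0, 0, 1, 0],
}
y = [5.1, 7.9, 11.2, 13.8, 17.1, 19.9]
for names, rss in forward_stepwise(data, y):
    print(names, round(rss, 3))
```

The sketch fits on the order of p^2 models instead of the 2^p that best subset selection needs. Training RSS never increases along the path, so it cannot be used on its own to pick the final model size; that choice requires an estimate of test error.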

data mining, maximal margin classifier + support vector classifier + support vector machine in R

- Maximal margin classifier: separates the classes with a linear boundary (no errors)
- Support vector classifier: linear boundary & soft margin classifier (errors allowed)
- Support vector machines: non-linear class boundaries

Maximal margin classifier
- Separating Hyperplane
  • Suppose a hyperplane that separates ..

2021. 12. 16.
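To make the separating hyperplane idea concrete, here is a small Python sketch. It assumes the usual setup, which the truncated excerpt does not spell out: class labels y_i in {-1, +1} and a hyperplane β_0 + β_1 x_1 + … + β_p x_p = 0. The function names are hypothetical. The hyperplane separates the classes exactly when every y_i times the signed distance of x_i is positive, and the smallest such value is the margin.

```python
import math


def signed_distance(beta0, beta, x):
    """Signed perpendicular distance from point x to the hyperplane beta0 + beta . x = 0."""
    norm = math.sqrt(sum(b * b for b in beta))
    return (beta0 + sum(b * xi for b, xi in zip(beta, x))) / norm


def margin(beta0, beta, X, y):
    """Smallest y_i * distance; positive only if the hyperplane separates the classes."""
    return min(yi * signed_distance(beta0, beta, xi) for xi, yi in zip(X, y))


# Hypothetical toy data: two points per class in two dimensions.
X = [(1, 1), (2, 2), (-1, -1), (-2, -1)]
y = [1, 1, -1, -1]

print(margin(0, (1, 1), X, y))   # positive: x1 + x2 = 0 separates the classes
print(margin(-3, (1, 1), X, y))  # negative: x1 + x2 = 3 misclassifies a point
```

A maximal margin classifier would search over β_0 and β for the hyperplane with the largest margin; this sketch only evaluates a given candidate.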

data mining, random forest + boosting in R

- As in bagging, many decision trees are built on bootstrapped training samples
- Each time a split in a tree is considered, only a random sample of m predictors is drawn from the full set of p predictors, and the split uses one of those m
- Typically m ≈ √p
- Random forest lowers test error by reducing variance further than bagging does. Why?
  • Suppose there is one very strong predictor and several moderately strong predictors; then in most trees, at the top split, the very strong predicto..

2021. 12. 15.
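The two sources of randomness named in the excerpt above, bootstrapped training samples and a random subset of m ≈ √p predictors at each split, can be sketched in Python. Both function names are hypothetical, and the sketch covers only the sampling steps, not tree fitting.

```python
import math
import random


def bootstrap_indices(n, rng):
    """Draw n row indices with replacement: one bootstrapped training sample."""
    return [rng.randrange(n) for _ in range(n)]


def candidate_predictors(p, rng, m=None):
    """Draw the m predictors (without replacement) that one split may choose from."""
    if m is None:
        m = max(1, round(math.sqrt(p)))  # default m ≈ √p
    return sorted(rng.sample(range(p), m))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
rows = bootstrap_indices(10, rng)
for split in range(3):
    print(candidate_predictors(16, rng))  # 4 of the 16 predictor indices per split
```

Because the m candidates are drawn without replacement, any particular predictor is among them with probability m/p. With p = 16 and m = 4, a single dominant predictor is unavailable at three out of four splits on average, which makes the trees less similar to one another.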
