Linear Regression (Simple)
There are two commonly used packages.
In [2]:
# statsmodels ols
from statsmodels.formula.api import ols
In [12]:
# sklearn LinearRegression
from sklearn.linear_model import LinearRegression
In [20]:
# Error metrics
from sklearn.metrics import mean_squared_error   # MSE -> take the square root to get RMSE
from sklearn.metrics import mean_absolute_error  # MAE
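As a quick illustration of the comment above, RMSE is simply the square root of MSE. A minimal sketch with made-up toy values (y_true_toy and y_pred_toy are not from the diamonds data):
In [ ]:
# Toy example (hypothetical values): RMSE is the square root of MSE
y_true_toy = [100, 200, 300]
y_pred_toy = [110, 190, 310]
mse = mean_squared_error(y_true=y_true_toy, y_pred=y_pred_toy)  # (10**2 + 10**2 + 10**2) / 3 = 100.0
rmse = mse ** 0.5                                               # 10.0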
In [21]:
# Split the data into train and test sets
from sklearn.model_selection import train_test_split
In [22]:
import pandas as pd
dia = pd.read_csv("diamonds.csv")
dia_train, dia_test = train_test_split(dia, train_size=0.7, random_state=123)
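A small sanity check (a sketch, not in the original notebook): the full diamonds dataset has 53,940 rows, so a 70/30 split should yield 37,758 training rows (matching No. Observations in the OLS summary below) and 16,182 test rows.
In [ ]:
# Check the split sizes (sketch)
len(dia), len(dia_train), len(dia_test)  # expected: (53940, 37758, 16182)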
In [35]:
dia.head()
Out[35]:
 | carat | cut | color | clarity | depth | table | price | x | y | z
---|---|---|---|---|---|---|---|---|---|---
0 | 0.23 | Ideal | E | SI2 | 61.5 | 55.0 | 326 | 3.95 | 3.98 | 2.43
1 | 0.21 | Premium | E | SI1 | 59.8 | 61.0 | 326 | 3.89 | 3.84 | 2.31
2 | 0.23 | Good | E | VS1 | 56.9 | 65.0 | 327 | 4.05 | 4.07 | 2.31
3 | 0.29 | Premium | I | VS2 | 62.4 | 58.0 | 334 | 4.20 | 4.23 | 2.63
4 | 0.31 | Good | J | SI2 | 63.3 | 58.0 | 335 | 4.34 | 4.35 | 2.75
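Since the simple regression below uses only carat and price, a quick look at their summary statistics can help (a minimal sketch):
In [ ]:
# Summary statistics for the predictor and target columns (sketch)
dia[["carat", "price"]].describe()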
ols
In [9]:
from statsmodels.formula.api import ols
In [14]:
# Single-predictor case: X = carat, y = price
In [16]:
formula="price~carat"
In [23]:
model = ols(formula=formula, data=dia_train).fit()
model
Out[23]:
<statsmodels.regression.linear_model.RegressionResultsWrapper at 0x25888935d90>
In [24]:
model.summary()
Out[24]:
Dep. Variable: | price | R-squared: | 0.849
---|---|---|---
Model: | OLS | Adj. R-squared: | 0.849
Method: | Least Squares | F-statistic: | 2.125e+05
Date: | Sun, 27 Nov 2022 | Prob (F-statistic): | 0.00
Time: | 19:14:35 | Log-Likelihood: | -3.3090e+05
No. Observations: | 37758 | AIC: | 6.618e+05
Df Residuals: | 37756 | BIC: | 6.618e+05
Df Model: | 1 | |
Covariance Type: | nonrobust | |

 | coef | std err | t | P>|t| | [0.025 | 0.975]
---|---|---|---|---|---|---
Intercept | -2261.4691 | 15.636 | -144.630 | 0.000 | -2292.117 | -2230.822
carat | 7760.7979 | 16.835 | 460.985 | 0.000 | 7727.800 | 7793.795

Omnibus: | 9750.769 | Durbin-Watson: | 1.990
---|---|---|---
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 108813.279
Skew: | 0.923 | Prob(JB): | 0.00
Kurtosis: | 11.109 | Cond. No. | 3.66
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
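Beyond the printed summary, the fitted results object exposes individual values through standard statsmodels attributes; a small sketch:
In [ ]:
# Pull individual values out of the fitted results (sketch)
print(model.params)    # Intercept ≈ -2261.47, carat ≈ 7760.80
print(model.rsquared)  # ≈ 0.849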
In [29]:
dia_test["predict"] = model.predict(dia_test)
In [30]:
mean_squared_error(y_true=dia_test["price"], y_pred=dia_test["predict"])**0.5
Out[30]:
1549.1452107597029
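mean_absolute_error was imported above but never used; as a sketch, MAE on the same test predictions is computed the same way (the exact value depends on the split and is not from the original notebook):
In [ ]:
# MAE on the test set (sketch)
mean_absolute_error(y_true=dia_test["price"], y_pred=dia_test["predict"])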
sklearn LinearRegression()
In [ ]:
from sklearn.linear_model import LinearRegression
In [31]:
# Single-predictor case: X = carat, y = price
In [32]:
model = LinearRegression().fit(X=dia_train[["carat"]], y=dia_train[["price"]])
model
Out[32]:
LinearRegression()
In [33]:
model.coef_
Out[33]:
array([[7760.79786937]])
In [34]:
model.intercept_
Out[34]:
array([-2261.46912291])
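The two attributes define the fitted line price ≈ -2261.47 + 7760.80 × carat, the same line statsmodels found above; a minimal sketch checking it by hand for a 1-carat diamond:
In [ ]:
# Manual prediction for carat = 1.0 (sketch): intercept + coefficient * carat
model.intercept_[0] + model.coef_[0][0] * 1.0  # ≈ -2261.47 + 7760.80 = 5499.33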
In [36]:
dia_test["predict"] = model.predict(dia_test[["carat"]])
In [37]:
mean_squared_error(y_true=dia_test["price"], y_pred=dia_test["predict"])**0.5
Out[37]:
1549.1452107597036
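The RMSE matches the statsmodels result, as expected, since both packages fit the same line on the same split. For completeness, a small sketch of the test-set R² via LinearRegression.score, which should land near the training R² of about 0.849 reported in the OLS summary:
In [ ]:
# Test-set R² (sketch; exact value depends on the split)
model.score(dia_test[["carat"]], dia_test["price"])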