SW정리: Stock Price Prediction (5) - xgboost reg:squarederror, python, eta

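In the previous posts we built a basic prediction model. However, there are so many parameters that getting an accurate prediction is quite difficult.

This time, we will vary eta and see which value gives the most accurate predictions.

Before looking at the full source, here is the log printed for a single run (one eta value).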

2020-10-03 11:53:57 DEBUG    start feature
2020-10-03 11:55:53 DEBUG    start end
2020-10-03 11:55:55 DEBUG    {'eta': 0.01, 'objective': 'reg:squarederror'}
2020-10-03 12:56:27 DEBUG    sumlist:[(0, 2.5, 7, 6, 1392.1660581831943, 0.027929875914426777), (1, 2.5, 1, 0, 7202.076459751516, 0.03675441499716935), (2, -2.5, 0, 1, 19475.120979134153, 0.0319474684835044), (3, -2.5, 0, 1, 949.5799261966629, 0.03389920755691996), (4, 0.0, 2, 2, 4907.361723146793, 0.040279250493609996), (5, -5.0, 0, 2, 7697.559682046201, 0.03257472269442078), (6, -10.0, 0, 4, 12122.552975588333, 0.031095138146977737), (7, -5.0, 2, 4, 4304.819984087155, 0.038819924863618835), (8, -17.5, 2, 9, 2410.0308303777374, 0.05141925381916544), (9, -2.5, 4, 5, 3982.2608785847924, 0.04801275052100042), (10, 0.0, 4, 4, 4813.783438728317, 0.04216307192831111), (11, -2.5, 0, 1, 9872.945332267307, 0.03529055346471463), (12, -5.0, 1, 3, 8863.012628966677, 0.04373859247710521), (13, 0.0, 2, 2, 5223.68460261262, 0.032242335577264), (14, 0.0, 1, 1, 3831.5022841706063, 0.04171700001130447), (15, 2.5, 8, 7, 2875.000850439347, 0.0364236622756733), (16, -5.0, 1, 3, 5578.390946286273, 0.04532776734872578), (17, 5.0, 2, 0, 518.6210621531944, 0.0424717190966389), (18, 2.5, 3, 2, 168.91330619903476, 0.023616024750522477), (19, -7.5, 1, 4, 3375.829687899367, 0.03990691286024174)]
2020-10-03 12:56:27 DEBUG    sumtotal:-50.0
2020-10-03 12:56:27 DEBUG    rmsetotal:109565.21363681929
2020-10-03 12:56:27 DEBUG    rmsletotal:0.7556296472813153
2020-10-03 12:56:27 DEBUG    rpath:[]
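Reading the log above, the three totals appear to be column sums over sumlist, and the second field of each tuple appears to equal 2.5 times the difference between the third and fourth fields. The field names used below (profit, wins, losses, rmse, rmsle) are assumptions made for illustration; the log itself does not name them. A minimal check, with the tuples copied from the log line:

```python
# Tuples copied from the sumlist log line above.
# Assumed field order (not named in the source): (index, profit, wins, losses, rmse, rmsle)
sumlist = [
    (0, 2.5, 7, 6, 1392.1660581831943, 0.027929875914426777),
    (1, 2.5, 1, 0, 7202.076459751516, 0.03675441499716935),
    (2, -2.5, 0, 1, 19475.120979134153, 0.0319474684835044),
    (3, -2.5, 0, 1, 949.5799261966629, 0.03389920755691996),
    (4, 0.0, 2, 2, 4907.361723146793, 0.040279250493609996),
    (5, -5.0, 0, 2, 7697.559682046201, 0.03257472269442078),
    (6, -10.0, 0, 4, 12122.552975588333, 0.031095138146977737),
    (7, -5.0, 2, 4, 4304.819984087155, 0.038819924863618835),
    (8, -17.5, 2, 9, 2410.0308303777374, 0.05141925381916544),
    (9, -2.5, 4, 5, 3982.2608785847924, 0.04801275052100042),
    (10, 0.0, 4, 4, 4813.783438728317, 0.04216307192831111),
    (11, -2.5, 0, 1, 9872.945332267307, 0.03529055346471463),
    (12, -5.0, 1, 3, 8863.012628966677, 0.04373859247710521),
    (13, 0.0, 2, 2, 5223.68460261262, 0.032242335577264),
    (14, 0.0, 1, 1, 3831.5022841706063, 0.04171700001130447),
    (15, 2.5, 8, 7, 2875.000850439347, 0.0364236622756733),
    (16, -5.0, 1, 3, 5578.390946286273, 0.04532776734872578),
    (17, 5.0, 2, 0, 518.6210621531944, 0.0424717190966389),
    (18, 2.5, 3, 2, 168.91330619903476, 0.023616024750522477),
    (19, -7.5, 1, 4, 3375.829687899367, 0.03990691286024174),
]

# Column sums, to compare against the sumtotal / rmsetotal / rmsletotal log lines.
sumtotal = sum(row[1] for row in sumlist)
rmsetotal = sum(row[4] for row in sumlist)
rmsletotal = sum(row[5] for row in sumlist)

# Observed pattern (an inference, not confirmed by the source):
# profit == 2.5 * (wins - losses) for every row.
pattern_holds = all(row[1] == 2.5 * (row[2] - row[3]) for row in sumlist)

print(sumtotal, rmsetotal, rmsletotal, pattern_holds)
```

If the sums agree with the logged totals, then rmsetotal is a sum of 20 per-item RMSE values rather than a single RMSE, which is worth keeping in mind when comparing runs.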

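eta is the learning rate: a larger value makes training proceed faster, and a smaller value makes it slower. Thinking about it naively, it is easy to assume that a smaller value gives higher accuracy. I tested whether that actually holds.

I varied eta from 0.01 to 0.2 in steps of 0.01 and logged the result for each value.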
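The sweep over eta can be sketched as below. This only builds the parameter dictionaries seen in the log; the training call itself lives in the linked source and is not reproduced here. The helper name eta_param_grid is hypothetical.

```python
def eta_param_grid(first=1, last=20):
    """Return one parameter dict per eta value, where eta = i / 100.

    Hypothetical helper for illustration; not taken from the original source.
    """
    # Dividing an integer by 100 avoids the rounding drift that repeatedly
    # adding 0.01 to a float would accumulate.
    return [
        {'eta': i / 100, 'objective': 'reg:squarederror'}
        for i in range(first, last + 1)
    ]


for params in eta_param_grid():
    # Each dict would be handed to the training step for one run.
    print(params)
```

Generating the values from integers keeps the printed dictionaries identical to the logged ones (0.1 rather than something like 0.09999999999999999).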

	Line 3: 2020-10-03 11:55:55 DEBUG    {'eta': 0.01, 'objective': 'reg:squarederror'}
	Line 11: 2020-10-03 12:58:21 DEBUG    {'eta': 0.02, 'objective': 'reg:squarederror'}
	Line 19: 2020-10-03 14:01:46 DEBUG    {'eta': 0.03, 'objective': 'reg:squarederror'}
	Line 27: 2020-10-03 15:04:37 DEBUG    {'eta': 0.04, 'objective': 'reg:squarederror'}
	Line 35: 2020-10-03 16:07:48 DEBUG    {'eta': 0.05, 'objective': 'reg:squarederror'}
	Line 43: 2020-10-03 17:11:35 DEBUG    {'eta': 0.06, 'objective': 'reg:squarederror'}
	Line 51: 2020-10-03 18:15:19 DEBUG    {'eta': 0.07, 'objective': 'reg:squarederror'}
	Line 59: 2020-10-03 19:18:52 DEBUG    {'eta': 0.08, 'objective': 'reg:squarederror'}
	Line 67: 2020-10-03 20:18:47 DEBUG    {'eta': 0.09, 'objective': 'reg:squarederror'}
	Line 75: 2020-10-03 21:12:00 DEBUG    {'eta': 0.1, 'objective': 'reg:squarederror'}
	Line 83: 2020-10-03 22:01:27 DEBUG    {'eta': 0.11, 'objective': 'reg:squarederror'}
	Line 91: 2020-10-03 22:46:01 DEBUG    {'eta': 0.12, 'objective': 'reg:squarederror'}
	Line 99: 2020-10-03 23:27:09 DEBUG    {'eta': 0.13, 'objective': 'reg:squarederror'}
	Line 107: 2020-10-04 00:04:50 DEBUG    {'eta': 0.14, 'objective': 'reg:squarederror'}
	Line 115: 2020-10-04 00:40:15 DEBUG    {'eta': 0.15, 'objective': 'reg:squarederror'}
	Line 123: 2020-10-04 01:14:01 DEBUG    {'eta': 0.16, 'objective': 'reg:squarederror'}
	Line 131: 2020-10-04 01:46:09 DEBUG    {'eta': 0.17, 'objective': 'reg:squarederror'}
	Line 139: 2020-10-04 02:15:53 DEBUG    {'eta': 0.18, 'objective': 'reg:squarederror'}
	Line 147: 2020-10-04 02:45:16 DEBUG    {'eta': 0.19, 'objective': 'reg:squarederror'}
	Line 155: 2020-10-04 03:12:49 DEBUG    {'eta': 0.2, 'objective': 'reg:squarederror'}

Here are the RMSE totals. The first one, at eta 0.01, is the smallest, which means the error is lowest when eta is 0.01.

	Line 6: 2020-10-03 12:56:27 DEBUG    rmsetotal:109565.21363681929
	Line 14: 2020-10-03 13:59:52 DEBUG    rmsetotal:111194.95603916253
	Line 22: 2020-10-03 15:02:41 DEBUG    rmsetotal:111777.55994124137
	Line 30: 2020-10-03 16:05:52 DEBUG    rmsetotal:112556.70621922112
	Line 38: 2020-10-03 17:09:38 DEBUG    rmsetotal:112468.58425228977
	Line 46: 2020-10-03 18:13:24 DEBUG    rmsetotal:112747.17691691613
	Line 54: 2020-10-03 19:16:56 DEBUG    rmsetotal:112791.67206001066
	Line 62: 2020-10-03 20:16:51 DEBUG    rmsetotal:113258.163385734
	Line 70: 2020-10-03 21:10:05 DEBUG    rmsetotal:113873.4285542438
	Line 78: 2020-10-03 21:59:29 DEBUG    rmsetotal:113960.54029155383
	Line 86: 2020-10-03 22:44:05 DEBUG    rmsetotal:113571.62493750504
	Line 94: 2020-10-03 23:25:14 DEBUG    rmsetotal:114361.10705896917
	Line 102: 2020-10-04 00:02:56 DEBUG    rmsetotal:114335.53312919852
	Line 110: 2020-10-04 00:38:20 DEBUG    rmsetotal:114369.496028896
	Line 118: 2020-10-04 01:12:06 DEBUG    rmsetotal:113784.9630648605
	Line 126: 2020-10-04 01:44:14 DEBUG    rmsetotal:116311.50586313145
	Line 134: 2020-10-04 02:13:58 DEBUG    rmsetotal:115940.35641123354
	Line 142: 2020-10-04 02:43:16 DEBUG    rmsetotal:116406.80829638575
	Line 150: 2020-10-04 03:10:52 DEBUG    rmsetotal:115489.5481978767
	Line 158: 2020-10-04 03:37:12 DEBUG    rmsetotal:116601.01463608207

What about the profit based on those predictions?

	Line 5: 2020-10-03 12:56:27 DEBUG    sumtotal:-50.0
	Line 13: 2020-10-03 13:59:52 DEBUG    sumtotal:-67.5
	Line 21: 2020-10-03 15:02:41 DEBUG    sumtotal:-85.0
	Line 29: 2020-10-03 16:05:52 DEBUG    sumtotal:-32.5
	Line 37: 2020-10-03 17:09:38 DEBUG    sumtotal:-87.5
	Line 45: 2020-10-03 18:13:24 DEBUG    sumtotal:-160.0
	Line 53: 2020-10-03 19:16:56 DEBUG    sumtotal:-110.0
	Line 61: 2020-10-03 20:16:51 DEBUG    sumtotal:-117.5
	Line 69: 2020-10-03 21:10:05 DEBUG    sumtotal:-65.0
	Line 77: 2020-10-03 21:59:29 DEBUG    sumtotal:17.5
	Line 85: 2020-10-03 22:44:05 DEBUG    sumtotal:-152.5
	Line 93: 2020-10-03 23:25:14 DEBUG    sumtotal:-190.0
	Line 101: 2020-10-04 00:02:56 DEBUG    sumtotal:-165.0
	Line 109: 2020-10-04 00:38:20 DEBUG    sumtotal:-142.5
	Line 117: 2020-10-04 01:12:06 DEBUG    sumtotal:-200.0
	Line 125: 2020-10-04 01:44:14 DEBUG    sumtotal:-195.0
	Line 133: 2020-10-04 02:13:58 DEBUG    sumtotal:-92.5
	Line 141: 2020-10-04 02:43:16 DEBUG    sumtotal:-267.5
	Line 149: 2020-10-04 03:10:52 DEBUG    sumtotal:-120.0
	Line 157: 2020-10-04 03:37:12 DEBUG    sumtotal:-187.5

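In general, lowering the learning rate does reduce the prediction error, but the returns do not necessarily follow. In this run, the lowest rmsetotal came at eta 0.01 (sumtotal -50.0), while the only positive sumtotal (17.5) came at eta 0.1.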
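To make the comparison concrete, the logged totals can be lined up per eta value. The numbers below are copied from the rmsetotal and sumtotal log lines, in run order; the variable names are mine.

```python
etas = [i / 100 for i in range(1, 21)]

# rmsetotal per run, in log order (eta 0.01 ... 0.2).
rmsetotals = [
    109565.21363681929, 111194.95603916253, 111777.55994124137,
    112556.70621922112, 112468.58425228977, 112747.17691691613,
    112791.67206001066, 113258.163385734, 113873.4285542438,
    113960.54029155383, 113571.62493750504, 114361.10705896917,
    114335.53312919852, 114369.496028896, 113784.9630648605,
    116311.50586313145, 115940.35641123354, 116406.80829638575,
    115489.5481978767, 116601.01463608207,
]

# sumtotal per run, in the same order.
sumtotals = [
    -50.0, -67.5, -85.0, -32.5, -87.5, -160.0, -110.0, -117.5, -65.0, 17.5,
    -152.5, -190.0, -165.0, -142.5, -200.0, -195.0, -92.5, -267.5, -120.0, -187.5,
]

# eta with the lowest error total, and eta with the highest profit total.
best_rmse_eta = min(zip(rmsetotals, etas))[1]
best_profit_eta = max(zip(sumtotals, etas))[1]

# eta values whose run ended with a positive sumtotal.
profitable = [eta for eta, s in zip(etas, sumtotals) if s > 0]

print(best_rmse_eta, best_profit_eta, profitable)  # 0.01 0.1 [0.1]
```

With a single run per eta value, this shows only that the two rankings differ here; it does not show that eta 0.1 is reliably more profitable.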

A link to the full source is attached below.

To run it, use predict_batch.py, which executes everything built so far in one go.

It must be run in a 64-bit environment.
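One way to confirm that the Python interpreter is a 64-bit build before starting a long run is shown below. This is a generic check using the standard library, not something taken from the linked source; the function name is hypothetical.

```python
import struct
import sys


def is_64bit_python():
    """Return True when the running interpreter uses 64-bit pointers."""
    # struct.calcsize("P") is the size of a C pointer in bytes.
    return struct.calcsize("P") * 8 == 64


if is_64bit_python():
    print("64-bit Python detected")
else:
    print("This Python is not 64-bit; the batch run may fail", file=sys.stderr)
```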

https://drive.google.com/file/d/1YGFwAgcYNNHKTofPnEWAVE9sKOkQi16y/view?usp=sharing

Sunday, October 4, 2020
