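In the previous post we built a basic prediction model. However, the model has so many parameters that getting accurate predictions out of it is quite hard.
This time I'll vary eta and look at which value gives the most accurate predictions.
Before going through the full source, here is the log printed for one round of the experiment (a single eta value):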
2020-10-03 11:53:57 DEBUG start feature
2020-10-03 11:55:53 DEBUG start end
2020-10-03 11:55:55 DEBUG {'eta': 0.01, 'objective': 'reg:squarederror'}
2020-10-03 12:56:27 DEBUG sumlist:[(0, 2.5, 7, 6, 1392.1660581831943, 0.027929875914426777), (1, 2.5, 1, 0, 7202.076459751516, 0.03675441499716935), (2, -2.5, 0, 1, 19475.120979134153, 0.0319474684835044), (3, -2.5, 0, 1, 949.5799261966629, 0.03389920755691996), (4, 0.0, 2, 2, 4907.361723146793, 0.040279250493609996), (5, -5.0, 0, 2, 7697.559682046201, 0.03257472269442078), (6, -10.0, 0, 4, 12122.552975588333, 0.031095138146977737), (7, -5.0, 2, 4, 4304.819984087155, 0.038819924863618835), (8, -17.5, 2, 9, 2410.0308303777374, 0.05141925381916544), (9, -2.5, 4, 5, 3982.2608785847924, 0.04801275052100042), (10, 0.0, 4, 4, 4813.783438728317, 0.04216307192831111), (11, -2.5, 0, 1, 9872.945332267307, 0.03529055346471463), (12, -5.0, 1, 3, 8863.012628966677, 0.04373859247710521), (13, 0.0, 2, 2, 5223.68460261262, 0.032242335577264), (14, 0.0, 1, 1, 3831.5022841706063, 0.04171700001130447), (15, 2.5, 8, 7, 2875.000850439347, 0.0364236622756733), (16, -5.0, 1, 3, 5578.390946286273, 0.04532776734872578), (17, 5.0, 2, 0, 518.6210621531944, 0.0424717190966389), (18, 2.5, 3, 2, 168.91330619903476, 0.023616024750522477), (19, -7.5, 1, 4, 3375.829687899367, 0.03990691286024174)]
2020-10-03 12:56:27 DEBUG sumtotal:-50.0
2020-10-03 12:56:27 DEBUG rmsetotal:109565.21363681929
2020-10-03 12:56:27 DEBUG rmsletotal:0.7556296472813153
2020-10-03 12:56:27 DEBUG rpath:[]
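The three totals in this log appear to be column sums of the sumlist tuples: adding up the second field of the 20 tuples gives -50.0 (sumtotal), the fifth gives 109565.21... (rmsetotal) and the sixth gives 0.75562... (rmsletotal). Below is a minimal sketch that parses a sumlist line and recomputes them. The field layout is inferred from the numbers, not taken from the original source; the third and fourth fields are left unnamed because the post does not explain them, and `parse_sumlist` and `totals` are hypothetical helper names.

```python
import ast
import re

# Assumed tuple layout, inferred from the log rather than from the original code:
# (index, profit, field3, field4, rmse, rmsle). field3/field4 are not explained.
SUMLIST_RE = re.compile(r"sumlist:(\[.*\])")


def parse_sumlist(log_line):
    """Extract the list of tuples from a 'DEBUG sumlist:[...]' log line."""
    match = SUMLIST_RE.search(log_line)
    if match is None:
        raise ValueError("no sumlist found in line")
    # literal_eval only accepts Python literals, so it is safe to use on log text.
    return ast.literal_eval(match.group(1))


def totals(rows):
    """Recompute (sumtotal, rmsetotal, rmsletotal) as column sums of the tuples."""
    sumtotal = sum(row[1] for row in rows)
    rmsetotal = sum(row[4] for row in rows)
    rmsletotal = sum(row[5] for row in rows)
    return sumtotal, rmsetotal, rmsletotal


if __name__ == "__main__":
    # Shortened line with only the first two tuples from the eta = 0.01 log.
    line = ("2020-10-03 12:56:27 DEBUG sumlist:["
            "(0, 2.5, 7, 6, 1392.1660581831943, 0.027929875914426777), "
            "(1, 2.5, 1, 0, 7202.076459751516, 0.03675441499716935)]")
    print(totals(parse_sumlist(line)))
```

Run on the full eta = 0.01 sumlist line, the same functions reproduce the three logged totals.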
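eta is the learning rate (step-size shrinkage): it scales down the contribution of each newly added tree, so a larger eta makes the model learn faster and a smaller eta makes it learn more slowly. Thinking about it simply, it is tempting to assume that a smaller eta therefore gives more accurate training. I tested whether that is actually the case.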
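To make "a larger eta learns faster" concrete, here is a toy sketch, not XGBoost itself: gradient boosting under squared error where each round's base learner is simply the mean of the current residuals (standing in for a tree), and its output is multiplied by eta before being added. Starting from 0, the prediction after k rounds is mean × (1 - (1 - eta)^k), so a small eta approaches the target much more slowly. The function name `boost_constant` is hypothetical, and the starting value of 0 is an assumption (XGBoost uses its own base_score).

```python
def boost_constant(targets, eta, rounds):
    """Toy gradient boosting (squared error) with a constant base learner.

    Each round fits the mean of the current residuals, which is the best
    constant for squared error, and adds it scaled by eta. Returns the
    prediction after every round.
    """
    prediction = 0.0  # assumed starting value
    history = []
    for _ in range(rounds):
        residual_mean = sum(t - prediction for t in targets) / len(targets)
        prediction += eta * residual_mean  # eta shrinks each learner's step
        history.append(prediction)
    return history


if __name__ == "__main__":
    targets = [90.0, 100.0, 110.0]  # mean is 100
    for eta in (0.01, 0.1, 0.2):
        history = boost_constant(targets, eta, rounds=50)
        print(f"eta={eta}: after 10 rounds {history[9]:.2f}, after 50 rounds {history[-1]:.2f}")
```

After 50 rounds the prediction is only about 39.5 with eta = 0.01, but about 99.5 with eta = 0.1 and essentially 100 with eta = 0.2. This is why an eta comparison also depends on the number of boosting rounds, which the post does not show.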
I varied eta from 0.01 to 0.2 in steps of 0.01 and logged the parameters of each run so the results can be compared:
2020-10-03 11:55:55 DEBUG {'eta': 0.01, 'objective': 'reg:squarederror'}
2020-10-03 12:58:21 DEBUG {'eta': 0.02, 'objective': 'reg:squarederror'}
2020-10-03 14:01:46 DEBUG {'eta': 0.03, 'objective': 'reg:squarederror'}
2020-10-03 15:04:37 DEBUG {'eta': 0.04, 'objective': 'reg:squarederror'}
2020-10-03 16:07:48 DEBUG {'eta': 0.05, 'objective': 'reg:squarederror'}
2020-10-03 17:11:35 DEBUG {'eta': 0.06, 'objective': 'reg:squarederror'}
2020-10-03 18:15:19 DEBUG {'eta': 0.07, 'objective': 'reg:squarederror'}
2020-10-03 19:18:52 DEBUG {'eta': 0.08, 'objective': 'reg:squarederror'}
2020-10-03 20:18:47 DEBUG {'eta': 0.09, 'objective': 'reg:squarederror'}
2020-10-03 21:12:00 DEBUG {'eta': 0.1, 'objective': 'reg:squarederror'}
2020-10-03 22:01:27 DEBUG {'eta': 0.11, 'objective': 'reg:squarederror'}
2020-10-03 22:46:01 DEBUG {'eta': 0.12, 'objective': 'reg:squarederror'}
2020-10-03 23:27:09 DEBUG {'eta': 0.13, 'objective': 'reg:squarederror'}
2020-10-04 00:04:50 DEBUG {'eta': 0.14, 'objective': 'reg:squarederror'}
2020-10-04 00:40:15 DEBUG {'eta': 0.15, 'objective': 'reg:squarederror'}
2020-10-04 01:14:01 DEBUG {'eta': 0.16, 'objective': 'reg:squarederror'}
2020-10-04 01:46:09 DEBUG {'eta': 0.17, 'objective': 'reg:squarederror'}
2020-10-04 02:15:53 DEBUG {'eta': 0.18, 'objective': 'reg:squarederror'}
2020-10-04 02:45:16 DEBUG {'eta': 0.19, 'objective': 'reg:squarederror'}
2020-10-04 03:12:49 DEBUG {'eta': 0.2, 'objective': 'reg:squarederror'}
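The sweep itself can be sketched as a loop over a list of eta values. Building the list as i / 100 yields exactly the values seen in the log (0.01, 0.02, ..., 0.2), whereas repeatedly adding 0.01 to a float can drift away from them. The logging format string is a guess that produces lines of the same shape as those above. `evaluate` is a hypothetical stand-in for one full train/predict pass of the original script (for example a wrapper around `xgboost.train(params, dtrain, ...)`), assumed to return (sumtotal, rmsetotal); `eta_grid` and `sweep` are likewise hypothetical names.

```python
import logging

logger = logging.getLogger(__name__)


def eta_grid(first=1, last=20):
    """eta values first/100 .. last/100, built from integers to avoid float drift."""
    return [i / 100 for i in range(first, last + 1)]


def sweep(evaluate, etas):
    """Call evaluate(params) once per eta and collect (sumtotal, rmsetotal).

    `evaluate` is a hypothetical placeholder for the real train/predict pass.
    """
    results = {}
    for eta in etas:
        params = {"eta": eta, "objective": "reg:squarederror"}
        logger.debug(params)
        sumtotal, rmsetotal = evaluate(params)
        logger.debug("sumtotal:%s", sumtotal)
        logger.debug("rmsetotal:%s", rmsetotal)
        results[eta] = (sumtotal, rmsetotal)
    return results


if __name__ == "__main__":
    # Assumed format; yields lines like "2020-10-03 11:55:55 DEBUG {...}".
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s %(levelname)s %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )

    def fake_evaluate(params):
        # Dummy numbers only, so the loop runs without XGBoost or any data.
        return 0.0, 100000.0 + 1000.0 * params["eta"]

    sweep(fake_evaluate, eta_grid())
```

Injecting `evaluate` keeps the loop independent of the model code, so the same sweep could later drive other parameters.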
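Here are the RMSE totals (rmsetotal) for each run. The first run, eta = 0.01, has the smallest value, which means the prediction error was lowest at eta = 0.01. Overall the total rises as eta grows, though not strictly (for example, 0.05 is slightly lower than 0.04, and 0.15 is lower than 0.12 through 0.14).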
2020-10-03 12:56:27 DEBUG rmsetotal:109565.21363681929
2020-10-03 13:59:52 DEBUG rmsetotal:111194.95603916253
2020-10-03 15:02:41 DEBUG rmsetotal:111777.55994124137
2020-10-03 16:05:52 DEBUG rmsetotal:112556.70621922112
2020-10-03 17:09:38 DEBUG rmsetotal:112468.58425228977
2020-10-03 18:13:24 DEBUG rmsetotal:112747.17691691613
2020-10-03 19:16:56 DEBUG rmsetotal:112791.67206001066
2020-10-03 20:16:51 DEBUG rmsetotal:113258.163385734
2020-10-03 21:10:05 DEBUG rmsetotal:113873.4285542438
2020-10-03 21:59:29 DEBUG rmsetotal:113960.54029155383
2020-10-03 22:44:05 DEBUG rmsetotal:113571.62493750504
2020-10-03 23:25:14 DEBUG rmsetotal:114361.10705896917
2020-10-04 00:02:56 DEBUG rmsetotal:114335.53312919852
2020-10-04 00:38:20 DEBUG rmsetotal:114369.496028896
2020-10-04 01:12:06 DEBUG rmsetotal:113784.9630648605
2020-10-04 01:44:14 DEBUG rmsetotal:116311.50586313145
2020-10-04 02:13:58 DEBUG rmsetotal:115940.35641123354
2020-10-04 02:43:16 DEBUG rmsetotal:116406.80829638575
2020-10-04 03:10:52 DEBUG rmsetotal:115489.5481978767
2020-10-04 03:37:12 DEBUG rmsetotal:116601.01463608207
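For reference, RMSE and RMSLE can be computed as below. This is a sketch with hypothetical helper names; it assumes the script computes one RMSE per item and then adds them up, which is consistent with the log, where rmsetotal equals the sum of the fifth field of the sumlist tuples.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def rmsle(actual, predicted):
    """Root mean squared logarithmic error; log1p(x) = log(1 + x), so values must be > -1."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(
        sum((math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def rmse_total(per_item):
    """Sum of per-item RMSE over (actual, predicted) pairs, like rmsetotal (assumed)."""
    return sum(rmse(actual, predicted) for actual, predicted in per_item)


if __name__ == "__main__":
    items = [([100.0, 102.0, 101.0], [99.0, 103.0, 100.5]), ([10.0, 11.0], [10.5, 10.0])]
    print(rmse_total(items), [rmsle(a, p) for a, p in items])
```

Because RMSE is in the units of the target, a sum of per-item RMSE values is dominated by the items with the largest errors: in the eta = 0.01 run, index 2 alone contributes about 19,475 of the 109,565 total. The per-item RMSLE (sixth field) stays between roughly 0.024 and 0.051, so rmsletotal may be a fairer basis for comparing eta values across items on different scales.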
What about the profit earned from these predictions? Here is sumtotal for each run:
2020-10-03 12:56:27 DEBUG sumtotal:-50.0
2020-10-03 13:59:52 DEBUG sumtotal:-67.5
2020-10-03 15:02:41 DEBUG sumtotal:-85.0
2020-10-03 16:05:52 DEBUG sumtotal:-32.5
2020-10-03 17:09:38 DEBUG sumtotal:-87.5
2020-10-03 18:13:24 DEBUG sumtotal:-160.0
2020-10-03 19:16:56 DEBUG sumtotal:-110.0
2020-10-03 20:16:51 DEBUG sumtotal:-117.5
2020-10-03 21:10:05 DEBUG sumtotal:-65.0
2020-10-03 21:59:29 DEBUG sumtotal:17.5
2020-10-03 22:44:05 DEBUG sumtotal:-152.5
2020-10-03 23:25:14 DEBUG sumtotal:-190.0
2020-10-04 00:02:56 DEBUG sumtotal:-165.0
2020-10-04 00:38:20 DEBUG sumtotal:-142.5
2020-10-04 01:12:06 DEBUG sumtotal:-200.0
2020-10-04 01:44:14 DEBUG sumtotal:-195.0
2020-10-04 02:13:58 DEBUG sumtotal:-92.5
2020-10-04 02:43:16 DEBUG sumtotal:-267.5
2020-10-04 03:10:52 DEBUG sumtotal:-120.0
2020-10-04 03:37:12 DEBUG sumtotal:-187.5
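In this test, lowering the learning rate generally lowered the prediction error, but profit did not track it: the lowest-error setting (eta 0.01) still lost 50, while the only positive result (+17.5) came at eta 0.1, whose rmsetotal ranks only 12th of the 20 runs.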
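To put the two results side by side, the sketch below copies rmsetotal and sumtotal for each eta from the logs, finds the best eta for each, and measures how closely their orderings agree using Spearman's rank correlation. It is implemented by hand with the no-ties formula, which is valid here because neither series contains repeated values. The helper names are hypothetical.

```python
ETAS = [i / 100 for i in range(1, 21)]  # 0.01 .. 0.20

# rmsetotal and sumtotal per eta, copied from the logs above (eta 0.01 .. 0.20).
RMSE_TOTAL = [
    109565.21363681929, 111194.95603916253, 111777.55994124137, 112556.70621922112,
    112468.58425228977, 112747.17691691613, 112791.67206001066, 113258.163385734,
    113873.4285542438, 113960.54029155383, 113571.62493750504, 114361.10705896917,
    114335.53312919852, 114369.496028896, 113784.9630648605, 116311.50586313145,
    115940.35641123354, 116406.80829638575, 115489.5481978767, 116601.01463608207,
]
SUM_TOTAL = [
    -50.0, -67.5, -85.0, -32.5, -87.5, -160.0, -110.0, -117.5, -65.0, 17.5,
    -152.5, -190.0, -165.0, -142.5, -200.0, -195.0, -92.5, -267.5, -120.0, -187.5,
]


def ranks(values):
    """1-based ranks in ascending order; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(x, y):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


if __name__ == "__main__":
    print("lowest rmsetotal at eta", ETAS[RMSE_TOTAL.index(min(RMSE_TOTAL))])
    print("highest sumtotal at eta", ETAS[SUM_TOTAL.index(max(SUM_TOTAL))])
    print("spearman(eta, rmsetotal) =", round(spearman(ETAS, RMSE_TOTAL), 3))
    print("spearman(rmsetotal, sumtotal) =", round(spearman(RMSE_TOTAL, SUM_TOTAL), 3))
```

Spearman's rho comes out at about 0.96 between eta and rmsetotal, but only about -0.61 between rmsetotal and sumtotal: lower error goes with higher profit only loosely in these runs.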
Here is a link to the full source.
predict_batch.py runs everything built so far in a single batch.
It must be run in a 64-bit environment.
https://drive.google.com/file/d/1YGFwAgcYNNHKTofPnEWAVE9sKOkQi16y/view?usp=sharing
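Since the script has to run on 64-bit, a quick check of the Python interpreter before launching predict_batch.py might look like this. It only inspects the interpreter's pointer size, which catches a 32-bit Python installed on a 64-bit OS; the post does not say which component requires 64-bit, and `is_64bit_python` is a hypothetical helper name.

```python
import struct
import sys


def is_64bit_python():
    """Return True if the running Python interpreter is a 64-bit build."""
    # struct.calcsize("P") is the size of a C void pointer in bytes: 8 on 64-bit builds.
    return struct.calcsize("P") == 8


if __name__ == "__main__":
    if not is_64bit_python():
        sys.exit("predict_batch.py needs a 64-bit Python interpreter")
    print("64-bit Python:", sys.version)
```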