Lecture 29. K-Fold Cross Validation
@@ -435,6 +435,35 @@
"\n",
"These \"hyper-parameters\" can be adjusted after testing with out-of-sample data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Types of Data\n",
"Training data is used to optimize a given network's weights and minimize its loss.\n",
"\n",
"Validation data is used to optimize the hyper-parameters of the network: the number of layers, neurons per layer, activation functions and their constants, epochs, learning rates, etc.\n",
"\n",
"Testing data is used to measure the out-of-sample performance of the trained network.\n",
"\n",
"## Splitting up Data\n",
"### Given a Lot of Data\n",
"The dataset is broken primarily into training data, plus separate validation and testing data.\n",
"\n",
"### Given Limited Data\n",
"The dataset is only broken up into training and testing data (perhaps 80%/20%). K-Fold cross validation can then be used on the training data in these limited-data scenarios.\n",
"\n",
"# K-Fold Cross Validation\n",
"With a limited training dataset, you can split it further into subsections (folds), say 5. You then have 5 different combinations of data, where 1 fold serves as the validation set while the other 4 serve as the training data.\n",
"\n",
"When using 5 folds of the training data, say {A, B, C, D, E}, you get 5 validation losses. The total validation loss is taken to be the average of all 5.\n",
"\n",
"To compare different hyper-parameter settings, you run the network on the same folds for each setting and choose the setting with the lowest total (average) validation loss.\n",
"\n",
"## Data Leakage\n",
"While K-Fold is good for tuning hyper-parameters with limited data, it can suffer from data leakage if not set up correctly. For example, with time-series data, a fold may get access to future information and train on it."
]
}
],
"metadata": {
@@ -372,4 +372,29 @@ print(f'validation, acc: {accuracy:.3f}, loss: {loss:.3f}')
#
# These "hyper-parameters" can be adjusted after testing with out-of-sample data.

# %% [markdown]
# # Types of Data
# Training data is used to optimize a given network's weights and minimize its loss.
#
# Validation data is used to optimize the hyper-parameters of the network: the number of layers, neurons per layer, activation functions and their constants, epochs, learning rates, etc.
#
# Testing data is used to measure the out-of-sample performance of the trained network.
#
# ## Splitting up Data
# ### Given a Lot of Data
# The dataset is broken primarily into training data, plus separate validation and testing data.
#
# ### Given Limited Data
# The dataset is only broken up into training and testing data (perhaps 80%/20%). K-Fold cross validation can then be used on the training data in these limited-data scenarios, as sketched below.
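
# %% [markdown]
# A minimal sketch of the 80%/20% split, assuming `X` and `y` are NumPy arrays of samples and labels; the function name and signature are illustrative, not from the lecture code.

# %%
import numpy as np

def split_train_test(X, y, test_fraction=0.2, seed=0):
    """Shuffle the samples, then hold out the last test_fraction of them for testing."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))          # random order of sample indices
    split = int(len(X) * (1 - test_fraction))  # e.g. 80% of the data for training
    train_idx, test_idx = indices[:split], indices[split:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]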

# %% [markdown]
# # K-Fold Cross Validation
# With a limited training dataset, you can split it further into subsections (folds), say 5. You then have 5 different combinations of data, where 1 fold serves as the validation set while the other 4 serve as the training data.
#
# When using 5 folds of the training data, say {A, B, C, D, E}, you get 5 validation losses. The total validation loss is taken to be the average of all 5.
#
# To compare different hyper-parameter settings, you run the network on the same folds for each setting and choose the setting with the lowest total (average) validation loss, as in the sketch below.
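
# %% [markdown]
# A minimal sketch of 5-fold cross validation over the training data. It assumes a hypothetical
# train_and_evaluate(X_train, y_train, X_val, y_val, **hyper_params) helper that builds and trains a
# fresh network with the given hyper-parameters and returns its validation loss; that helper is not
# part of the lecture code.

# %%
import numpy as np

def k_fold_validation_loss(X, y, train_and_evaluate, k=5, **hyper_params):
    """Average validation loss over k folds for one hyper-parameter setting."""
    folds = np.array_split(np.arange(len(X)), k)  # k roughly equal groups of indices
    losses = []
    for i in range(k):
        val_idx = folds[i]  # fold i is held out for validation
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        # train_and_evaluate is a hypothetical helper: train a fresh network, return validation loss
        loss = train_and_evaluate(X[train_idx], y[train_idx],
                                  X[val_idx], y[val_idx], **hyper_params)
        losses.append(loss)
    return np.mean(losses)  # total validation loss = average over the k folds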

# %% [markdown]
# ## Data Leakage
# While K-Fold is good for tuning hyper-parameters with limited data, it can suffer from data leakage if not set up correctly. For example, with time-series data, a fold may get access to future information and train on it.
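
# %% [markdown]
# One way to avoid that leakage, sketched here under the assumption that the samples are already in
# chronological order, is forward chaining: each fold is validated on a block of data and trained only
# on the data that comes before it in time. This is an illustrative sketch, not part of the lecture code.

# %%
import numpy as np

def forward_chaining_folds(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs where the training data always precedes the validation data."""
    folds = np.array_split(np.arange(n_samples), k)
    for i in range(1, k):
        train_idx = np.concatenate(folds[:i])  # everything strictly before fold i
        val_idx = folds[i]                     # fold i lies in the "future" of its training data
        yield train_idx, val_idx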