# SmartPLS4 Series 33 - How to use PLS Predict to assess Predictive Validity/Predictive Power?

Published: Jul 07, 2022 Duration: 00:19:36 Category: Education

In this session we are going to talk about evaluating the structural model's predictive power, which is step four of our structural model assessment. As part of the structural model assessment we have already looked at how to assess collinearity issues, how to assess the significance and relevance of the structural model relationships, and how to assess the model's explanatory power; now we are going to assess the model's predictive power.

Many researchers interpret the R-square statistic as a measure of their model's predictive power. This interpretation is not entirely correct, however, since R-square only indicates the model's in-sample explanatory power. "In-sample" refers to the data that you have; "out-of-sample" refers to the data that you do not have but want to forecast or estimate. So R-square can provide explanatory power with respect to the sample that you have, but you cannot estimate the sample that you do not have: out-of-sample prediction is not possible with R-square. It says nothing about the model's out-of-sample predictive power, that is, the model's ability to predict new or future observations.

So the question remains: does the model have good predictive quality? Because at the end of the day we are using these techniques for prediction. Addressing this concern, Shmueli and others (2016) introduced PLS predict, a procedure for out-of-sample prediction. Execution of PLS predict involves estimating the model on a training sample and evaluating its predictive performance on a holdout sample. Your whole data set is divided into two subsamples: one is your training sample (call it sample one) and the other is your holdout, or held-out, sample (sample two). Note that the holdout sample is separated from the total sample before executing the initial analysis on the training sample, so it includes data that were not used in the model estimation.

Researchers need to make sure that the training sample for each fold meets the minimum sample size guidelines: although you are dividing your sample into a training and a holdout sample, for the estimation to be correct the training sample has to meet the minimum sample size guidelines. For that you can use the inverse square root method; the link will be shared in the description.

PLS predict executes k-fold cross-validation. A fold is a subgroup of the total sample: the total data set is randomly split into k equally sized subsets of data. For example, let's say we have k = 5 folds (normally k = 10 is recommended) and a sample size of 100. In each round, four subsets serve as the training sample and one as the holdout sample, so there will be 20 cases in each of the five folds. The estimation is done on the training sample, and based on that trained estimation the holdout sample is predicted. That is fold one; let's say the first subset was your holdout sample.
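The fold mechanics described above can be sketched in a few lines of Python. This is an illustrative stand-in, not SmartPLS's actual implementation: an ordinary least-squares regression takes the place of the PLS-SEM estimation step.

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Randomly split n case indices into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def cross_validated_predictions(X, y, k=10, seed=0):
    """For each fold, estimate the model on the other k-1 folds (the
    training sample) and predict the held-out fold, so that every case
    receives an out-of-sample prediction."""
    n = len(y)
    preds = np.empty(n)
    for holdout in kfold_indices(n, k, seed):
        train = np.setdiff1d(np.arange(n), holdout)
        # OLS with an intercept, standing in for the PLS estimation
        Xt = np.column_stack([np.ones(len(train)), X[train]])
        beta, *_ = np.linalg.lstsq(Xt, y[train], rcond=None)
        Xh = np.column_stack([np.ones(len(holdout)), X[holdout]])
        preds[holdout] = Xh @ beta
    return preds
```

The key property to notice is that each case's prediction comes from a model that never saw that case during estimation, which is exactly what makes the resulting error an out-of-sample measure.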
In fold two, a different subset becomes your holdout sample and the same estimation is run again; in fold three, another subset is the holdout sample and the estimation is done again, and similarly for the remaining two subsamples. PLS predict then combines the k − 1 subsets (here, the four groups of training data) into a single training sample that is used to predict the fifth subset, the holdout sample. As described earlier, the holdout sample is predicted based on the training sample: each case in every holdout sample has a predicted value estimated with the respective training sample, and the holdout sample keeps changing until this has been done for all the folds.

A training sample is the portion of the overall data set used to estimate the model parameters, that is, the path coefficients, indicator weights, and loadings. The remaining part of the data set, which is not used for model estimation, is the holdout sample. The training data set is used to estimate, that is, train, the weights and paths of our model, and we then use these estimated weights to predict the outcomes in the holdout sample. In short, the training sample predicts the holdout sample for each of the folds, and then we evaluate the prediction metrics based on the analysis that has been done; we do not compare predictions of the training and holdout samples. Finally, k = 10 is recommended, and this is the default in SmartPLS. The generation of the k subsets of the data is a random process and can therefore result in extreme partitions that potentially lead to abnormal solutions; to avoid such abnormal solutions, researchers should run PLS predict multiple times.

To assess a model's predictive power, researchers can draw on several prediction statistics that quantify the amount of prediction error in the indicators of a particular endogenous construct. "Error" here is not an error in the sense of a mistake; it is a residual, the difference between the actual values and the predicted values, and the lower it is, the better: you want your actual values to be close to the predicted values. The most popular metric to quantify the degree of prediction error is the root mean square error (RMSE); another one is the mean absolute error (MAE). Which one to use? In most instances a researcher should use the RMSE, but if the prediction error distribution is highly non-symmetric, that is, there is a long left or right tail in the distribution of the prediction errors for the endogenous variables, then the MAE is the more appropriate prediction statistic; SmartPLS will provide you with the graphs for this.

To interpret these metrics, researchers need to compare each indicator's RMSE (or MAE) with that of a naive linear regression model (LM). What if you tested this model with linear regression and then tested it with PLS: is there any difference? Your prediction error should be smaller when you are using PLS instead of the linear regression model; a higher error in the linear regression model and a lower RMSE in PLS-SEM would mean that you have high predictive power. The LM benchmark values are obtained by running a linear regression of each dependent construct's indicators on the indicators of the exogenous constructs in the PLS path model.
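The two prediction-error metrics just described can be computed directly from the residuals; a minimal sketch:

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error: squares the residuals first, so a few
    large prediction errors are penalised disproportionately."""
    e = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))

def mae(actual, predicted):
    """Mean absolute error: averages the absolute residuals, treating
    all error magnitudes equally (less sensitive to a long tail)."""
    e = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(e)))
```

With a symmetric error distribution the two metrics tell a similar story; with a long left or right tail, a few extreme residuals inflate the RMSE, which is exactly why the MAE is recommended in that case.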
So what happens is that you simply run a linear regression model, assessing the impact of the indicators of the exogenous constructs in the PLS path model on the indicators of the dependent constructs, and then you compare the PLS-SEM RMSE values with the LM values. These are the guidelines we are going to use:

- When all the indicators in the PLS-SEM analysis have lower RMSE (or MAE) values compared to the naive LM benchmark, the model has high predictive power.
- If the majority (or the same number) of indicators in the PLS-SEM analysis yield smaller prediction errors, you have medium predictive power.
- If only a minority of the constructs' indicators produce lower PLS-SEM prediction errors compared to the naive LM benchmark, this indicates low predictive power.
- When the PLS-SEM analysis yields a lower prediction error for none of the indicators, that is, the RMSE (or MAE) is higher than the linear model's for every indicator, you have no predictive power.

Now, how do we do this in SmartPLS? Here is my model that we have been working on all along. Go to Calculate → PLS predict; the number of folds is 10 and the number of repetitions is 10, which is fine, so keep the defaults and start. You won't get any graphical output, so just go to the report. Here are the results for all the endogenous variables (POS was endogenous, as well as OC and the others). But before we compare columns, we need to decide which metric to use, so go to the PLS-SEM error histograms for the manifest variables (your indicators) and look at the endogenous variables. Here the distributions look symmetrical: no extreme values and no long left or right tail. One of them is slightly left-tailed, but overall this looks fine, so we can go for the RMSE, because the residual errors of the endogenous variables look symmetric, without extreme left or right tail values.

So let's go back to the report and open the MV prediction summary. We are going to use the RMSE: the PLS-SEM RMSE column will be compared to the LM RMSE column. (If we had a problem here, such as a long left or right tail in the distribution of the prediction errors, we would have gone for the MAE instead, but in this case the histograms look fine.) Looking at the manifest variables, some PLS-SEM RMSE values are lower and some are higher than the corresponding LM values (for example 1.45 versus 1.414). In total we have 24 indicators, and out of those 24, 13 have a higher RMSE in comparison to the LM RMSE. According to the guidelines, this means our model has low predictive power: it is almost an even split, but still 13 of the indicators (if I am not wrong; I do not want to count again) have higher RMSE values than the LM RMSE, so the model has low predictive power. This is how you can use PLS predict to assess your model's predictive power. The Q-square values, on the other hand, are good: all values are greater than zero and they are moderate to substantial (we covered Q-square in one of the previous sessions, so you can go back and look at the values). In short: if your prediction errors do not have long left or right tails, use the RMSE, otherwise use the MAE, and then simply compare the RMSE values with the LM RMSE values.
Alternatively, you can copy the table to Excel and do the comparison there. Open Excel, paste the results, and delete the rows and columns we do not need, starting with the latent variables, since we only need the indicators. Now compute the difference between the two columns. We want low prediction errors in PLS-SEM, so a negative difference (PLS-SEM RMSE minus LM RMSE) means the PLS-SEM value is lower, which is what we are looking for. Enter the formula (this cell minus that cell), extend it down, and then apply conditional formatting with the rule "less than 0". Here we get 10 indicators out of 24 where the PLS-SEM RMSE is less than the LM RMSE; for those rows the LM RMSE is higher. The problem is that most of the time the PLS-SEM RMSE values are higher. There are also two rows where the difference is exactly zero; if you add those two zeros to the 10 negative values, 12 out of 24 values favour PLS-SEM, which means you can say you have a moderate (medium) level of predictive power.

Again, the simple procedure: as a first step, look at your histograms for a long left or right tail, just to make sure whether you are going to use the RMSE or the MAE for the comparison. Once you are through that, compare either the PLS-SEM MAE with the LM MAE, or the PLS-SEM RMSE with the LM RMSE; you want the PLS-SEM values to be lower than the LM values. The easier way, as I have done, is to put the table in Excel and do the comparison there. You can also count the values: use =COUNTIF(E2:E25, "<=0"), which counts how many difference values are less than or equal to zero, meaning the PLS-SEM RMSE is lower (or equal). Press Enter: 12 values. So out of 24, 12 values have a lower RMSE for PLS-SEM and 12 have a lower RMSE for LM. Note that we included the zeros as well, to give our PLS-SEM model a better chance to show predictive power; with that, we can say we have a moderate level of predictive power. I hope this session has helped you understand PLS predict in SmartPLS 4. Thank you very much.
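The Excel difference-and-COUNTIF workflow just described has a direct Python equivalent. The RMSE values below are made-up placeholders for illustration, not the actual figures from this model's report.

```python
import numpy as np

# Hypothetical values copied from the MV prediction summary table
pls_rmse = np.array([1.32, 1.45, 1.51, 1.40])
lm_rmse = np.array([1.41, 1.41, 1.45, 1.38])

diff = pls_rmse - lm_rmse            # the Excel "=B2-C2" difference column
favour_pls = int(np.sum(diff <= 0))  # equivalent of COUNTIF(range, "<=0")
print(f"{favour_pls} of {len(diff)} indicators favour PLS-SEM")
```

Counting `diff <= 0` mirrors the choice made above of including the exact ties, which gives the PLS-SEM model the benefit of the doubt.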
