Skip to content

vm239/Decision_Trees_Prediction

Repository files navigation

Prediction

Prediction of student behavior has been a prominent area of research in learning analytics and a major concern for higher education institutions and ed tech companies alike. It is the bedrock of methodology within the world of cognitive tutors and these methods have been exported to other areas within the education technology landscape. The ability to predict what a student is likely to do in the future so that interventions can be tailored to them has seen major growth and investment, though implementation is non-trivial and expensive. Although some institutions, such as Purdue University, have seen success we are yet to see widespread adoption of these approaches as they tend to be highly institution specific and require very concrete outcomes to be useful.

I have built three prediction models within the classification tree type, in order to predict if students enrolled into particular courses in the semester complete the course or not. This enables us to understand how better to allocate valuable resources while allowing students to have the freedom to choose and sample courses.

The data used in this project comes from a university's registrar and is first run through a scatterplot matrix in order to do an initial visualization of the available information.

Scatterplot Matrix

Scatterplot matrix

We notice that for the categorical variables, we are unable to gather meaningful visualization with a scatterplot, hence we use a jitter. Note that complete, our desired variable of interest which denotes if a student completes a course or not, is a categorical variable.

CompleteVsYear

CompleteVsEntrancescores

CompleteVsCourses

CompleteVsCourseID

We then clean and prepare our data for the three models, and generate summaries for each based on the parameter of Accuracy of the model.


The CART Tree

Cart tree model

The CART Tree summary

Cart Tree summary


The Conditional Inference Tree Model

Conditional inference model

The Conditional Inference Tree summary

Conditional inference summary


The C5.0 Tree Model

C5 0 Model and summary

The C5.0 Tree summary

C5 0 summary


A comparison of the different models are seen in this graph, as well as in the RMD and HTML files in this repository.

Three models comparison graph

These models provide different accuracy rates and the best out of them is seen to be the Conditional Inference Model. This model could be used for the most accurate predictions of students who are likely to complete courses they have enrolled into, and could solve the problem of optimum allocation of resources.


Future Work

  • Predictions provide a wide array of benefits and can be utilized across different industrial and service sectors.
  • Accuracy of different models of prediction can be sharpened by selecting variables of interest from a large dataset, and then running them through different packages for more customised results.
  • The type of variables within the available data expand the horizons on the different prediction models which are possible and suitable, based on which further investigation can be carried out beyond Classification Tree models.

Tools

  • R and R Studio
  • R packages
  • Git
  • GitHub

Author

Vidya Madhavan, MS candidate for Learning Analytics, Teachers College, Columbia University


About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages