A Causal Inference Analysis of Parental Education and Study Time on Academic Success

An inferential analysis of how a student's academic factor (study time) and demographic factors (father's and mother's education levels) each affect school performance.

GitHub Detailed Report

Background

Education is a key factor in achieving long-term economic progress. Understanding what influences students’ grades can help schools, families, and students themselves make better decisions to improve learning outcomes. Examining aspects of a student’s life such as home address type (rural or urban), father’s and mother’s education levels, and study time can yield valuable insights. The topic matters in the social sciences because improving student performance has long-term benefits for both individuals and society, such as better career opportunities and greater social equality.

Research Question

How do parental education and study time affect a student’s academic performance?

Hypothesis

“Students with higher parental education levels and more hours per week dedicated to studying have better academic performance, as measured by their final grade.”

This hypothesis stems from findings in educational research that parental involvement and support, as well as a student’s study habits, are key determinants of academic success.

Dataset

The data concern student achievement in secondary education at two Portuguese schools. The attributes include student grades and demographic, social, and school-related features, collected from school reports and questionnaires. Two datasets are provided, covering performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. For this project, we combine the Mathematics (mat) and Portuguese (por) datasets and analyze them together.
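The combination step can be sketched as follows. This is a minimal illustration, not the project's actual code: the rows are made-up toy data, the UCI files are assumed to be semicolon-separated with the columns Medu, Fedu, studytime, and G3, and two definitions are assumptions rather than facts from this write-up: Pedu is taken as the higher of the two parents' education levels, and final_grade as the third-period grade G3. The helper name load_subject is hypothetical.

```python
import csv
import io

# Toy stand-ins for the Mathematics and Portuguese files; the rows are
# invented for illustration and are not taken from the real dataset.
MAT_CSV = "Medu;Fedu;studytime;G3\n4;2;2;11\n1;1;1;8\n"
POR_CSV = "Medu;Fedu;studytime;G3\n3;4;3;14\n2;2;2;12\n"


def load_subject(text, subj):
    """Parse one subject file and tag every row with its subject label."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text), delimiter=";"):
        row = {key: int(value) for key, value in raw.items()}
        row["subj"] = subj
        # Assumption: Pedu is the higher of the two parents' education levels.
        row["Pedu"] = max(row["Medu"], row["Fedu"])
        # Assumption: final_grade is the third-period grade G3.
        row["final_grade"] = row["G3"]
        rows.append(row)
    return rows


# Stack the two subjects into one table, keeping subj as a column.
combined = load_subject(MAT_CSV, "mat") + load_subject(POR_CSV, "por")

for row in combined:
    print(row["subj"], row["Pedu"], row["studytime"], row["final_grade"])
```

Keeping subj as an explicit column is what later allows the regression to control for subject differences.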

Graphs and Plots

Here are some visual representations of the data. These graphs and plots help to illustrate key findings and insights from the analysis.

Graph 1

Figure 1: In almost all cases, irrespective of the number of hours a student dedicates to their studies in a week, the largest group of students has a Pedu value of 4. This suggests that even if only one parent is highly educated, they can have a positive effect on their child’s studytime. It is also interesting to note that the second most common Pedu value is 2, not 3.

Graph 2

Figure 2: For weekly study time level 3, the median final grade is highest at 12.49, which makes sense: the more time one dedicates to studying in a week, the better they perform. For level 4, however, the median is slightly lower at 12.27. While still high, the slight decline could be because studying for more than 10 hours a week alongside other activities leads to exhaustion and worse mental and physical health, and consequently to weaker academic performance. Studytime levels 1 and 2 have median final_grade values of 10.58 and 11.33 respectively, again aligning with the hypothesis.
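A per-level summary like the one behind Figure 2 could be computed along these lines. The sketch uses hypothetical (studytime, final_grade) pairs, so its numbers are unrelated to the medians reported above; median_by_level is an illustrative helper name.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (studytime level, final_grade) pairs, not real observations.
records = [
    (1, 9.0), (1, 11.0),
    (2, 10.0), (2, 12.0), (2, 13.0),
    (3, 12.0), (3, 14.0),
    (4, 11.0), (4, 13.0),
]


def median_by_level(pairs):
    """Group grades by studytime level and return the median of each group."""
    groups = defaultdict(list)
    for level, grade in pairs:
        groups[level].append(grade)
    return {level: median(grades) for level, grades in sorted(groups.items())}


medians = median_by_level(records)
for level, value in medians.items():
    print(f"studytime={level}: median final_grade={value}")
```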


Methodology

This study employs a structured data analysis pipeline, including data cleaning, summarization, processing, DAG construction, and linear regression modeling to examine the impact of parental education and study time on student performance.

Data Cleaning and Pre-processing

The dataset, consisting of student performance data from two Portuguese schools, was first imported and merged. Initial pre-processing included:

Directed Acyclic Graph (DAG) Construction

A Directed Acyclic Graph (DAG) was constructed to illustrate causal relationships between key variables. The logic behind the DAG:

This DAG helps identify confounders and appropriate control variables for unbiased causal inference.
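To make the idea of closing backdoor paths concrete, the sketch below encodes one possible DAG and lists the backdoor paths that remain open under a given adjustment set. The edge list is an assumption inferred from the text (Pedu and subj each affect both studytime and final_grade); it is not necessarily the exact DAG used in the project, and all function names are illustrative.

```python
# Assumed edges (cause, effect); a hypothetical reading of the write-up.
EDGES = [
    ("Pedu", "studytime"),
    ("Pedu", "final_grade"),
    ("subj", "studytime"),
    ("subj", "final_grade"),
    ("studytime", "final_grade"),
]


def children(node):
    return {b for a, b in EDGES if a == node}


def parents(node):
    return {a for a, b in EDGES if b == node}


def descendants(node):
    """All nodes reachable from node by following arrows forward."""
    seen, stack = set(), [node]
    while stack:
        for child in children(stack.pop()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def simple_paths(start, goal, path=None):
    """Yield every path from start to goal, ignoring arrow direction."""
    path = path or [start]
    if path[-1] == goal:
        yield path
        return
    for nxt in children(path[-1]) | parents(path[-1]):
        if nxt not in path:
            yield from simple_paths(start, goal, path + [nxt])


def is_blocked(path, adjusted):
    """Apply the d-separation rules to each interior node of the path."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in EDGES and (nxt, mid) in EDGES
        if collider:
            # A collider blocks unless it or one of its descendants is adjusted.
            if mid not in adjusted and not (descendants(mid) & adjusted):
                return True
        elif mid in adjusted:
            # A chain or fork node blocks when it is adjusted for.
            return True
    return False


def open_backdoor_paths(treatment, outcome, adjusted=frozenset()):
    """Paths that start with an arrow INTO the treatment and are not blocked."""
    return [
        p for p in simple_paths(treatment, outcome)
        if (p[1], treatment) in EDGES and not is_blocked(p, set(adjusted))
    ]


print(open_backdoor_paths("studytime", "final_grade"))
print(open_backdoor_paths("studytime", "final_grade", {"Pedu", "subj"}))
print(open_backdoor_paths("Pedu", "final_grade"))
```

Under these assumed edges, studytime has two open backdoor paths until both Pedu and subj are adjusted for, while Pedu has none because no arrow points into it.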

DAG Representation

Figure: DAG representing the assumed causal relationships.

Linear Regression Model

A linear regression model was applied to analyze the relationship between study time, parental education, and final grades, controlling for subject differences.

Results and Findings

Does Study Time Have a Significant Effect on Final Grade?

To study the individual effect of studytime on final_grade, we consider all paths between studytime and final_grade in our DAG and identify the open backdoor paths. To close these paths, we control for the Pedu and subj variables that lie on them. The final model:

                        final_grade = β₀ + β₁(studytime) + β₂(Pedu) + β₃(subj) + ε
                

The regression analysis indicates that study time has a statistically significant effect on final grades. The coefficient for studytime is positive, suggesting that students who dedicate more hours to studying tend to have higher final grades.

Since the p-value is below 0.05, we reject the null hypothesis and conclude that study time is significantly associated with final grades.
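As a rough illustration of how the adjusted coefficients are obtained, the sketch below fits a least-squares model with studytime, Pedu, and a subject indicator by solving the normal equations. It uses only the Python standard library and noise-free synthetic data, so it recovers the coefficients used to generate the data; in practice a statistics package would also report standard errors and p-values. The data, the 0/1 coding of subj (1 for Portuguese), and the function name ols are assumptions made for this example.

```python
def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * t for x, t in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Synthetic predictors: (studytime, Pedu, subj_por); values are invented.
rows = [
    (1, 1, 0), (2, 4, 0), (3, 2, 1), (4, 3, 1),
    (2, 2, 1), (1, 4, 0), (3, 0, 0), (4, 1, 1),
]
# Grades generated without noise from known, made-up coefficients.
y = [8 + 0.5 * st + 0.4 * pedu + 1.0 * por for st, pedu, por in rows]

b0, b_studytime, b_pedu, b_subj = ols(rows, y)
print(f"intercept={b0:.3f} studytime={b_studytime:.3f} "
      f"Pedu={b_pedu:.3f} subj={b_subj:.3f}")
```

The studytime coefficient here is the change in final_grade per study-time level with Pedu and subj held fixed, which is the quantity the adjusted model is meant to estimate.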

Does Parental Education Have a Significant Effect on Final Grade?

Because the DAG contains no open backdoor paths between Pedu and final_grade, we do not need to control for any variables in this scenario. The final model:

                        final_grade = β₀ + β₁(Pedu) + ε
                
The results also show that parental education (Pedu) has a significant effect on student performance. The coefficient for Pedu is positive, meaning students with higher-educated parents tend to perform better academically.

Since the p-value is below 0.05, we conclude that parental education has a statistically significant effect on final grades.
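The significance check for a single-predictor model can be sketched as follows: estimate the slope, compute its standard error from the residuals, and convert the ratio into a two-sided p-value. This is an illustrative sketch on invented data, and it uses a normal approximation in place of the t distribution, which is only reasonable for large samples; the function name simple_ols and all values are assumptions.

```python
from math import sqrt
from statistics import NormalDist, mean


def simple_ols(x, y):
    """Slope, intercept, slope standard error, and approximate p-value."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)  # residual variance
    se = sqrt(sigma2 / sxx)
    z = slope / se
    # Normal approximation to the t distribution (crude for small n).
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return slope, intercept, se, p_value


# Invented data: Pedu levels 0-4 repeated, grades rising with Pedu plus a
# fixed pattern of small deviations.
pedu = [0, 1, 2, 3, 4] * 4
noise = [0.5, -0.5, 0.3, -0.3, 0.0, -0.5, 0.5, -0.3, 0.3, 0.0,
         0.2, -0.2, 0.4, -0.4, 0.0, -0.2, 0.2, -0.4, 0.4, 0.0]
grade = [9 + 0.8 * p + e for p, e in zip(pedu, noise)]

slope, intercept, se, p_value = simple_ols(pedu, grade)
print(f"slope={slope:.3f} se={se:.4f} p={p_value:.3g}")
if p_value < 0.05:
    print("Reject the null hypothesis of no association at the 5% level.")
```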

Conclusion

References

1. https://archive.ics.uci.edu/dataset/320/student+performance
2. https://repositorium.sdum.uminho.pt/bitstream/1822/8024/1/student.pdf
3. Cortez, P. (2008). Student Performance [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5TG7T.
4. Pinquart, M., & Ebeling, M. (2020). Parental educational expectations and academic achievement in children and adolescents—a meta-analysis. Educational Psychology Review, 32(2), 463-480.
5. Davis-Kean, P. E., Tighe, L. A., & Waters, N. E. (2021). The role of parent educational attainment in parenting and children’s development. Current Directions in Psychological Science, 30(2), 186-192.
6. Hammerstein, S., König, C., Dreisörner, T., & Frey, A. (2021). Effects of COVID-19-related school closures on student achievement: a systematic review. Frontiers in Psychology, 12, 746289.
7. Wilder, S. (2023). Effects of parental involvement on academic achievement: a meta-synthesis. In Mapping the field (pp. 137-157). Routledge.