An inferential analysis of how a student's academic (study time) and demographic (father's and mother's education level) factors individually affect their school performance.
Education is a key factor in achieving long-term economic progress. Understanding what influences students’ grades is an important research question because it can help schools, families, and students themselves make better decisions to improve learning outcomes. By looking at how different aspects of a student’s life, such as their home address type (rural or urban), their father’s and mother’s education levels, and their study time, relate to their grades, we can gain valuable insights. This topic matters in the social sciences because improving student performance has long-term benefits for both individuals and society, such as better career opportunities and greater social equality.
How do parental education and study time affect a student’s academic performance?
“Students with higher parental education levels and more hours per week dedicated to studying have better academic performance, as measured by their final grade.”
This hypothesis stems from findings in educational research that parental involvement and support, as well as a student’s study habits, are key determinants of academic success.
The data concern student achievement in secondary education at two Portuguese schools. The attributes include student grades and demographic, social, and school-related features, and they were collected from school reports and questionnaires. Two datasets are provided, covering performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. For this project, we combine the Mathematics (mat) and Portuguese (por) datasets and conduct the analysis on the combined data.
Here are some visual representations of the data. These graphs and plots help to illustrate key findings and insights from the analysis.
Figure 1: In almost all cases, irrespective of the number of hours a student dedicates to their studies in a week, the largest group of students has a Pedu value of 4. This suggests that even if only one parent is highly educated, that parent may have a positive effect on their child’s studytime. It is also interesting to note that the second most common Pedu value is 2, not 3.
Figure 2: For a weekly study time level of 3, we see the highest median final grade, at 12.49, which is consistent with the idea that the more time a student dedicates to studying in a week, the better they perform. For level 4, however, the median is slightly lower, at 12.27. While still high, this slight decline could arise because studying for more than 10 hours a week, alongside other activities, may lead to exhaustion and poorer mental and physical health, and consequently to weaker academic performance. Studytime levels 1 and 2 have median final_grade values of 10.58 and 11.33 respectively, again aligning with the hypothesis.
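The per-level medians discussed for Figure 2 come from grouping students by studytime and taking the median final grade within each group. A minimal sketch of that computation is given below, assuming Python and its standard library; the records are made-up values for illustration only and are not taken from the study's data, and the helper name `median_by_level` is hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (studytime level, final grade) pairs, invented for illustration;
# they are NOT rows from the study's dataset.
records = [
    (1, 9), (1, 11), (1, 10),
    (2, 11), (2, 12), (2, 10),
    (3, 13), (3, 12), (3, 14),
    (4, 12), (4, 13), (4, 11),
]


def median_by_level(rows):
    """Group grades by studytime level and return the median of each group."""
    groups = defaultdict(list)
    for level, grade in rows:
        groups[level].append(grade)
    return {level: median(grades) for level, grades in sorted(groups.items())}


medians = median_by_level(records)
for level, value in medians.items():
    print(f"studytime {level}: median final grade = {value}")
```

Comparing the resulting values across levels is what the discussion of Figure 2 does with the real data.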
This study employs a structured data analysis pipeline, including data cleaning, summarization, processing, DAG construction, and linear regression modeling to examine the impact of parental education and study time on student performance.
The dataset, consisting of student performance data from two Portuguese schools, was first imported and merged. Initial pre-processing included:
1. The Mathematics (mat) and Portuguese (por) datasets were merged for a more comprehensive analysis.
2. The variables used were Medu (mother’s education), Fedu (father’s education), studytime (weekly study time), subj (subject), and G3 (final grade).
3. A new variable, Pedu (Parental Education), was created by taking the maximum of Medu and Fedu, assuming that the highest parental education level has the strongest impact.

A Directed Acyclic Graph (DAG) was constructed to illustrate causal relationships between key variables. The logic behind the DAG:
This DAG helps identify confounders and appropriate control variables for unbiased causal inference.
Figure: DAG representing the assumed causal relationships.
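Before turning to the models, the pre-processing described earlier in this section (combining the two subject files, keeping the analysis variables, and deriving Pedu) can be sketched as follows. This is an illustrative sketch only: it assumes that "merging" means stacking the two tables row-wise with a subj label, it uses tiny made-up, semicolon-separated stand-ins for the real files, and the helper name `load_subject` is hypothetical.

```python
import csv
import io

# Tiny stand-ins for the two source files. The rows are invented for
# illustration, and the semicolon-separated layout is an assumption.
MAT_CSV = "Medu;Fedu;studytime;G3\n4;2;2;10\n1;3;1;8\n"
POR_CSV = "Medu;Fedu;studytime;G3\n3;3;3;14\n2;4;2;12\n"

# Columns kept for the analysis, as listed in the pre-processing steps.
KEEP = ("Medu", "Fedu", "studytime", "G3")


def load_subject(text, subj):
    """Read one subject's rows, keep the analysis columns, tag the subject."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text), delimiter=";"):
        row = {name: int(raw[name]) for name in KEEP}
        row["subj"] = subj
        # Pedu is the higher of the two parental education levels.
        row["Pedu"] = max(row["Medu"], row["Fedu"])
        rows.append(row)
    return rows


# Assumption: the combined dataset is the two tables stacked row-wise.
students = load_subject(MAT_CSV, "mat") + load_subject(POR_CSV, "por")
for row in students:
    print(row)
```

Taking the maximum encodes the stated assumption that the more highly educated parent has the stronger influence; other summaries (for example the mean of Medu and Fedu) would encode a different assumption.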
A linear regression model was applied to analyze the relationship between study time, parental education, and final grades, controlling for subject differences.
To study the individual effect of studytime on final_grade, we consider all the paths between studytime and final_grade in our DAG and identify the open backdoor paths. To close these paths, we control for the Pedu and subj variables, which lie on them. The final model:
final_grade = β₀ + β₁(studytime) + β₂(Pedu) + β₃(subj) + ε
The coefficient of studytime is positive, suggesting that students who dedicate more hours to studying tend to have higher final grades.
Since the p-value is below 0.05, we reject the null hypothesis and conclude that study time is significantly associated with final grades.
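To make the adjustment concrete, the sketch below fits a least-squares model of the final grade on studytime while controlling for Pedu and subj, as the text describes. It is a minimal standard-library illustration, not the code used for the reported results. The function names (`solve_linear_system`, `fit_ols`) and the dummy coding of subj (1 for por, 0 for mat) are assumptions, and the rows are synthetic: they were constructed without noise from grade = 5 + 1.5·studytime + 0.5·Pedu + 1·por so that the solver can be checked against known coefficients.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(design, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic rows (studytime, Pedu, subj, grade), generated without noise from
# grade = 5 + 1.5*studytime + 0.5*Pedu + 1*(subj == "por"). Illustration only.
rows = [
    (1, 1, "mat", 7.0), (2, 3, "mat", 9.5), (3, 2, "por", 11.5),
    (4, 4, "por", 14.0), (2, 4, "por", 11.0), (3, 1, "mat", 10.0),
    (1, 3, "por", 9.0), (4, 2, "mat", 12.0),
]

# Columns: intercept, studytime, Pedu, subj dummy (assumed coding: por = 1).
design = [[1.0, float(st), float(pedu), 1.0 if subj == "por" else 0.0]
          for st, pedu, subj, _ in rows]
grades = [row[3] for row in rows]

b0, b_study, b_pedu, b_por = fit_ols(design, grades)
print(f"intercept={b0:.3f} studytime={b_study:.3f} "
      f"Pedu={b_pedu:.3f} subj(por)={b_por:.3f}")
```

Including Pedu and subj as columns of the design matrix is what "controlling for" them means here: the studytime coefficient is then the expected change in grade per study-time level with the other two held fixed.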
To study the effect of Pedu on final_grade, we do not need to control for any variables. The final model:
final_grade = β₀ + β₁(Pedu) + ε
The coefficient of Pedu is positive, meaning students with higher-educated parents tend to perform better academically.
Since the p-value is below 0.05, we conclude that parental education has a statistically significant effect on final grades.
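For the unadjusted model of final grade on Pedu, the slope, its standard error, and the resulting t statistic have closed forms. The sketch below computes them with the standard library only; the data are made-up values for illustration, the function name `simple_ols` is hypothetical, and the final step from t statistic to p-value is left out because it requires a t-distribution routine that the standard library does not provide.

```python
from math import sqrt


def simple_ols(x, y):
    """Return (intercept, slope, slope standard error) for y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (two coefficients).
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    slope_se = sqrt(ssr / (n - 2) / sxx)
    return intercept, slope, slope_se


# Hypothetical Pedu levels and final grades, invented for illustration only.
pedu = [1, 1, 2, 2, 3, 3, 4, 4]
grade = [9, 10, 10, 12, 11, 13, 12, 14]

b0, b1, se_b1 = simple_ols(pedu, grade)
t_stat = b1 / se_b1
print(f"intercept={b0:.3f} slope={b1:.3f} se={se_b1:.3f} t={t_stat:.2f}")
```

The t statistic would be compared against a t distribution with n − 2 degrees of freedom to obtain the p-value that is checked against the 0.05 threshold.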
The subject (subj) may also affect the results, suggesting variations in difficulty levels.
1. https://archive.ics.uci.edu/dataset/320/student+performance
2. https://repositorium.sdum.uminho.pt/bitstream/1822/8024/1/student.pdf
3. Cortez, P. (2008). Student Performance [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5TG7T.
4. Pinquart, M., & Ebeling, M. (2020). Parental educational expectations and academic achievement in children and adolescents—a meta-analysis. Educational Psychology Review, 32(2), 463-480.
5. Davis-Kean, P. E., Tighe, L. A., & Waters, N. E. (2021). The role of parent educational attainment in parenting and children’s development. Current Directions in Psychological Science, 30(2), 186-192.
6. Hammerstein, S., König, C., Dreisörner, T., & Frey, A. (2021). Effects of COVID-19-related school closures on student achievement: a systematic review. Frontiers in Psychology, 12, 746289.
7. Wilder, S. (2023). Effects of parental involvement on academic achievement: a meta-synthesis. In Mapping the field (pp. 137-157). Routledge.