Wednesday, March 25, 2020
Morning sessions (10:30 AM)
Session 1A: Keynote Discussion and Panel, Location: (Room HZ 4)
Keynote chat 1A1 (20 min): Professor Shane Dawson: “Learning Analytics – A Field on the Verge of Relevance?”
Panel 1A2 (60 min): “Back to the future – A retrospective of 10 years Learning Analytics research and lessons for tomorrow”
Just 10 years ago, we were gathering data about our learners using surveys or interviews with a small, selected number of representative stakeholders. This approach, however, had substantial limitations in sample size, time requirements, scope, and representativeness of results. Learning Analytics tools have since made data collection a much more affordable activity, and we have seen an exciting and accelerating development of the field. In the past years, we have addressed various challenges along the way, unveiled early myths and promises of learning analytics, and identified new pressing needs for the adoption of learning analytics. In this panel we will reflect on the young history of learning analytics and the challenges we have tackled, and combine this with personal stories of early members of the learning analytics family.
Session 1B: Blended Learning, Location: (Room HZ 8)
Presentation 1B1 (30 min): “Analyzing the consistency in within-activity learning patterns in blended learning” (Full research paper)
Keywords: Work Habits, Time Management, Time-series Analysis, Learner performance and Consistency, Regularity, Student Persistence
Performance and consistency play a large role in learning. This study analyzes the relation between consistency in students’ online work habits and academic performance in a blended course. We utilize data from logs recorded by a learning management system (LMS) in two information technology courses. The two courses required the completion of monthly asynchronous online discussion tasks and weekly assignments, respectively. We measure consistency using the Dynamic Time Warping (DTW) distance between two successive tasks (assignments or discussions), an appropriate measure for assessing the similarity of time series, over an 11-day timeline starting 10 days before and ending at the submission deadline. We found meaningful clusters of students exhibiting similar behavior and used these to identify three distinct consistency patterns: highly consistent, incrementally consistent, and inconsistent users. We also found evidence of significant associations between these patterns and learners’ academic performance.
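For readers unfamiliar with the measure, the DTW distance can be sketched with the classic dynamic-programming recurrence. This is an illustrative sketch only; the two series below are hypothetical 11-day activity counts, not the study’s LMS data:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance between two 1-D series."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

# Hypothetical daily activity counts for two successive assignments,
# covering the 11-day window before each deadline:
task1 = np.array([0, 0, 0, 1, 2, 3, 5, 8, 9, 12, 15], dtype=float)
task2 = np.array([0, 0, 1, 1, 3, 4, 5, 9, 10, 11, 14], dtype=float)
print(dtw_distance(task1, task2))  # small distance = consistent work habits
```

Pairwise DTW distances like this one can then feed a standard clustering algorithm to recover consistency groups.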
Presentation 1B2 (20 min): “Evaluating Teachers’ Perceptions of Students’ Questions Organization” (Short research paper)
Keywords: teacher’s perception, question organization, student’s need, student’s question, student’s profile, pedagogical interest
Students’ questions are a key to helping teachers assess their understanding and adapt their pedagogy. However, in a flipped classroom context where many questions are asked online to be addressed in class, selecting questions can be difficult for teachers. To help them in this task, we present three alternative question organizations: one based on pedagogical needs, one based on estimated students’ profiles, and one mixing both approaches. Results of a survey completed by 37 teachers practicing flipped classroom pedagogy show no consensus on a single organization. A cluster analysis based on teachers’ flipped classroom experience allowed us to distinguish two profiles, but neither was associated with a particular question organization preference. Qualitative results suggest the need for different organizations may depend more on pedagogical philosophy, which advocates for differentiated dashboards.
Presentation 1B3 (30 min): “Predicting Student Success in a Blended Learning Environment” (Full research paper)
Keywords: blended learning, grade prediction, e-learning, machine learning, logistic regression, random forest classification, learning analytics
Blended learning is gaining ground in contemporary education. However, studies on predictive learning analytics in the context of blended learning remain relatively scarce compared to Massive Open Online Courses (MOOCs), where such applications have gained a strong foothold. Data sets obtained from blended learning environments suffer from high dimensionality and typically contain a limited number of instances, which makes predictive analysis a challenging task. In this work, we explore the log data of a master-level blended course to predict the students’ grades based entirely on the data obtained from an online module (a small private online course), using and comparing logistic regression and random forest-based predictive models. The results of the analysis show that, despite the limited data, success vs. fail predictions can be made as early as the middle of the course. This could be used in the future for timely interventions, both for failure prevention and for reinforcing positive learning behaviours of students.
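A minimal sketch of the kind of model comparison the abstract describes, using synthetic data in place of the authors’ course logs (the feature counts and sample size here are illustrative assumptions, chosen to mimic the few-instances, many-features regime):

```python
# Compare logistic regression and random forest on a small, high-dimensional
# pass/fail prediction task (synthetic stand-in for mid-course log features).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Few instances, many features: the regime typical of blended-course data.
X, y = make_classification(n_samples=120, n_features=40, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

accuracies = {}
for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(n_estimators=200, random_state=0)):
    accuracies[type(model).__name__] = model.fit(X_tr, y_tr).score(X_te, y_te)
print(accuracies)
```

In practice the features would be cumulative activity counts up to a given week, so the comparison can be rerun at successive points in the course to find the earliest reliable prediction time.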
Session 1C: Dashboards and Visualisations, Location: (Room HZ 9)
Presentation 1C1 (30 min): “Comparing Teachers’ Use of Mirroring and Advising Dashboards” (Full research paper)
Keywords: cooperative/collaborative learning, elementary education, human-computer interface, improving classroom teaching, teaching/learning strategies
Teachers play an essential role during collaborative learning. To provide effective support, teachers have to be constantly aware of students’ activities and make fast decisions about which group to offer support, without disrupting students’ collaborative process. Teacher dashboards are visual displays that provide analytics about learners to help teachers increase their awareness of the situation. However, if teachers are not able to efficiently and effectively distill information from the dashboard, the dashboard can become an obstacle instead of an aid. In the present study, we compared dashboards that provide information (mirroring) to dashboards that provide information and alert the teacher to groups that are in need of support (advising). Teachers were shown standardized, fictitious collaborative situations on one of the types of dashboards and were asked to detect the group that was in need of support. The results showed that teachers in the advising condition more often detected the problematic group, needed less effort to do so, and were more confident of their decisions. The teacher-dashboard interaction patterns showed that teachers in the advising condition generally started by checking the given alert, but also that they tried to look at as much information about other groups as they could. In the mirroring condition, teachers generally started by examining information from class overviews, but did not always have time to check information for individual groups. These findings are discussed in light of the role of a teacher dashboard in teachers’ decision making in the context of student collaboration.
Presentation 1C2 (20 min): “Learning analytics dashboards: the past, the present and the future” (Short research paper)
Keywords: learning analytics dashboards, visualisation, interaction, evaluation
Learning analytics dashboards are at the core of the LAK vision to involve humans in the decision-making process and have been presented by several researchers over the past 10 years. The key focus of these dashboards is to support better human sense-making and decision-making by visualising data about learners to a variety of stakeholders. Early research on learning analytics dashboards focused on the use of visualisation and prediction techniques and demonstrated the rich potential of dashboards in a variety of learning settings. Present research increasingly uses participatory design methods to tailor dashboards to the needs of stakeholders, employs multimodal data acquisition techniques, and is starting to investigate the theoretical underpinnings of dashboards. In this paper, we present these past and present research efforts as well as the results of a workshop held at LAK19 with experts in the domain to identify and articulate common practices and challenges for the domain. Based on an analysis of the results, we present a research agenda to help shape the future of learning analytics dashboards.
Presentation 1C3 (20 min): “Automated Insightful Drill-Down Recommendations for Learning Analytics Dashboards” (Short research paper)
Best short research paper nominee
Keywords: Learning Analytics Dashboards, Visualization Recommendation, Drill-down Data Analysis
The big data revolution is an exciting opportunity for universities, which typically have rich and complex digital data on their learners. It has motivated many universities around the world to invest in the development and implementation of learning analytics dashboards (LADs). These dashboards commonly make use of interactive visualisation widgets to assist educators in understanding and making informed decisions about the learning process. A common operation in analytical dashboards is a “drill-down”, which in an educational setting allows users to explore the behaviour of sub-populations of learners by progressively adding filters. Nevertheless, drill-down challenges exist that hamper the most effective use of the data, especially by users without a formal background in data analysis. Accordingly, in this paper, we address this problem by proposing an approach that recommends insightful drill-downs to LAD users. We present results from an application of our proposed approach using an existing LAD. A set of insightful drill-down criteria from a course with 875 students is explored and discussed.
Session 1D: Cognitive Psychology, Location: (Room HZ 11)
Presentation 1D1 (30 min): “DAS3H: Modeling Student Learning and Forgetting for Optimally Scheduling Distributed Practice of Skills” (Invited paper)
EDM’19 best paper
Spaced repetition is among the most studied learning strategies in the cognitive science literature. It consists in temporally distributing exposure to information so as to improve long-term memorization. An adaptive and personalized distributed practice schedule would benefit students more than a generic scheduler. However, the applicability of such adaptive schedulers has seemed limited to pure memorization, e.g. flashcards or foreign language learning. In this article, we first frame the research problem of optimizing an adaptive and personalized spaced repetition scheduler when memorization involves the application of multiple underlying skills. To this end, we rely on a student model for inferring knowledge state and memory dynamics on any skill or combination of skills. We argue that no existing knowledge tracing model takes both memory decay and multiple skill tagging into account when predicting student performance. As a consequence, we propose a new student learning and forgetting model suited to our research problem: DAS3H builds on the additive factor models and includes a representation of the temporal distribution of past practice on the skills involved in an item. In particular, DAS3H allows the learning and forgetting curves to differ from one skill to another. Finally, we provide empirical evidence on three real-world educational datasets that DAS3H outperforms other state-of-the-art EDM models. These results suggest that incorporating both item-skill relationships and the forgetting effect improves over student models that consider only one or the other.
Presentation 1D2 (20 min): “Trace Data from Student Solutions to Genetics Problems Reveals Variance in the Processes Related to Different Course Outcomes” (Short research paper)
Keywords: Clustering, Problem solving, Genetics, Metacognition
Problem solving, particularly in disciplines such as genetics, is an essential but difficult competency for students to master. Prior work indicated that trace data can be leveraged to measure the invisible cognitive processes that undergird learning activities such as problem solving. Building on prior work and given the importance and difficulties associated with genetics problem solving, we used unsupervised statistical methods (k-means clustering and feature selection) to characterize the patterns of processes students use during genetics problem solving and their relationship to proximal and distal outcomes. At the level of the individual problem, we found that conclusion processes, such as making claims and eliminating possible solutions, were an important interim step and were associated with getting a particular problem correct. Surprisingly, we noted that a different set of processes was associated with course outcomes. Students who performed multiple metacognitive steps (e.g. monitoring, checking, planning) in a row or who engaged in execution steps (e.g. using information, drawing a picture, restating the process) as part of problem solving during the semester performed better on final assessments. We found a third set of practices (consecutive conclusion processes, metacognitive processes preceding reasoning, and reasoning preceding conclusions) to be important for success at both the problem level and on final assessments. This suggests that different problem-solving processes are associated with success on different course benchmarks. This work raises provocative questions regarding best practices for teaching problem solving in genetics classrooms.
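The unsupervised step can be illustrated with a small sketch: k-means applied to per-student counts of process types. The features and counts below are synthetic assumptions for illustration, not the study’s trace data:

```python
# Cluster students by their problem-solving process profiles (toy data).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Columns: counts of conclusion, metacognitive, and execution processes
# per student, drawn from Poisson distributions as a stand-in for real logs.
features = rng.poisson(lam=(3, 5, 2), size=(60, 3)).astype(float)

# Standardise so no single process type dominates the distance metric.
X = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))  # number of students in each process-pattern cluster
```

The resulting cluster labels could then be cross-tabulated against problem-level correctness and final assessment scores, mirroring the proximal/distal comparison in the abstract.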
Presentation 1D3 (30 min): “How working memory capacity limits success in self-directed learning: a cognitive model of search and concept formation” (Full research paper)
Keywords: Self-directed learning, Concept formation, Working memory capacity, Cognitive-computational modeling
With this work we intend to develop cognitive modules for learning analytics solutions used in inquiry learning environments that can monitor and assess mental abilities involved in self-directed learning activities. We realize this idea by drawing on models from mathematical psychology, which specify assumptions about the human mind algorithmically and thereby automate a theory-driven data analysis. We report a study to exemplify this approach, in which N=105 15-year-old high school students perform a self-determined navigation in a taxonomy of dinosaur concepts. We analyze their search and learning traces through the lens of a connectionist network model of working memory (WM). The results are encouraging in three ways. First, the model predicts students’ average progress (as well as difficulties) in forming new concepts with high accuracy. Second, a simple (1-parameter) extension, which we derive from a meta-cognitive learning framework, is sufficient to also predict aggregated search patterns. Third, our initial attempt to fit the model to individual data offers some promising results: estimates of a free parameter correlate significantly with a measure of WM capacity. Together, we believe that these results demonstrate a novel and promising way towards extending learner models with cognitive variables. We also discuss current limitations in the light of our future work on cognitive-computational scaffolding techniques in inquiry learning scenarios.
Session 1E: Co-designing Learning Analytics, Location: (Room HZ 12)
Presentation 1E1 (30 min): “LA-DECK: A card-based learning analytics co-design tool” (Full research paper)
Keywords: Co-design, Learning analytics, Participatory design
Human-centred software design gives all stakeholders an active voice in the design of the systems that they are expected to use. However, this is not yet commonplace in Learning Analytics (LA). Co-design techniques from other domains therefore have much to offer LA, in principle, but there are few detailed accounts of exactly how such sessions unfold. This paper presents the rationale driving a card-based co-design tool specifically tuned for LA, called LA-DECK. In the context of a pilot study with students, educators, LA researchers and developers, we provide qualitative and quantitative accounts of how participants used the cards. Using three different forms of analysis (transcript-centric design vignettes, card-graphs and time on topic), we characterise in what ways the sessions were “participatory” in nature, and argue that the cards succeeded in playing very similar roles to those documented in the literature on successful card-based design tools.
Presentation 1E2 (20 min): “Engaging Students as Co-Designers of Learning Analytics” (Practitioner report)
Best practitioner report nominee
Keywords: participatory design, student-facing dashboards, higher education
As Learning Analytics (LA) moves from theory into practice, researchers have called for increased participation of stakeholders in the design processes that produce LA tools. The implementation of such methods, however, still requires attention and specification. In this practitioner report, we share strategies and insights from a co-design process that involved university students in the development of an LA tool. We describe a series of participatory design workshops we held and highlight three strategies for engaging students in the co-design of learning analytics tools.
Presentation 1E3 (30 min): “Inspiration Cards Workshops with Teachers in Early Co-Design Stages of Learning Analytics” (Full research paper)
Keywords: Learning Analytics, co-design methods, inspiration cards, emerging technology
Despite the recognition of the need to include practitioners in the design of learning analytics (LA), teacher input in particular tends to come late in the design process rather than during the definition of the initial design agenda. This paper presents a case study of a design project tasked with developing LA tools for a reading game for primary school children. Taking a co-design approach, we use the Inspiration Cards Workshop to ensure meaningful teacher involvement even for participants with little background in data literacy or experience with learning analytics. We reflect on the process and findings to derive specific and transferable design principles that can guide the implementation of LA tools for primary school teachers in particular, and discuss opportunities and limitations of using the inspiration cards method that can inform future LA design efforts.
Early afternoon sessions (1:00 PM)
Session 2A: Tools & Infrastructures, Location: (Room HZ 4)
Presentation 2A1 (20 min): “Building a data warehouse for multimodal learning analytics research projects” (Practitioner report)
Keywords: LRS, Multimodal learning events, data warehouse
This paper provides a practical approach for designing and implementing a data warehouse (or learning record store) which supports interconnected research projects. The research projects come from the area of learning technologies, and all of them share learning analytics as an overlapping research topic that connects them.
Presentation 2A2 (20 min): “Fostering and Supporting Empirical Research on Evaluative Judgement via a Crowdsourced Adaptive Learning System” (Short research paper)
Best short research paper nominee
Keywords: evaluative judgement, crowdsourcing, student-authored materials, educational technologies
The value of students developing the capacity to make accurate judgements about the quality of their work and that of others has been widely recognised in higher education literature. However, despite this recognition, little attention has been paid to the development of tools and strategies with the potential both to foster evaluative judgement and to support empirical research into its growth. This paper provides a demonstration of how educational technologies may be used to fill this gap. In particular, we introduce the adaptive learning system [Toolname] and describe how it aims to (1) develop evaluative judgement in large-class settings through suggested strategies from the literature such as the use of rubrics, exemplars and peer review and (2) enable large empirical studies at low cost to determine the effect-size of such strategies. A case study demonstrating how [Toolname] has been used to achieve these goals in a specific context is presented.
Presentation 2A3 (20 min): “Implementing an attrition model at scale” (Practitioner report)
Best practitioner report nominee
Keywords: Students at risk, Predictive modelling, Attrition
The development and rollout of a predictive model of student attrition at a large university is described. Student data such as demographic information, enrolment choices and educational outcomes are used to train a machine learning algorithm and subsequently assign each student a score representing their predicted likelihood of dropping out of their program or of departing the university entirely. These scores, along with considerable contextual information, are provided to program managers, most recently in a pilot project covering 79 programs with over 17,000 student enrolments. We describe the methods used in developing this model, as well as the experience of communicating the outputs to program managers and the aspects that they have found most useful.
Presentation 2A4 (20 min): “Smart Dictionary for E-book Reading Analytics” (Short research paper)
Keywords: English education, smart dictionary, e-book, learning analytics
Reading, be it intensive or extensive, is one of the key skills to master as an English as a foreign language (EFL) learner. Computerized e-book systems provide convenient access to learning materials inside and outside class. Students regularly check the meaning of a word or expression using a separate tool to progress in their reading, which is not only disruptive but can lead to other learning problems. An example of a particular issue faced in EFL is when a student learns a meaning of a polysemous word that is inappropriate for the context in which it is presented. This is also a problem for teachers, as they often need to investigate the cause. In this paper, we propose a smart dictionary integrated into an e-book reading platform. It allows the learner to search and note word definitions directly, with the purpose of reducing context switching and improving vocabulary retention. Finally, we propose that learner interactions with the system can be analyzed to support EFL teachers in identifying possible problems that arise through dictionary use while reading.
Session 2B: Participatory Design of Learning Analytics, Location: (Room HZ 8)
Presentation 2B1 (20 min): “Involving teachers in learning analytics design: Lessons learned from two case studies” (Short research paper)
Keywords: Learning Analytics, Teachers, Co-design, Case Studies, Teacher Professional Development
Involving teachers in the design of technology-enhanced learning environments is a useful method for bridging the gap between research and practice. This is especially relevant for learning analytics tools, wherein the presentation of educational data to teachers or students requires meaningful sense-making to effectively support data-driven actions. In this paper, we present two case studies carried out in the context of two research projects in the USA and Spain which aimed to involve teachers in the co-design of learning analytics tools through professional development programs. The results of a cross-case analysis highlight lessons learned around challenges and principles regarding the meaningful involvement of teachers in learning analytics tool design.
Presentation 2B2 (20 min): “For Evidence-Based Class Design with Learning Analytics: A Proposal of Preliminary Practice Flow Model in High School” (Practitioner report)
Keywords: Learning Analytics, Study Logs, Digital Textbooks, Proposal Models, High School
This paper introduces a practice to incorporate a learning analytics dashboard that analyzes and visualizes learning logs using digital textbooks for high school students. Based on the knowledge gained through the practice over the past six months, an important model for incorporating learning analytics in high schools is proposed. In the future, we plan to revise the system and consider more specific class designs based on learning analytics.
Presentation 2B3 (20 min): “Why clicks are not enough: designing a Learning Analytics service for the Estonian Open Educational Resources ecosystem” (Practitioner report)
Keywords: Open Educational Resources, National-level implementation, Learning Analytics, Learning Record Store, Learning Design.
Open Educational Resources (OERs) have gained ground in the educational landscape and are increasingly used by teachers in a flexible way to support classroom learning. However, the pedagogical practices that make use of OERs do not always activate students to enhance student-centered learning. Recent research on Learning Design supports teachers in designing novel classroom practices, whereas research on Learning Analytics aims at enabling data-informed sensemaking about learners’ interactions with the resources. This article summarizes a validation of a Learning Analytics service in the context of the Estonian national-level deployment of an OER ecosystem and analyses its limitations in evaluating the designs of classroom practices using Learning Analytics data. The results of our study contribute to the generalization and improvement of LA services that are integrated into an open OER ecosystem.
Presentation 2B4 (20 min): “Learning-centred Translucence: an Approach to Understand How Teachers Talk About Classroom Data” (Short research paper)
Keywords: Classroom evidence, human-centred design, participatory design, evidence-based decision-making
Teachers are increasingly being encouraged to embrace evidence-based practices to improve their teaching and support students’ learning. Learning analytics (LA) innovations offer great promise in supporting these practices by providing evidence for teachers and learners to make informed decisions and transform the educational experience. However, LA limitations and their uptake by educators are also coming under critical scrutiny. This is in part due to the lack of involvement of teachers and learners in the design of LA tools to understand existing educational practices that might inform the design of learning analytics, and the kinds of classroom data teachers actually need. In this paper, we propose a human-centred approach to generate understanding of teachers’ data needs through the lens of three key principles of translucence: visibility, awareness and accountability. We illustrate our approach through a participatory design sprint to identify how teachers talk about classroom data. We describe teachers’ perspectives on the evidence they need for making better-informed decisions and discuss the implications of our approach for the design of human-centred LA in the coming years.
Session 2C: Learning Approaches, Location: (Room HZ 9)
Presentation 2C1 (20 min): “Exploring Student Approaches to Learning through Sequence Analysis of Reading Logs” (Short research paper)
Keywords: Study approaches, sequence analysis, clustering, association rule mining, learning analytics
In this paper, we aim to explore students’ study approaches (e.g., deep, strategic, surface) from the logs collected by an electronic textbook (eBook) system. Data were collected from 89 students on their reading activities both in and out of class in a Freshman English course. Students were given a task to study reading materials through the eBook system, highlight the text related to the main or supporting ideas, and answer questions prepared to measure their level of comprehension. Students’ in-class and out-of-class reading times and their usage of the marker feature were used as proxies for their study approaches. We used theory-driven and data-driven approaches together to model the study approaches of students. Our results showed that three groups of students with different study approaches can be identified. Relationships between students’ reading behaviors and their academic performance are also investigated using association rule mining. The results are discussed in terms of monitoring, feedback, predicting learning outcomes, and identifying problems with the content design.
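The association-rule step can be sketched with hand-rolled support and confidence measures. The student records and behavior flags below are toy assumptions, not the study’s eBook logs:

```python
# Support and confidence for a candidate rule over binary behavior flags:
# "reads out of class AND uses marker => high score".
students = [
    {"reads_out_of_class", "uses_marker", "high_score"},
    {"reads_out_of_class", "uses_marker", "high_score"},
    {"reads_out_of_class", "high_score"},
    {"uses_marker"},
    {"reads_out_of_class", "uses_marker"},
    {"uses_marker", "high_score"},
]

def support(itemset):
    """Fraction of students exhibiting every behavior in the itemset."""
    return sum(itemset <= s for s in students) / len(students)

def confidence(antecedent, consequent):
    """P(consequent | antecedent), estimated from the records."""
    return support(antecedent | consequent) / support(antecedent)

rule_conf = confidence({"reads_out_of_class", "uses_marker"}, {"high_score"})
print(rule_conf)  # fraction of marker-using out-of-class readers who scored high
```

Algorithms such as Apriori automate the search over all itemsets; the measures themselves are exactly these ratios.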
Presentation 2C2 (20 min): “Slow is Good: The Effect of Diligence on Student Performance in the Case of an Adaptive Learning System for Health Literacy” (Short research paper)
Keywords: reading competence, health literacy, differentiation, diversity, adaptive e-learning system, clustering, learning analytics
This paper describes the analysis of the temporal behavior of 11-15 year old students in a heavily instructionally designed adaptive e-learning environment. The e-learning system is designed to support students’ acquisition of health literacy, i.e., they should learn to understand health-related issues. The system adapts text difficulty to students’ reading competence, grouping students into four competence levels. Content for the four levels of reading competence was created by clinical psychologists, pedagogues and medical students. The e-learning system consists of an initial reading competence assessment, texts about health issues, and learning tasks related to these texts. The research question we investigate in this work is whether temporal behavior differentiates between students despite the system’s adaptation to students’ reading competence, and despite students having comparatively little freedom of action within the system. Further, we also investigated the correlation of temporal behaviour with performance. Unsupervised clustering clearly separates students into slow and fast students with respect to the time they take to complete tasks. Furthermore, topic completion time is linearly correlated with performance in the tasks. We therefore interpret working slowly in this case as diligence, which leads to more correct answers, even though the level of text difficulty matches students’ reading competence. This result also points to the design opportunity of integrating advice on overarching learning strategies, such as working diligently instead of rushing through, into students’ overall learning activity. This can be done either by teachers or via additional adaptive learning guidance within the system.
Presentation 2C3 (20 min): “Decoding the Performance in an Out-of-Context Problem during Blocked Practice” (Short research paper)
Keywords: Interleaving, Blocking, Massed Practice, ITS, K-12 Education, Mathematics
To master a skill, students generally practice its content in a blocked manner. While practicing in a blocked fashion, students know the context of the problems and which strategy is needed to arrive at a solution. However, in real-life standardized tests, where problems from various skills are grouped together, students often find it challenging to identify the correct strategy to solve the problems. This is because, during learning, students often practice the content in isolation, which hinders their ability to discriminate among the contexts of the problems. In this work, using the tutor ZYX, we present students working on the topic Addition Word Problems with a subtraction word problem and investigate how they perform on this out-of-context subtraction word problem. We find that students’ performance in the topic Addition Word Problems is a strong predictor of their performance on this out-of-context problem. Our results suggest that it is a stronger predictor for higher grades (4th and 5th) than for lower grades (2nd and 3rd).
Presentation 2C4 (20 min): “Exploring exam strategies of successful first year engineering students” (Short research paper)
Keywords: Exam strategies, Modelling, Markov Chains
At present, universities collect study-related data about their students. This information can be used to support students at risk of failing their studies. At the Faculty of Mechanical Engineering (FME), Czech Technical University in Prague (CTU), the group of first-year students is the most vulnerable. The most critical part of the first year is the winter exam period, when students usually divide into those who will pass and those who will fail. One of the most important abilities students need to learn is exam planning, and our research aims at exploring the exam strategies of successful students. These strategies can be further used to improve retention in the first-year student group. We report ongoing research on the analysis of the exam strategies of first-year students in the academic year 2017/2018. From a total of 361 first-year students, successful students were selected: a successful student is one who finished all three mandatory exams before the end of the first exam period. From the exam sequences of the 153 selected students, a “layered” Markov chain probabilistic graph has been constructed. It uncovered the most common exam strategies taken by those students.
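The core of building such a Markov chain is estimating transition probabilities between consecutive exams from observed sequences. A minimal sketch, using hypothetical exam names and sequences rather than the CTU FME data:

```python
# Estimate a Markov transition matrix from exam-order sequences.
from collections import Counter, defaultdict

# Each sequence lists the order in which a student took three exams
# (toy data; the study used 153 real first-year sequences).
sequences = [
    ["Maths", "Physics", "CAD"],
    ["Maths", "Physics", "CAD"],
    ["Maths", "CAD", "Physics"],
    ["Physics", "Maths", "CAD"],
]

# Count transitions between consecutive exams across all students.
counts = defaultdict(Counter)
for seq in sequences:
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1

# Normalise counts into conditional transition probabilities.
probs = {a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
         for a, nxt in counts.items()}
print(probs["Maths"])  # where students go after the Maths exam
```

Common strategies then appear as high-probability paths through the resulting graph; the paper’s “layered” variant additionally separates transitions by attempt stage.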
Session 2D: Community of Inquiry Model, Location: (Room HZ 11)
Presentation 2D1 (20 min): “Dialogue attributes that inform depth and quality of participation in student discussion forums (Short research paper)
Best short research paper nominee
Keywords: text analysis, engagement, participation, Community of Inquiry, ICAP
This paper describes work in progress to answer the question of how we can identify and model the depth and quality of student participation in class discussion forums using the content of the discussion forum messages. We look at two widely-studied frameworks for assessing critical discourse and cognitive engagement: the ICAP and Community of Inquiry (CoI) frameworks. Our goal is to discover where they agree and where they offer complementary perspectives on learning. In this study, we train predictive classifiers for both frameworks on the same data set in order to discover which attributes are most predictive and how those correlate with the framework labels. We find that greater depth and quality of participation is associated with longer and more complex messages in both frameworks, and that the threaded reply structure matters more than temporal order. We find some important differences as well, particularly in the treatment of messages of affirmation.
Presentation 2D2 (20 min): “Towards automated analysis of cognitive presence in MOOC discussions: a manual classification study (Short research paper)
Keywords: cognitive presence, MOOC, text classification, online classification
This paper reports on early stages of a machine learning research project in which phases of cognitive presence in MOOC discussions were manually coded in preparation for training automated cognitive classifiers. We present a manual-classification rubric combining Garrison, Anderson and Archer’s (2001) coding scheme with Park’s (2009) revised version for a target MOOC. The inter-rater reliability between the two raters achieved 95.4% agreement with a Cohen’s weighted kappa of 0.96, demonstrating that our classification rubric is plausible for the target MOOC dataset. The classification rubric, originally intended for for-credit undergraduate courses, can be applied to a MOOC context. We found that the main disagreements between the two raters lay in adjacent cognitive phases, implying that additional categories may exist between cognitive phases in such MOOC discussion messages. Overall, our results suggest a reliable rubric for classifying cognitive phases in MOOC discussion messages. This indicates we are in a position to apply machine learning algorithms that can also cater for data with inter-rater disagreements in future automated classification studies.
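For context, Cohen’s weighted kappa (reported above as 0.96) penalises disagreements between distant categories more heavily than disagreements between adjacent ones, which matters when most rater disagreements lie in adjacent cognitive phases. A minimal sketch with linear weights follows; the phase labels and ratings are hypothetical, and scikit-learn’s `cohen_kappa_score(..., weights="linear")` computes the same statistic:

```python
def weighted_kappa(rater_a, rater_b, categories):
    """Cohen's weighted kappa with linear disagreement weights:
    adjacent-category disagreements cost less than distant ones."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    # Observed agreement matrix (as proportions).
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        obs[index[a]][index[b]] += 1 / n
    # Marginals and expected matrix under rater independence.
    pa = [sum(obs[i][j] for j in range(k)) for i in range(k)]
    pb = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)  # linear disagreement weight
            num += w * obs[i][j]
            den += w * pa[i] * pb[j]
    return 1 - num / den  # den > 0 whenever raters use >= 2 categories

phases = ["triggering", "exploration", "integration", "resolution"]
rater_a = ["triggering", "exploration", "integration", "resolution", "exploration"]
rater_b = ["triggering", "exploration", "integration", "resolution", "integration"]
kappa = weighted_kappa(rater_a, rater_b, phases)
```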
Presentation 2D3 (30 min): “Towards Automatic Content Analysis of Social Presence in Transcripts of Online Discussions (Full research paper)
Keywords: Community of Inquiry Model, Content Analytics, Online Discussion, Text Classification, Epistemic Network Analysis
This paper presents an approach to automatic labeling of the content of messages in online discussions according to the categories of social presence. To achieve this goal, the proposed approach combines traditional text mining features with word counts extracted using established linguistic frameworks (i.e., LIWC and Coh-Metrix). The best performing classifier obtained 0.95 and 0.88 for accuracy and Cohen’s kappa, respectively. This paper also provides some theoretical insights into the nature of social presence by looking at the classification features that were most relevant for distinguishing between the different categories. Finally, this study adopted epistemic network analysis to investigate the structural construct validity of the automatic classification approach: the epistemic networks built from manually coded and automatically coded messages were nearly identical, providing evidence of the structural validity of the automatic approach.
Late afternoon sessions (3:00 PM)
Session 3A: Self-regulated Learning, Location: (Room HZ 4)
Presentation 3A1 (30 min): “Analytics of Learning Strategies: the Association with the Personality Traits (Full research paper)
Best full research paper nominee
Keywords: learning strategies, approaches to learning, personality traits, learning analytics
Studying online requires strong skills to self-regulate choices of learning strategies. Learning analytics research has proposed novel methods that can extract theoretically meaningful learning strategies from trace data and has shown that such strategies are associated with academic achievement. However, it is much less understood to what extent theoretically meaningful learning strategies can be automatically extracted in the context of massive open online courses (MOOCs). Moreover, there is a lacuna in research on how automatically detected strategies relate to established psychological constructs. This paper reports on a study that (a) applied a state-of-the-art analytic method combining process and sequence mining techniques to detect learning strategies from trace data collected in a MOOC (N=1,397) and (b) explored associations of the detected strategies with academic performance and personality traits (Big Five). Four learning strategies detected with analytics were shown to be theoretically interpretable as the well-known approaches to learning. The results also revealed that the four detected learning strategies were predicted by conscientiousness and agreeableness and were associated with academic performance. Implications for the theoretical validity of, and practical personalization with, analytics-detected learning strategies are also provided.
Presentation 3A2 (20 min): “Effects of In-class and Out-of-class Learning Behaviors on Learning Performance and Self-regulated Learning Awareness (Practitioner report)
Keywords: Learning analytics, Self-regulated learning, In-class behavior, Out-of-class behavior
This study was designed to investigate the effects of different in-class and out-of-class learning behaviors on students’ learning performance and self-regulated learning (SRL) awareness. The study was conducted in an eight-week information technology course for 70 university students. Results revealed that, during in-class activities, learning supported by functional tools such as markers and annotations, combined with support from instructors, benefited learning performance and improved SRL awareness. For out-of-class activities, it was the activities themselves, rather than functional tools focused on specific pages, that showed positive effects on learning outcomes.
Presentation 3A3 (30 min): “Supporting actionable intelligence: Reframing the analysis of observed study strategies (Full research paper)
Keywords: trace data, learner behaviour, learning tactics and strategies, explanatory models
Models and processes developed in learning analytics research are increasing in sophistication and predictive power. However, the ability to translate analytic findings to practice remains problematic. This study aims to address this issue by establishing a model of learner behaviour that is both predictive of student course performance and easily interpreted by instructors. To achieve this aim, we analysed fine-grained trace data (from 3 offerings of an undergraduate online course, N=1068) to establish a comprehensive set of behaviour indicators aligned with the course design. The identified behaviour patterns, which we refer to as observed study strategies, proved to be associated with student course performance. By examining the observed strategies of high and low performers throughout the course, we identified prototypical pathways associated with course success and failure. The proposed model and approach offer valuable insights for the provision of process-oriented feedback early in the course and can thus aid learners in developing their capacity to succeed online.
Session 3B: Curriculum Analytics, Location: (Room HZ 8)
Presentation 3B1 (30 min): “Towards Skills-based Curriculum Analytics: Can we automate the recognition of prior learning? (Full research paper)
Keywords: curriculum analytics, lifelong learning, recognition of prior learning, semantic spaces, skills ontologies
In an era that will increasingly depend upon lifelong learning, we will need to facilitate the movement and sharing of data and information across institutional and geographic boundaries. This flow of data will help us to recognise prior learning and to personalise the learner experience. Here, we make use of curriculum analytics to consider the problem of recognising prior learning between educational institutions. We explore the potential utility of combining natural language processing and skills taxonomies to map between subject descriptions for two different institutions, benchmarking the performance of two algorithms and evaluating their results. We draw attention to some of the issues that arise, and list areas that we consider ripe for future work in this surprisingly underexplored area.
Presentation 3B2 (20 min): “Design of a Curriculum Analytics Tool to Support Continuous Improvement Processes in Higher Education (Short research paper)
Best short research paper nominee
Keywords: Learning Analytics, Curriculum Analytics, Higher Education
Curriculum analytics (CA) emerged as a sub-field of learning analytics, aiming to use evidence to drive continuous curriculum improvement. However, its overall impact on program-level decision-making remains unknown. In this context, this paper presents work-in-progress of a large research project to understand how CA could support continuous improvement processes in different university settings. We used a design-based research approach to design a user-centered CA tool and then to evaluate its use through two iterative cycles. The first cycle consisted of an instrumental case study evaluating the tool’s use to support 124 teaching staff in a 3-year continuous improvement process at a Latin American university. Lessons learned indicate that the tool helped staff to collect information for curriculum discussions, facilitating the availability of evidence regarding student competency attainment. To generalize these lessons, the second cycle will evaluate the tool in different university settings.
Session 3C: Social Learning, Location: (Room HZ 9)
Presentation 3C1 (30 min): “Intergroup and Interpersonal Forum Positioning in Shared-Thread and Post-Reply Networks (Full research paper)
Keywords: Learner networks, social positioning, collective learning
Network analysis has become a major approach for analysing social learning. As such, it has been often used to capture learner positioning in online forum networks. LA research investigated if social positioning in forum networks was associated with academic performance and discourse quality, the latter two serving as proxies for learning. However, the research findings have been inconsistent, in part due to the discrepancies in the adopted approaches to network construction. Yet, it is still unclear how online forum networks should be modelled to assure that the learners’ social positioning is properly captured. To address this gap, the current study explored if some existing approaches to network construction may complement each other and thus offer richer insights. In particular, we hypothesised that the post-reply learner network could represent interpersonal positioning, whereas the network based on co-participation in discussion threads could encapsulate intergroup positioning. The study used learner social interaction data from a large edX MOOC forum to examine the relationship between these two kinds of social positioning. The results suggest that intergroup and interpersonal positioning may capture different aspects of social learning, potentially related to different learning outcomes. We find that although interpersonal and intergroup positioning indicators covary, these measures are not congruent for some 37% of forum posters. Network coevolution analysis also reveals an interdependent relationship between the intergroup and interpersonal social processes. Co-occurrence of learners in a discussion thread prior to direct exchanges is predictive of a direct post-reply interaction at a later stage of the course, and vice-versa, suggesting that intergroup positioning is a precursor of direct communication. 
The study contributes to the discussion around the definition of the social positioning construct in learning analytics and to the validation of approaches for measuring it.
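The two network constructions contrasted in this study can be illustrated with a small sketch: from the same forum data, a directed post-reply network captures interpersonal positioning, while an undirected shared-thread (co-participation) network captures intergroup positioning. The thread data below is hypothetical:

```python
from itertools import combinations

# Hypothetical forum data: each thread is an ordered list of
# (author, replied_to_author_or_None) posts.
threads = {
    "t1": [("ana", None), ("ben", "ana"), ("cui", "ana"), ("ana", "ben")],
    "t2": [("ben", None), ("dev", "ben"), ("cui", "dev")],
}

def post_reply_edges(threads):
    """Directed edges from explicit reply relations (interpersonal)."""
    edges = set()
    for posts in threads.values():
        for author, target in posts:
            if target is not None and target != author:
                edges.add((author, target))
    return edges

def shared_thread_edges(threads):
    """Undirected edges between co-participants of a thread (intergroup)."""
    edges = set()
    for posts in threads.values():
        authors = {a for a, _ in posts}
        for u, v in combinations(sorted(authors), 2):
            edges.add((u, v))
    return edges
```

Note that the shared-thread network links learners who never exchanged a direct reply, which is precisely why the two constructions can diverge for a sizeable share of posters.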
Presentation 3C2 (20 min): “Towards Understanding the Lifespan and Spread of Ideas: Epidemiological Modeling of Participation on Twitter (Short research paper)
Keywords: Ideas, Epidemiology, Engagement Patterns, Networked Learning, Knowledge Creation, Connectivism
How ideas develop and evolve is a topic of interest for educators. By understanding this process, designers and educators are better able to support and guide collaborative learning activities. This paper presents an application of our Lifespan of an Idea framework to measure engagement patterns among individuals in communal socio-technical spaces like Twitter. We correlated engagement with social participation, enabling the process of idea expression, spread, and evolution. Social participation leads to the transmission of ideas from one individual to another and can be gauged in much the same way as the spread of disease. The temporal dynamics of social participation can thus be modeled through the lens of epidemiological modeling. To test the plausibility of this framework, we investigated social participation on Twitter using the tweet-posting patterns of individuals in three academic conferences and one long-term chat space. We used a basic SIR epidemiological model, where the rate parameters were estimated through Euler’s-method solutions to the SIR model and a non-linear least-squares optimization technique. We discuss the differences in social participation among individuals in these spaces based on their transition behavior into the different categories of the SIR model. We also made inferences on how the total lifetime of these different Twitter spaces affects engagement among individuals. We conclude by discussing implications of this study and planned future research on refining the Lifespan of an Idea framework.
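As an illustration of the modeling machinery (not the study’s fitted parameters, which would come from non-linear least squares against observed posting data), the SIR equations can be integrated with the forward Euler method. Here S, I, and R are the susceptible, "infected" (actively participating), and recovered fractions, and the rates β and γ below are hypothetical:

```python
def sir_euler(beta, gamma, s0, i0, r0, dt=0.1, steps=1000):
    """Forward-Euler integration of the SIR model:
        dS/dt = -beta*S*I
        dI/dt =  beta*S*I - gamma*I
        dR/dt =  gamma*I
    Working in population fractions, S + I + R stays 1."""
    s, i, r = s0, i0, r0
    trajectory = [(s, i, r)]
    for _ in range(steps):
        ds = -beta * s * i
        di = beta * s * i - gamma * i
        dr = gamma * i
        s, i, r = s + ds * dt, i + di * dt, r + dr * dt
        trajectory.append((s, i, r))
    return trajectory

# Hypothetical rates: participation spreads (beta) faster than it decays (gamma).
traj = sir_euler(beta=0.5, gamma=0.1, s0=0.99, i0=0.01, r0=0.0)
```

Fitting would then adjust β and γ so the simulated I(t) curve tracks the observed counts of actively tweeting individuals over time.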
Presentation 3C3 (20 min): “Socio-Temporal Dynamics in Peer Interaction Events (Short research paper)
Keywords: Relational event modelling, temporality, digital peer networks
Asynchronous online discussions are broadly used to support peer interaction in online and hybrid courses. In this paper, we argue that the analysis of peer interaction in online discussions would benefit from focusing on relational events and considering a range of factors motivating the occurrence of a particular event. To demonstrate this possibility, we applied Relational Event Modeling (REM) to a dataset from seven online classes. In this modeling, we included (a) a learner attribute of temporal participation, (b) social dynamics factors such as preferential attachment and reciprocity, and (c) turn-by-turn sequential patterns. Results showed that learner activity, as well as familiarity through recency of interactions, affects preferential attachment. Turn-by-turn sequential patterns explain much of the two-star network patterns that can affect triadic closure in the network. However, the models do not capture the social dynamics of triadic closure well. Future work can investigate the role of knowledge-building behaviors in the triadic closure of networks based on peer interaction events. This study contributes fresh insights into social interaction in online discussions, calls for attention to micro-level temporal patterns, and motivates future work to scaffold learner participation in similar contexts.
Session 3D: Novel Uses of Learning Analytics, Location: (Room HZ 11)
Presentation 3D1 (30 min): “An Exploratory Approach to Measuring Collaborative Engagement in Child Robot Interaction (Full research paper)
Keywords: Human-robot interaction, engagement, automatic speech recognition, short-time signal processing, social robotics, human computer interaction, child robot interaction
This study investigated data analytic approaches to assessing young children’s engagement in collaborative activities mediated by a humanoid robot. Three sources of multimodal data were collected during the sessions, including the children’s voices, kinesics, and utterances. To develop our analytic models, we took a case-study approach and looked closely into four multimodal behaviors during three conversational sessions, which were coded by human annotation and automatic speech recognition. Information-theoretic methods were used to uncover nonlinear dependencies (called mutual information) among the multimodal behaviors of each child. From this information, and also grounded in the theory of engagement, we derived a model to compute a compound variable of engagement. This model computed engagement trends of each child, the engagement relationship between the children within a pair, and the engagement relationship with the robot over time. The computed trends corresponded well with the data from human observations. This approach has implications for quantifying engagement from rich and natural multimodal behaviors.
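The information-theoretic dependency measure used here, mutual information, can be computed directly from paired categorical observations such as coded behaviours per time window. A minimal sketch with hypothetical behaviour codes:

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits between two paired
    categorical series (e.g. coded multimodal behaviours per window)."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))          # joint counts
    px, py = Counter(xs), Counter(ys)   # marginal counts
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi

# Two perfectly coupled binary behaviours share 1 bit of information.
mi = mutual_information(["speak", "quiet"] * 3, ["speak", "quiet"] * 3)
```

Independent series yield a mutual information of zero, so nonzero values flag (possibly nonlinear) dependencies between a child's behavioural channels.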
Presentation 3D2 (20 min): “The SEIRA approach: course embedded activities to promote academic integrity and literacies in first year Engineering (Short research paper)
Keywords: learning analytics, academic literacy, academic integrity, plagiarism, clustering, engagement
Students enrol in STEM programs with varying degrees of confidence in citing and referencing texts in their written work. Students often have an inclination to choose numbers over written language throughout schooling, which means less opportunity to practice referencing and citation. This is compounded by large numbers of students for whom English is an additional language or who articulate from different cultural ways-of-doing. The Search, Evaluate, Integrate, Reference and Act Ethically (SEIRA) modules were developed to provide discipline relevance to a confounding task. Data analysis of student engagement with the SEIRA site and subsequent student success provides an indication of the value of this approach to developing academic literacy across the STEM disciplines.
Presentation 3D3 (20 min): “Learning with Background Music: A Field Experiment (Short research paper)
Keywords: background music, learning engagement, learning performance, naturalistic setting, music information retrieval
Empirical evidence of how background music benefits or hinders learning is the crux of optimizing music recommendation in educational settings. This study aims to further probe the underlying mechanism by investigating the interactions among music characteristics, learning context, and learners’ personal factors. Thirty participants were recruited for a field experiment conducted in their own study places over one week. During the experiment, participants were asked to search for and listen to music while studying, using a novel mobile-based music discovery application. A set of participant-related, context-related, and music-related data was collected via a pre-experiment questionnaire, surveys popped up in the music app, and the app’s logging system. Preliminary results reveal correlations between certain music characteristics and learners’ task engagement and perceived task performance. This study is expected to provide evidence for understanding the effects of background music on learning, as well as implications for designing music recommendation systems capable of intelligently selecting background music to facilitate learning.
Thursday, March 26, 2020
Morning sessions (10:30 AM)
Session 4A: Keynote Discussion and Panel, Location: (Room HZ 2)
Keynote chat 4A1 (20 min): Professor Milena Tsvetkova: “Group Learning Analytics”
Panel 4A2 (60 min): “What Do We Mean By Rigor In Learning Analytics?”
This participatory session, led by the Journal of Learning Analytics (JLA) editorial team, invites the community to engage with the complex question of what constitutes a “rigorous paper” in learning analytics (LA). LA is a highly interdisciplinary field, drawing on machine-learning techniques and statistical analysis as well as qualitative approaches, and the papers submitted to and published by JLA are diverse. While this breadth of work and orientations to LA are enormous assets to the field, they also create challenges for defining and applying common standards of rigour across multiple disciplinary norms. Extending the conversation begun in the recent editorial of JLA 6(3), this session will examine indicators of quality that are significant in particular research traditions, indicators of quality that are common across
Session 4B: Institutional Adoption, Location: (Room HZ 8)
Presentation 4B1 (30 min): “The privacy paradox and its implications for learning analytics (Full research paper)
Keywords: Learning analytics, privacy, expectations, privacy paradox, higher education
Learning analytics promises to support adaptive learning in higher education. However, the associated issues around privacy protection, especially their implications for students as data subjects, have been a hurdle to wide-scale adoption. In light of this, we set out to understand student expectations of privacy issues related to learning analytics and to identify gaps between what students desire and what they expect to happen or choose to do in reality when it comes to privacy protection. To this end, an investigation was carried out in a UK higher education institution using a survey (N=674) and six focus groups (26 students). The study highlights a number of key implications for learning analytics research and practice: (1) purpose, access, and anonymity are key benchmarks of ethics and privacy integrity; (2) transparency and communication are key levers for learning analytics adoption; and (3) information asymmetry can impede active participation of students in learning analytics.
Presentation 4B2 (30 min): “Perceptions and expectations about Learning Analytics from a Brazilian Higher Education Institution (Full research paper)
Keywords: Learning analytics, Higher education institutions, Human factors, Qualitative research
Several tools to support learning processes based on educational data have emerged from research on Learning Analytics (LA) in the last few years. These tools aim to support students and instructors in daily activities, and academic managers in making institutional decisions. Although the adoption of LA tools is spreading, the field still needs to deepen the understanding of the contexts where learning takes place, and of the views of the stakeholders involved in implementing and using these tools. In this sense, the SHEILA framework proposes a set of instruments to perform a detailed analysis of the expectations and needs of different stakeholders in higher education institutions, regarding the adoption of LA. Moreover, there is a lacuna in research on stakeholders’ expectations from LA outside the Global North. Therefore, this paper reports on the findings of the application of interviews and focus groups, based on the SHEILA framework, with students and teaching staff from a Brazilian public university, to investigate their perceptions of the potential benefits and risks of using LA in higher education in the country. Findings indicate that there is a high interest in using LA for improving the learning experience, in particular, being able to provide personalized feedback, to adapt teaching practices to students’ needs, and to make evidence-based pedagogical decisions. From the analysis of these perspectives, we point to opportunities for using LA in Brazilian higher education.
Presentation 4B3 (20 min): “Using the Behaviour Change Wheel for Learning Analytics adoption (Practitioner report)
Keywords: Behaviour Change Wheel, change management, faculty engagement, learning analytics
This paper describes the development and piloting of a novel approach to supporting individual staff in adopting Learning Analytics (LA) to inform and enhance their teaching practice and learning design. The Behaviour Change Wheel (BCW), a pragmatic guide for developing and implementing interventions, provides the framework for this study and is explained in the study’s context. Possibilities for more widespread adoption are discussed. Initial feedback from a small (n=6) pilot study conducted over 20 weeks suggests that the approach has merit, with participants noting increased awareness and use of LA.
Session 4C: Video Analytics, Location: (Room HZ 9)
Presentation 4C1 (30 min): “Exploring the Usage of Thermal Imaging for Understanding Video Lecture Designs and Students’ Experiences (Full research paper)
Keywords: Video lectures, Thermal imaging, Cognitive load, Instructional design
Video is becoming a dominant medium for the delivery of educational material. Despite the widespread use of video for learning, there is still a lack of understanding about how best to help people learn in this medium. This study demonstrates the use of a thermal camera, as compared to traditional self-report methods, for assessing learners’ cognitive load while watching video lectures of different styles. We evaluated our approach in a study with 78 university students viewing two variants of short video lectures on two different topics. To incorporate subjective measures, the students reported on mental effort, interest, prior knowledge, confidence, and challenge. Moreover, through a physical slider device, the students could continuously report on their perceived level of difficulty. Lastly, we used a thermal sensor as an additional indicator of students’ level of difficulty and associated cognitive load, achieved through continuous real-time monitoring of students with a thermal imaging camera. This study aims to address the following: first, to analyze whether video styles differ in terms of the associated cognitive load; second, to assess the effects of cognitive load on learning outcomes (could an increase in cognitive load be associated with poorer learning outcomes?); and third, to see if there is a match between students’ perceived difficulty levels and a biological indicator. The results suggest that thermal imaging could be an effective tool to assess learners’ cognitive load, and that increased cognitive load could lead to poorer performance. Moreover, in terms of lecture styles, the animated video lectures appear to be a better tool than the text-only lectures (in the content areas tested here). The results of this study may guide future work on effective video designs, especially those that consider cognitive load.
Presentation 4C2 (30 min): “Is Faster Better? A Study of Video Playback Speed (Full research paper)
Keywords: moocs, randomized controlled trials, RCTs, playback speed, video interactions, clickstreams, video analytics, grades
In this paper, we explore the relationship between video playback speed and student learning outcomes. Using an experimental design, we present the results of a pre-registered study that assigns users to watch videos at either 1.0x or 1.25x speed. We find that when videos are sped up, students spend less time consuming videos and are marginally more likely to complete more video content. We also find that students who consume sped-up content are more likely to get better grades in a course, attempt more content, and obtain more certificates. These findings suggest that further study of playback speed as a tool for optimizing video content for MOOCs is warranted. Applications for reinforcement learning and adaptive content are discussed.
Presentation 4C3 (20 min): “Modelling Collaborative Problem-solving Competence with Transparent Learning Analytics: Is Video Data Enough? (Short research paper)
Best short research paper nominee
Keywords: physical learning analytics, collaborative problem-solving, decision trees, video analytics
In this study, we describe the results of our research to model collaborative problem-solving (CPS) competence based on analytics generated from video data. We collected ~500 minutes of video data from 15 groups of 3 students working to solve design problems collaboratively. Initially, with the help of OpenPose, we automatically generated frequency metrics, such as the number of faces in the screen, and distance metrics, such as the distance between bodies. Based on these metrics, we built decision trees to predict students’ listening, watching, making, and speaking behaviours, as well as to predict the students’ CPS competence. Our results provide useful association rules mined from analytics of video data which can be used to inform teacher dashboards. Although the accuracy and recall values of the models built are inferior to previous machine learning work that utilizes multimodal data, the transparent nature of decision trees provides opportunities for easy adoption and explainable analytics for teachers and learners. We conclude the paper with a discussion of the value and limitations of our approach.
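The core computation behind each decision-tree node, choosing the metric threshold with the highest information gain, can be sketched as follows. The video-derived metric values and competence labels below are hypothetical:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of a label distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_split(values, labels):
    """Pick the threshold on one metric (e.g. mean inter-body distance)
    that maximises information gain, as a single tree node would."""
    base = entropy(labels)
    n = len(labels)
    best = (None, -1.0)
    for t in sorted(set(values))[:-1]:  # candidate thresholds
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best[1]:
            best = (t, gain)
    return best

# Hypothetical data: inter-body distance metric vs. coded CPS competence.
dist = [0.2, 0.3, 0.4, 1.1, 1.2, 1.3]
comp = ["high", "high", "high", "low", "low", "low"]
threshold, gain = best_split(dist, comp)
```

Because each node reduces to a single named metric and threshold, the resulting rules can be read directly off the tree, which is the transparency the abstract contrasts with multimodal black-box models.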
Session 4D: Intelligent Tutoring Systems, Location: (Room HZ 11)
Presentation 4D1 (30 min): “The relationship between confusion and metacognitive strategies in Betty’s Brain (Full research paper)
Keywords: confusion, confusion resolution, metacognitive strategy, learning analytics
Confusion has been shown to be prevalent during complex learning and has mixed effects on learning, depending on whether it is resolved. Confusion resolution requires learners to possess some skills, but it is unclear what these skills are. One possibility may be metacognitive strategies (MS), strategies for regulating cognition. This study examined the relationship between confusion and actions related to MS in Betty’s Brain, a computer-based learning environment. The results revealed that MS behavior differed during and outside confusion. However, confusion resolution was not related to MS behavior, and MS did not moderate the effect of confusion on learning.
Presentation 4D2 (30 min): “A Bayesian Model of Individual Differences and Flexibility in Inductive Reasoning for Categorization of Examples (Full research paper)
Keywords: Inductive Reasoning, Student Modeling, Adaptive Learning Environment, Process Mining
Inductive reasoning is an important educational practice but can be difficult for teachers to support in the classroom due to the high level of preparation and demand on classroom time needed to choose the teaching material. Intelligent tutoring systems can potentially facilitate this work for the teacher by supporting the automatic adaptation of examples based on a student model of the induction process. However, current models of inductive reasoning usually lack two characteristics helpful to adaptive learning environments: individual differences between students, and tracing of students’ learning as they receive feedback. In this paper, we describe a model to predict and simulate students’ inductive reasoning in a categorization task. Our approach uses a Bayesian model to describe students’ reasoning processes, which allows us to predict their choices in categorization questions by accounting for their feature biases. Using data gathered from 222 students categorizing three topics, we find that our model achieves 75% accuracy, which is 10% greater than a baseline model. Our model contributes by enabling us to assign different bias profiles to individual students and to track changes in these profiles over time. The model may be relevant for systematically analysing students’ differences and evolution in inductive reasoning strategies. It may also support the design of adaptive learning environments that support inductive learning approaches.
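The paper’s model is not specified in this abstract, but the general mechanism it relies on, Bayesian updating of a posterior over feature-bias profiles from a student’s observed categorization choices, can be sketched. The profile names, likelihoods, and choices below are hypothetical:

```python
def posterior(profiles, priors, observations):
    """Bayesian update of a student's feature-bias profile.
    `profiles` maps profile name -> P(choice | profile);
    `observations` is the student's sequence of choices."""
    post = dict(priors)
    for obs in observations:
        # Multiply in the likelihood of this choice under each profile,
        # then renormalise so the posterior stays a distribution.
        for name in post:
            post[name] *= profiles[name][obs]
        total = sum(post.values())
        post = {name: p / total for name, p in post.items()}
    return post

# Hypothetical question: does the student categorize by shape or by colour?
profiles = {
    "shape-biased":  {"by_shape": 0.8, "by_colour": 0.2},
    "colour-biased": {"by_shape": 0.3, "by_colour": 0.7},
}
priors = {"shape-biased": 0.5, "colour-biased": 0.5}
post = posterior(profiles, priors, ["by_shape", "by_shape", "by_colour"])
```

Re-running the update after each answer is what lets such a model trace a profile as it shifts with feedback, rather than assigning one fixed bias per student.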
Session 4E: Collaborative Problem Solving, Location: (Room HZ 12)
Presentation 4E1 (30 min): “Focused or Stuck Together: Multimodal Patterns Reveal Triads’ Performance in Collaborative Problem Solving” (Full research paper)
Best full research paper nominee
Keywords: Multimodal Learning Analytics, Interpretability, CSCL, CSCW
Collaborative problem solving in virtual environments is an increasingly important context of 21st-century learning. However, our understanding of this complex and dynamic process is still limited. While previous research has explored data-driven multimodal approaches based on machine learning, these approaches have often suffered from a lack of model interpretability. In this work, we examine unimodal primitives (activity on the screen, speech, and body movements) and their multimodal combinations during remote collaborative problem solving (CPS). We analyzed two datasets in which 116 triads (348 students) collaboratively solved challenging visual programming tasks over videoconferencing. We investigate how interaction and behavioral primitives and multimodal patterns are associated with the team’s subjective and objective performance. We found that idling in silence/back-channeling and without movement was negatively correlated with task performance and with participants’ subjective perceptions of the interaction, whereas being silent and focused during solution execution was positively correlated with task performance. The results illustrate that in some cases multimodal patterns improved predictions and provided greater explanatory power than the unimodal primitives. We discuss how these findings can inform the design of real-time interventions in remote CPS.
Presentation 4E2 (30 min): “iSENS: An Integrated Approach to Combining Epistemic and Social Network Analyses” (Full research paper)
Keywords: Collaborative problem solving, Epistemic network analysis, Social network analysis
Collaborative problem solving (CPS) is defined as having cognitive and social dimensions. While network analytic techniques such as epistemic network analysis (ENA) and social network analysis (SNA) have been successfully used to investigate the patterns of cognitive and social connections that describe CPS, few attempts have been made to combine the two approaches. Building on prior work that used ENA and SNA metrics as independent predictors of collaborative learning, we propose and test the integrated social-epistemic network signature (iSENS), an approach that affords the simultaneous investigation of cognitive and social connections. We tested iSENS on data collected from military teams participating in training scenarios. Our results suggest that (1) these teams are defined by specific patterns of cognitive and social connections, (2) iSENS networks are able to capture these patterns, and (3) iSENS is a better predictor of team outcomes than ENA alone, SNA alone, or a non-integrated SENS approach.
Presentation 4E3 (20 min): “High resolution temporal network analysis to understand and improve collaborative learning” (Short research paper)
Keywords: learning analytics, Social network analysis, Temporal networks, Temporality, Collaborative learning, Problem-based learning, Medical education
There have been significant efforts to study collaborative and social learning using aggregate networks. Such efforts have demonstrated the worth of the approach by providing insights about interactions, student and teacher roles, and the predictability of performance. However, an aggregated network discards the fine temporal resolution of interactions. By doing so, we may overlook the regularities and irregularities of students’ interactions, the process of learning regulation, and how and when different actors influence each other. Compressing a complex temporal process such as learning may thus be oversimplifying and reductionist. Through a temporal network analysis of 54 students’ interactions (3,134 in total) in an online medical education course, this study contributes a methodological approach to building, visualizing, and quantitatively analyzing temporal networks that could help educational practitioners understand important temporal aspects of collaborative learning that may need attention and action. Furthermore, the analysis emphasizes the importance of considering the temporal characteristics of the data used when attempting, for instance, early prediction of performance and early detection of students and groups that need support and attention.
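As a rough illustration of the difference between aggregated and time-sliced networks, the sketch below splits timestamped interactions into consecutive windows before counting activity, so patterns visible in one window are not averaged away. The tuple format and window size are illustrative assumptions, not taken from the paper:

```python
from collections import defaultdict

def temporal_slices(interactions, window):
    """Split (sender, receiver, timestamp) interaction records into
    consecutive time windows and count each sender's activity per window.
    An aggregate network would collapse all windows into one count."""
    slices = defaultdict(lambda: defaultdict(int))
    for sender, receiver, t in interactions:
        slices[t // window][sender] += 1  # window index = t // window
    return dict(slices)
```

Comparing a learner’s counts across slices (rather than their single aggregate total) is what exposes the regularities and irregularities the abstract refers to.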
Early afternoon sessions (1:30 PM)
Session 5A: New Domains, Location: (Room HZ 2)
Presentation 5A1 (30 min): “Learning to Represent Healthcare Providers’ Knowledge of Neonatal Emergency Care” (Full research paper)
Best full research paper nominee
Keywords: Global Health, Clinical training, Neonatal care, Emergency care, Smartphones, Deep Knowledge Tracing, Forgetting Curves, Long Short-Term Memory Neural Networks
Modelling healthcare providers’ knowledge while they are acquiring new concepts is an important step towards supporting self-regulated personalised learning at scale. This is especially important if we are to address health workforce skills development and enhance the subsequent quality of care patients receive in the Global South, where a huge skills gap exists. Rich data about healthcare providers’ learning can be captured by their responses to close-ended problems within a conjunctive solution space – such as clinical training scenarios for emergency care delivery – on smartphone-based learning interventions, which are being proposed as a solution for reducing the healthcare skills gap in this context. Together with sequential data detailing a learner’s progress while solving a learning task, this provides useful insights into their learning behaviour. Predicting learning or forgetting curves from representations of healthcare providers’ knowledge is a difficult task, but recent promising machine learning advances have produced techniques capable of learning knowledge representations and overcoming this challenge. In this study, we train a Long Short-Term Memory neural network to predict learners’ future performance and forgetting curves by feeding it sequence embeddings of learning task attempts from healthcare providers in the Global South. From this training, the model captures nuanced representations of a healthcare provider’s clinical knowledge and patterns of learning behaviour, predicting their future performance with high accuracy. More significantly, by differentiating reduced performance based on spaced learning, the model can provide timely warnings that help healthcare providers reinforce their own self-regulated learning, while providing a basis for personalised instructional support to improve clinical outcomes in their professional practice.
Presentation 5A2 (20 min): “Learning Analytics Applied in Commercial Aviation: Our Use Cases, Obstacles, and Help Needed From Academia” (Practitioner report)
Keywords: Aviation, Competency Based Training, Obstacles to Learning Analytics Implementation
The growing demand for future aviation workers, including pilots, maintainers, and cabin crew, along with the need to maintain the highest quality and safety standards, results in a need to rethink our traditional approach to aviation training. Aviation training has typically followed the traditional model of classroom training, computer-based training, simulator-based training (for pilots), and hands-on training (for mechanics). At the same time, the regulatory authorities are moving to a competency-based training method for measuring and certifying pilot readiness, rather than the traditional time-based criteria. This practitioner presentation will describe the obstacles we face in our attempt to address the issues presented by these two realities through adoption of a training analytics program in aviation training. It will outline the strides we have made, the challenges we face, and how we believe the academic learning analytics community can work with us.
Session 5B: Personalised Dashboards, Location: (Room HZ 8)
Presentation 5B1 (30 min): “Personalized Visualizations to Promote Young Learners’ SRL” (Full research paper)
Keywords: Self-Regulated Learning, Adaptive Learning Technologies, Learner-faced dashboards, Hybrid Human-System Intelligence
This paper describes the design and evaluation of personalized visualizations to support young learners’ Self-Regulated Learning (SRL) in Adaptive Learning Technologies (ALTs). Our learning path app combines three Personalized Visualizations (PV) that are designed as an external reference to support learners’ internal regulation process. The personalized visualizations rest on three pillars: grounding in SRL theory, the use of trace data, and the provision of clear, actionable recommendations for learners to improve regulation. This quasi-experimental pre-posttest study finds that learners in the personalized visualization condition improved the regulation of their practice behavior, as indicated by higher accuracy and less complex moment-by-moment learning curves compared to learners in the control group. Learners in the PV condition also showed better transfer of learning. Finally, students in the personalized visualization condition were more likely to under-estimate rather than over-estimate their performance. Overall, these findings indicate that the personalized visualizations improved regulation of practice behavior and transfer of learning, and changed the bias in relative monitoring accuracy.
Presentation 5B2 (30 min): “How Patterns of Students’ Dashboard Use Are Related to Their Achievement and Self-Regulatory Engagement” (Full research paper)
Keywords: Student-facing dashboard, self-regulated learning, sequence analytics, academic achievement
Student-facing dashboards aim to support learning by providing students with actionable information and promoting self-regulated learning. We created a new dashboard design aligned with SRL theory, called MyLA, to better understand how students use a learning analytics tool. We conducted sequence analysis of students’ interactions with three different visualizations in the dashboard, implemented in an LMS, for 860 students in ten courses representing different disciplines. To evaluate different students’ experiences with the dashboard, we computed chi-squared tests of independence on dashboard users (52%) to find frequent patterns that discriminate students by their differences in academic achievement and self-regulated learning behaviors. The results revealed discriminating patterns of dashboard use across levels of academic achievement and self-regulated learning, particularly for low-achieving students and highly self-regulated learners. Our findings highlight the importance of differences in students’ experience with a student-facing dashboard, and emphasize that one size does not fit all in the design of learning analytics tools.
Session 5C: Course Recommender Systems, Location: (Room HZ 9)
Presentation 5C1 (30 min): “Designing for Serendipity in a Course Recommendation System” (Full research paper)
Keywords: Higher education, course guidance, filter bubble, neural networks
Collaborative filtering based algorithms, including Recurrent Neural Networks (RNN), tend towards predicting a perpetuation of past observed behavior. In a recommendation context, this can lead to an overly narrow set of suggestions lacking in serendipity and inadvertently placing the user in what is known as a “filter bubble.” In this paper, we grapple with the issue of the filter bubble in the context of a course recommendation system in production at a public university. Our approach is to present course results that are novel or unexpected to the student but still relevant to their interests. We build one set of models based on the course catalog description (BOW) and another set informed by enrollment histories (course2vec). We compare the performance of these models on off-line validation sets and against the system’s existing RNN-based recommendation engine in a user study of undergraduates (N = 70) who rated their course recommendations along six characteristics related to serendipity. Results of the user study show a dramatic lack of novelty in RNN recommendations and depict the characteristic trade-offs that make serendipity difficult to achieve. While the machine-learned course2vec models performed best on concept generalization tasks (i.e., course analogies), it was the simple bag-of-words based recommendations that students rated as more serendipitous. We discuss the role of the recommendation interface and the information presented therein in the student’s decision to accept a recommendation from either algorithm.
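The bag-of-words (BOW) approach that students rated as most serendipitous can be approximated by a cosine-similarity ranking over catalog descriptions. The stdlib sketch below, with made-up course names and descriptions, is only a gloss on the general technique, not the system’s actual code:

```python
import math
from collections import Counter

def bow_similarity(desc_a, desc_b):
    """Cosine similarity between two bag-of-words course descriptions."""
    a, b = Counter(desc_a.lower().split()), Counter(desc_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def recommend(query_desc, catalog, k=2):
    """Rank (name, description) catalog entries by similarity to a query
    description and return the top-k names (hypothetical interface)."""
    ranked = sorted(catalog, key=lambda c: bow_similarity(query_desc, c[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]
```

A production system would add stop-word removal and TF-IDF weighting, but even this bare form shows why BOW can surface courses outside a student’s enrollment history: it matches on description content rather than past behavior.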
Presentation 5C2 (20 min): “Complementing Educational Recommender Systems with Open Learner Models” (Short research paper)
Keywords: Educational Recommender Systems, Open Learner Models, Educational Data Mining
Educational recommender systems (ERSs) aim to adaptively recommend a broad range of personalised resources and activities that will best meet students’ learning needs. Commonly, ERSs operate as a “black box” and give students no insight into the rationale behind their recommendations. Recent contributions from the learning analytics and educational data mining communities have emphasised the importance of transparent, understandable and open learner models (OLMs) that provide insight and enhance learners’ understanding of interactions with learning environments. In this paper, we investigate the impact of complementing ERSs with transparent and understandable OLMs that provide justification for their recommendations. We conduct a randomised control trial experiment using an ERS with two interfaces (“Non-Complemented Interface” and “Complemented Interface”) to determine the effect of our approach on student engagement and on students’ perception of the effectiveness of the ERS. Overall, our results suggest that complementing an ERS with an OLM can have a positive effect on student engagement and perceived effectiveness, in spite of potentially making the system harder to navigate. In some cases, however, complementing an ERS with an OLM has the negative consequence of decreasing engagement, understandability and sense of fairness.
Session 5D: New Methodologies, Location: (Room HZ 11)
Presentation 5D1 (30 min): “Are Forum Networks Social Networks? A Methodological Perspective” (Full research paper)
Keywords: null models, online posts, learner networks, online forums, collective learning
The mission of learning analytics (LA) is to improve learner experiences through insights from granular data collected in digital learning environments. While some areas of LA are maturing, this is not consistent across all areas of LA interest and specialisation. For instance, LA research on social learning processes lacks the tools and validated approaches needed to take into account how inter-course variability affects downstream analyses. While associations between network structure and learning outcomes are commonly examined, it remains unclear whether such observed associations represent bona fide social effects or merely reflect heterogeneity in posting activity. To overcome this issue, we construct various null models for the generative processes underlying forum communication and examine which features of student network topology are mere derivatives of upstream forum processes. In particular, the study argues that posting activity is essential to forum communication and should be explicitly included in both forum network representations and their modelling. By analysing forum networks in twenty online courses, the study demonstrates that forum posting activity is highly predictive of both the breadth (degree) and frequency (strength) of learner-to-learner interactions in the real-life networks in our dataset. This implies that interaction degree and frequency as standalone measures may not capture the social effects in collective learning. On the other hand, the results suggest that the clustering of the network structure is not simply a derivative of individual online posting behaviour; hence, the local clustering coefficient is probably a better reflection of the underlying dynamics of human relationships. The study is relevant to LA researchers and network scientists with an interest in social interactions and learner networks in digital learning.
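A null model of the kind described can be sketched by rewiring learner-to-learner ties at random while tie probability tracks posting activity alone; any network feature reproduced by such a model is a derivative of posting activity rather than a social effect. The stdlib function below is a hedged illustration — its data shapes and parameters are assumptions, not the paper’s models:

```python
import random
from collections import defaultdict

def null_model_degrees(post_counts, n_edges, seed=0):
    """Generate one null network in which each tie endpoint is drawn with
    probability proportional to a learner's posting activity, and return
    each learner's degree (number of distinct partners) under the null.
    Comparing observed degrees to this baseline asks: is degree anything
    more than posting volume? (Illustrative sketch only.)"""
    rng = random.Random(seed)
    # Weighted sampling pool: each learner appears once per post.
    pool = [s for s, c in post_counts.items() for _ in range(c)]
    partners = defaultdict(set)
    for _ in range(n_edges):
        a, b = rng.choice(pool), rng.choice(pool)
        if a != b:
            partners[a].add(b)
            partners[b].add(a)
    return {s: len(partners[s]) for s in post_counts}
```

In practice one would draw many such null networks and compare the observed degree, strength, and clustering distributions against the resulting ensembles.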
Presentation 5D2 (20 min): “Rethinking time-on-task estimation with outlier detection accounting for individual, time, and task differences” (Short research paper)
Keywords: Learning analytics, Temporal analysis, Outlier detection, Time-on-task, Measurement
Time-on-task estimation, measured as the duration between two consecutive clicks in student log-file data, has been one of the most frequently used metrics in learning analytics research. However, the process of handling outliers (i.e., excessively long durations) in time-on-task estimation is under-explored and often not explicitly reported. One common approach is to ‘trim’ all durations using a cut-off threshold, such as 60 or 30 minutes. This paper challenges that approach by demonstrating that the treatment of outliers in an educational context should be individual-specific, time-specific, and task-specific. In other words, what counts as an outlier in time-on-task depends on the learning pattern of each student, the stage of the learning process, and the nature of the task involved. The analysis showed that predictive models using time-on-task estimation accounting for individual, time, and task differences could explain 3–4% more variance in academic performance than models using an outlier-trimming approach. As an implication, this study provides a theoretically grounded and replicable outlier detection approach for future learning analytics research using time-on-task estimation.
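The contrast between a global trimming threshold and an individual-specific one can be made concrete with a small sketch. The median + k·IQR rule below is one plausible individual-specific criterion, chosen for illustration; it is not necessarily the rule the paper proposes:

```python
import statistics

def trim_fixed(durations, cutoff=1800):
    """Conventional approach: cap every click interval (in seconds) at a
    single global cutoff, e.g. 30 minutes, regardless of the student."""
    return [min(d, cutoff) for d in durations]

def trim_individual(durations, k=1.5):
    """Individual-specific sketch: cap intervals at median + k * IQR of
    the student's own click-interval distribution, so the threshold
    adapts to each student's working rhythm."""
    q1, _, q3 = statistics.quantiles(durations, n=4, method='inclusive')
    cutoff = statistics.median(durations) + k * (q3 - q1)
    return [min(d, cutoff) for d in durations]
```

For a student whose intervals cluster around one minute, the adaptive cutoff lands near that scale instead of at a one-size-fits-all 30 minutes, which is the paper’s central point.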
Late afternoon sessions (3:00 PM)
Session 6A: Mining Study Strategies, Location: (Room HZ 2)
Presentation 6A1 (30 min): “Prediction of Students’ Assessment Readiness in Online Learning Environments: The Sequence Matters” (Full research paper)
Keywords: Learning Analytics, LSTM, Assessment Readiness Prediction, Sequential Pattern Mining, MOOCs
Online learning environments are now pervasive in higher education. While not exclusively the case, in these environments there is often modest teacher presence, and students are provided with access to a range of learning, assessment, and support materials. This places pressure on their study skills, including self-regulation. In this context, students may access assessment material without being fully prepared. This may result in limited success and, in turn, raise a significant risk of disengagement. If students’ assessment readiness could be predicted, it could therefore be used to help educators or online learning environments postpone assessment tasks until students were deemed “ready”. In this study, we employed a range of machine learning techniques with aggregated and sequential representations of students’ behaviour in a Massive Open Online Course (MOOC) to predict their readiness for assessment tasks. Based on our results, it was possible to successfully predict students’ readiness, particularly when the temporal aspects of behaviour were represented in the model. Additionally, we used sequential pattern mining to investigate which sequences of behaviour differed between high and low levels of performance in assessments. We found that high performers’ most frequent sequences related to viewing and reviewing the lecture materials, whereas low performers’ most frequent sequences related to successive failed submissions for an assessment. Based on these findings, implications for supporting specific behaviours to improve learning in online environments are discussed.
Presentation 6A2 (30 min): “Analytics of Time Management and Learning Strategies for Effective Online Learning in Blended Environments” (Full research paper)
Keywords: Blended learning, Learning analytics, Learning strategies, Time management strategies, Self-regulated learning
This paper reports on the findings of a study that proposed a novel learning analytic technique combining three complementary methods – agglomerative hierarchical clustering, epistemic network analysis, and process mining. The methodology allows for identification and interpretation of self-regulated learning in terms of the use of learning strategies. The main advantage of the new technique over existing ones is that it combines the time management and learning tactic dimensions of learning strategies, which are typically studied in isolation. The new technique allows for novel insights into learning strategies by studying the frequency of, the strength of connections between, and the ordering and time of execution of time management and learning tactics. The technique was validated in a study conducted on the trace data of first-year undergraduate students enrolled in two consecutive offerings (N2017 = 250 and N2018 = 232) of a course at an Australian university. The application of the proposed technique identified four strategy groups derived from three distinct time management tactics and five learning tactics. The tactics and strategies identified with the technique were correlated with academic performance and were interpreted according to established theories and practices of self-regulated learning.
Presentation 6A3 (30 min): “Combining Analytic Methods to Unlock Sequential and Temporal Patterns of Self-Regulated Learning” (Full research paper)
Keywords: Applied computing, Education, Learning management systems, Computing methodologies, Machine learning, Learning settings, Active learning settings
The temporal and sequential nature of learning is receiving increasing focus in Learning Analytics circles. The desire to embed studies in recognised theories of self-regulated learning (SRL) has led researchers to conceptualise learning as a process that unfolds and changes over time. To that end, a growing body of research argues that traditional frequency-based correlational studies are limited in narrative impact. To explore this further, we analysed trace data collected from the online activities of 239 computer engineering undergraduate students enrolled in a course that followed a flipped classroom pedagogy. We employed an SRL categorisation of micro-level processes based on a recognised model of learning, and then analysed the data using: 1) simple frequency measures; 2) epistemic network analysis; 3) temporal process mining; and 4) stochastic process mining. We found that a combination of analyses provided a richer insight into SRL behaviours than any single method, and that better-performing learners employed more optimal behaviours in their navigation through the course’s learning management system.
Session 6B: Testing and Assessment, Location: (Room HZ 8)
Presentation 6B1 (30 min): “R2DE: a NLP approach to estimating IRT parameters of newly generated questions” (Full research paper)
Best full research paper nominee
Keywords: learning analytics, natural language processing, knowledge tracing, item response theory, educational data mining
The main objective of exams is to assess students’ expertise on a specific subject. Such expertise, also referred to as skill or knowledge level, can then be leveraged in different ways (e.g., to assign a grade, or to understand whether a student needs support). Similarly, the questions appearing in exams have to be assessed in some way before being used to evaluate students. Standard approaches to question assessment are either subjective (e.g., assessment by human experts) or introduce a long delay into the question generation process (e.g., pretesting with real students). In this work we introduce R2DE (a Regressor for Difficulty and Discrimination Estimation), a model capable of assessing newly generated multiple-choice questions from the text of the question and of the possible choices. In particular, it can estimate the difficulty and discrimination of each question, as defined in Item Response Theory. We also present the results of extensive experiments carried out on a real-world, large-scale dataset from an e-learning platform, showing that our model can be used to perform an initial assessment of newly created questions and ease some of the problems that arise in question generation.
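For readers unfamiliar with the two parameters R2DE estimates, the standard two-parameter logistic (2PL) item response function shows how difficulty and discrimination shape the probability of a correct answer. This is textbook IRT, not the R2DE model itself:

```python
import math

def p_correct_2pl(theta, difficulty, discrimination):
    """2PL item response function: probability that a student with
    ability theta answers correctly, given the item's difficulty (b)
    and discrimination (a). Higher discrimination makes the curve
    steeper around theta == difficulty."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))
```

When ability equals difficulty the probability is exactly 0.5; discrimination controls how sharply the item separates students just above that point from those just below it, which is why both parameters matter for question quality.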
Presentation 6B2 (30 min): “Applying Prerequisite Structure Inference to Adaptive Testing” (Full research paper)
Keywords: prerequisite detection, adaptive testing, knowledge models
Modeling student knowledge is important for assessment design, adaptive testing, curriculum design, and pedagogical intervention. The assessment design community has primarily focused on continuous latent-skill models with strong conditional independence assumptions among knowledge items, while the prerequisite discovery community has developed many models that aim to exploit the interdependence of discrete knowledge items. This paper attempts to bridge the gap by asking, “When does modeling assessment item interdependence improve predictive accuracy?” A novel adaptive testing evaluation framework is introduced that is amenable to techniques from both communities, and an efficient algorithm, Directed Item-Dependence And Confidence Thresholds (DIDACT), is introduced and compared with an Item-Response-Theory based model on several real and synthetic datasets. Experiments suggest that assessments that are narrow in scope benefit significantly from modeling item interdependence.
Presentation 6B3 (30 min): “Predicting the Difficulty of Multiple Choice Questions in a High-stakes Medical Exam” (Invited paper)
ACL SIG-EDU’19 best paper
Predicting the construct-relevant difficulty of Multiple-Choice Questions (MCQs) has the potential to reduce cost while maintaining the quality of high-stakes exams. In this paper, we propose a method for estimating the difficulty of MCQs from a high-stakes medical exam, where all questions were deliberately written to a common reading level. To accomplish this, we extract a large number of linguistic features and embedding types, as well as features quantifying the difficulty of the items for an automatic question-answering system. The results show that the proposed approach outperforms various baselines with a statistically significant difference. Best results were achieved when using the full feature set, where embeddings had the highest predictive power, followed by linguistic features. An ablation study of the various types of linguistic features suggested that information from all levels of linguistic processing contributes to predicting item difficulty, with features related to semantic ambiguity and the psycholinguistic properties of words having a slightly higher importance. Owing to its generic nature, the presented approach has the potential to generalize over other exams containing MCQs.
Session 6C: Prompts and Feedback, Location: (Room HZ 9)
Presentation 6C1 (30 min): “How good is my feedback? A Content Analysis of Written Feedback” (Full research paper)
Keywords: Feedback, Learning Analytics, Content Analysis, Online learning
Feedback is a crucial element in helping students identify gaps and assess their learning progress. In online courses, feedback becomes even more critical as it is one of the channels through which the teacher interacts directly with the student. However, with the growing number of students enrolled in online learning, it becomes a challenge for instructors to provide good-quality feedback that helps students self-regulate. In this context, this paper proposes a content analysis of feedback text provided by instructors, based on different indicators of good feedback. A random forest classifier was trained and evaluated at different feedback levels, achieving up to 87% accuracy and a Cohen’s κ of 0.39. The paper also provides insights into the most influential textual features for predicting feedback quality.
Presentation 6C2 (30 min): “Understanding Students’ Engagement with Personalised Feedback Messages” (Full research paper)
Best full research paper nominee
Keywords: Feedback, Learning Analytics, Higher Education, Feedback Gap, Data-Driven Approaches
Feedback is a major factor in student success within higher education learning. However, recent changes – such as increased class sizes and the socio-economic diversity of the student population – have challenged the provision of effective student feedback. Although the use of educational technology for personalised feedback to diverse students has gained traction, the feedback gap still exists: educators wonder which students respond to feedback and which do not. In this study, a set of trackable Call to Action (CTA) links was embedded in two sets of feedback messages focusing on students’ time management, with the goal of (1) examining the association between feedback engagement and course success and (2) predicting students’ reactions to the provided feedback. We also conducted two focus groups to further examine students’ perception of the feedback messages. Our results revealed that early engagement with the feedback was associated with higher chances of succeeding in the course. Likewise, previous engagement with feedback was highly predictive of future engagement, and certain student sub-populations (e.g., female students) were more likely to engage than others. Such insight enables instructors to ask “why” questions, improve feedback processes, and narrow the feedback gap. Practical implications of our findings are discussed.
Presentation 6C3 (20 min): “Characterizing and influencing students’ tendency to write self-explanations in online homework” (Short research paper)
Keywords: self-explanation, prompts, engagement, online homework, randomized experiment
In the context of online programming homework for a university course, we explore the extent to which learners engage with optional prompts to self-explain their answers. Self-explanation can benefit learning in laboratory and classroom settings, but there is less data about the extent to which students engage with such prompts when they are optional additions to online homework. We report data from a deployment of self-explanation prompts in online programming homework, providing insight into how the frequency of writing explanations correlates with variables such as how early students start homework, whether they got a problem correct, and their proficiency in the language of instruction. We also report suggestive results from a randomized experiment comparing several methods for increasing the rate at which students write explanations, such as including a motivational message or more than one kind of prompt. These findings highlight promising dimensions to explore in understanding how real students engage with prompts to explain their answers.
Session 6D: Methodological Considerations, Location: (Room HZ 11)
Presentation 6D1 (30 min): “Testing the reliability of inter-rater reliability” (Full research paper)
Keywords: interrater reliability, coding, reliability, validity, statistical analysis
Analyses of learning often rely on coded data, and one important aspect of coding is establishing reliability. Previous research has shown that the common approach to establishing coding reliability is seriously flawed in that it produces unacceptably high Type I error rates. This paper tests whether these error rates are tied to specific reliability metrics or reflect a larger methodological problem. Our results show that the flaw in the method for establishing reliability is not metric-specific, and we suggest the adoption of new practices to control the Type I error rates associated with establishing coding reliability.
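The Type I error problem can be made concrete with a Monte Carlo sketch: simulate a coder whose true chance-corrected agreement falls below the acceptance threshold, and count how often a small reliability sample certifies them anyway. All parameters below are illustrative, not drawn from the paper:

```python
import random

def cohens_kappa(a, b):
    """Cohen's kappa for two binary coders (stdlib-only sketch)."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n        # observed agreement
    pa1, pb1 = sum(a) / n, sum(b) / n                 # marginal rates
    pe = pa1 * pb1 + (1 - pa1) * (1 - pb1)            # chance agreement
    return (po - pe) / (1 - pe) if pe != 1 else 1.0

def type1_rate(true_agree=0.75, threshold=0.65, sample=20,
               trials=2000, seed=1):
    """Fraction of small reliability samples that certify a coder whose
    long-run kappa (about 0.5 here, given 50/50 codes and 75% agreement)
    is below the 0.65 acceptance threshold — i.e. the Type I error rate
    of the 'code a small subset, test once' procedure."""
    rng = random.Random(seed)
    passes = 0
    for _ in range(trials):
        truth = [rng.random() < 0.5 for _ in range(sample)]
        coder_b = [t if rng.random() < true_agree else (not t) for t in truth]
        k = cohens_kappa([int(t) for t in truth], [int(c) for c in coder_b])
        if k >= threshold:
            passes += 1
    return passes / trials
```

Even though the simulated coder’s true kappa sits well below the threshold, a non-trivial share of 20-item samples pass — the inflation the paper investigates across metrics.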
Presentation 6D2 (30 min): “Constructing and Predicting School Advice for Academic Achievement: A Comparison of Item Response Theory and Machine Learning Techniques” (Full research paper)
Keywords: E-Learning, Item Response Theory, Machine Learning, Neural Networks, Random Forests, Explainable AI
Educational tests can be used to estimate pupils’ abilities and thereby give an indication of whether their school type is suitable for them. However, tests in education are usually conducted for each content area separately, which makes it difficult to combine the results into a single school advice. To this end, we provide a comparison between domain-specific and domain-agnostic methods for predicting school advice. Both use data from a pupil monitoring system in the Netherlands, which tracks pupils’ educational progress over several years through a series of tests measuring multiple skills. First, an IRT model is calibrated, from which an ability score is extracted and subsequently plugged into a multinomial log-linear regression model. Second, we train a random forest (RF) and a shallow neural network (NN) and apply case weighting to give extra attention to pupils who switched between school types. When considering the performance of all pupils, RFs provided the most accurate predictions, followed by NNs and IRT respectively. When only looking at the performance of pupils who switched school type, IRT performed best, followed by NNs and RFs. Case weighting proved to provide a major improvement for this group. Lastly, IRT was found to be much easier to explain than the other models. Thus, while ML provided more accurate results, this comes at the cost of lower explainability in comparison to IRT.
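As a rough illustration of the IRT side of such a pipeline, the sketch below estimates a pupil’s ability under a 1PL (Rasch) model by Newton-Raphson on the log-likelihood. The function and its inputs are hypothetical stand-ins, not the calibration actually used in the study:

```python
import math

def rasch_ability(responses, difficulties, iters=25):
    """Maximum-likelihood ability estimate under a 1PL (Rasch) model,
    given a pupil's 0/1 responses and known item difficulties.
    Assumes a mixed response pattern (all-correct or all-wrong diverges)."""
    theta = 0.0
    for _ in range(iters):
        grad = hess = 0.0
        for x, b in zip(responses, difficulties):
            p = 1.0 / (1.0 + math.exp(-(theta - b)))
            grad += x - p              # d logL / d theta
            hess -= p * (1.0 - p)      # d2 logL / d theta2
        theta -= grad / hess           # Newton-Raphson step
    return theta
```

An ability score like this could then serve as the single predictor in a downstream multinomial regression, mirroring the two-stage design the abstract describes.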
Presentation 6D3 (30 min): “Exploration of the Robustness and Generalizability of the Additive Factors Model” (Full research paper)
Keywords: student modeling, Additive Factors Model, learning curves, knowledge components, introductory programming
The Additive Factors Model (AFM) is a widely used student model, applied primarily to refine knowledge component models (Q-matrices). We explore the robustness and generalizability of the model. We explicitly formulate the simplifying assumptions that the model makes, and we discuss methods for visualizing learning curves based on the model. We also report on an application of the model to data from a learning system for introductory programming; these experiments illustrate possibly misleading interpretations of model results due to differences in item difficulty. Overall, our results show that greater care has to be taken in the application of the model and in the interpretation of results obtained with it.
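The AFM itself is a logistic model: the log-odds of a correct answer combine a student proficiency with, for each knowledge component (KC) the item exercises, an easiness term and a learning-rate term scaled by prior practice opportunities. A minimal sketch of that probability (parameter names are illustrative):

```python
import math

def afm_prob(theta, q_row, beta, gamma, opportunities):
    """AFM success probability for one student-item pair.
    theta: student proficiency; q_row: 0/1 KC membership of the item
    (a Q-matrix row); beta: KC easiness; gamma: KC learning rate;
    opportunities: the student's prior practice count per KC."""
    logit = theta + sum(q * (b + g * t)
                        for q, b, g, t in zip(q_row, beta, gamma, opportunities))
    return 1.0 / (1.0 + math.exp(-logit))
```

With positive learning rates, predicted success grows with practice opportunities; plotting this against opportunity count yields the learning curves the paper discusses visualizing.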
Friday, March 27, 2020
Morning sessions (10:30 AM)
Session 7A: Keynote Discussion and Panel, Location: (Room HZ 2)
Keynote chat 7A1 (20 min): Professor Allyson Hadwin: “Smart Learners Versus Smart Systems: Leveraging Learning Analytics for Self-regulated Learning”
Panel 7A2 (60 min): “EARLI and LAK societies contribution for advancing self-regulated learning research – expert panel discussion”
Recent multidisciplinary research has addressed understanding the complex nature of temporally unfolding self-regulated learning processes by using multimodal and online trace methods (e.g., log files, eye tracking, think-aloud protocols, physiological sensors, video data collection). The value of multimodal methods is enabled by progress in machine learning and learning analytics, and future implications can be expected for the design of advanced learning technologies for teaching and learning support. Despite these benefits, analyzing and understanding multimodal multichannel data comes with challenges that are still under exploration in the research community. It is not yet known which data modalities are relevant for extracting information about learning processes, and epistemological and ontological challenges exist that may cause serious misinterpretations of data and analyses. These issues will be discussed and pushed further by the panelists, who work at the intersection of the EARLI and SOLAR societies.
Session 7B: Multi-modal Learning Analytics, Location: (Room HZ 8)
Presentation 7B1 (30 min): “Predicting learners’ effortful behaviour in adaptive assessment using multimodal data” (Full research paper)
Keywords: adaptive assessment, effort prediction, multimodal learning analytics, hidden Markov models
Many factors beyond the required knowledge influence learners’ performance on an activity. Learners’ on-task effort is acknowledged to relate strongly to their educational outcomes, reflecting how actively they are engaged in an activity. However, effort is not directly observable. Multimodal data can provide additional insights into learning processes and may allow for effort estimation. This paper presents an approach for predicting effort in an adaptive assessment context. Specifically, the behaviour of 32 students was captured during an adaptive self-assessment activity, using logs and physiological data (i.e., eye-tracking, EEG, wristband, and facial expressions). We applied k-means to the multimodal data to cluster students’ behavioural patterns. Next, we predicted students’ effort to complete the upcoming task based on the discovered behavioural patterns, using a combination of Hidden Markov Models (HMMs) and the Viterbi algorithm. We also compared the results with other state-of-the-art prediction algorithms (SVM, Random Forest). Our findings provide evidence that HMMs can encode the relationship between effort and behaviour (captured by the multimodal data) more efficiently than the other methods. Foremost, a practical implication of the approach is that the derived HMMs also pinpoint the moments at which to provide preventive or prescriptive feedback to learners in real time, building upon the relationship between behavioural patterns and the effort learners are putting in.
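The decoding step in such an HMM pipeline typically relies on the Viterbi algorithm, which recovers the most likely hidden-state sequence (here, latent effort levels) from a sequence of observed behavioural clusters. A generic sketch, with toy states and probabilities that are not taken from the study:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for a sequence of observations."""
    # Forward pass: best probability and predecessor for each state at each step.
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for o in obs[1:]:
        prev = V[-1]
        V.append({s: max(((prev[p][0] * trans_p[p][s] * emit_p[s][o], p)
                          for p in states), key=lambda c: c[0])
                  for s in states})
    # Backtrack from the most probable final state.
    best = max(states, key=lambda s: V[-1][s][0])
    path = [best]
    for layer in reversed(V[1:]):
        path.append(layer[path[-1]][1])
    return path[::-1]
```

For example, with hidden effort states {low, high} emitting observed response speeds, a run of “fast” observations decodes to a run of high-effort states under sensible transition and emission probabilities.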
Presentation 7B2 (20 min): “Detecting learning in noisy data: The case of oral reading fluency” (Short research paper)
Keywords: children’s reading, reading fluency, oral reading fluency, reading app, book reading, reading analytics
In a school context, learning is usually detected by repeated measurements of the skill of interest through a sequence of specially designed tests; in particular, this is the case with tracking improvement in oral reading fluency in elementary school children in the U.S. Results presented in this paper suggest that it is possible and feasible to detect improvement in oral reading fluency using data collected during children’s independent reading of a book using the Relay Reader app. We are thus a step closer to the vision of having a child read for the story, not for a test, yet being able to unobtrusively assess their progress in oral reading fluency.
Presentation 7B3 (20 min): “Using a Cluster-Based Regime-Switching Dynamic Model to Understand Embodied Mathematical Learning” (Short research paper)
Keywords: Multimodal Learning Analytics, Embodied Cognition, Mathematical Learning, Dynamic Models
Embodied learning and the design of embodied learning platforms have gained popularity in recent years due to the increasing availability of sensing technologies. In our study we used the Mathematical Imagery Trainer for Proportion (MIT-P), which uses a touchscreen tablet to help students explore the concept of mathematical proportion. Sensing technologies provide an unprecedented amount of high-frequency data on students’ behaviors. We investigated a statistical model called the mixture Regime-Switching Hidden Logistic Transition Process (mixRHLP) and fit it to the students’ hand-motion data. Simultaneously, the model finds characteristic regimes and assigns students to clusters of regime transitions. To provide validity evidence as to the nature of these regimes and clusters, we explore properties of students’ and tutors’ verbalizations associated with these different phases.
Session 7C: MOOCs, Location: (Room HZ 9)
Presentation 7C1 (30 min): “edX Log Data Analysis Made Easy” (Full research paper)
Keywords: MOOC, edX, learning analytics, visualizations
Massive Open Online Courses (MOOCs), delivered on platforms such as edX and Coursera, have led to a surge in large-scale learning research. MOOC platforms gather a continuous stream of learner traces, which can amount to several gigabytes per MOOC, that learning analytics researchers use to conduct exploratory analyses as well as to evaluate deployed interventions. edX has proven to be a popular platform for such experiments, as the data each MOOC generates is easily accessible to the institution running the MOOC. One of the issues researchers face is the preprocessing, cleaning, and formatting of those large-scale learner traces: a tedious process that requires considerable computational skills. To reduce this burden, a number of tools have been proposed and released with the aim of simplifying this process. Those tools, though, still have significant setup costs (requiring the setup of a server), are already out-of-date, or require already-preprocessed data as a starting point. In contrast, in this paper we introduce ELAT, the edX Log file Analysis Tool, which is browser-based (i.e., no setup costs), keeps the data local (no server is necessary, and the privacy-sensitive learner data is not sent anywhere), and takes edX data dumps as input. ELAT not only processes the raw data but also generates semantically meaningful units (learner sessions instead of just click events) that are visualized in various ways (learning paths, forum participation, video-watching sequences). We report on two evaluations we conducted: (i) a technological evaluation and (ii) a user study with potential end users of ELAT. ELAT is open-source; for review purposes we anonymized it and made a short demonstration video available at https://vimeo.com/user103400556/elatdemo.
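Turning raw click events into learner sessions, as ELAT does, is commonly done by splitting a learner’s event stream wherever the inactivity gap exceeds a threshold. A minimal illustration of that idea (the 30-minute threshold is an assumption for the sketch, not ELAT’s actual rule):

```python
from datetime import datetime, timedelta

def sessionize(timestamps, gap_minutes=30):
    """Split a learner's chronologically sorted event timestamps into
    sessions: a new session starts whenever the inactivity gap between
    consecutive events exceeds the threshold."""
    gap = timedelta(minutes=gap_minutes)
    sessions = []
    for t in timestamps:
        if sessions and t - sessions[-1][-1] <= gap:
            sessions[-1].append(t)   # continue the current session
        else:
            sessions.append([t])     # start a new session
    return sessions
```

For example, clicks at 10:00, 10:10, and 12:00 yield two sessions, since the final gap exceeds the threshold.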
Presentation 7C2 (20 min): “Assessment that matters: Balancing reliability and learner-centered pedagogy in MOOC assessment” (Short research paper)
Keywords: MOOCs, Assessment, Learning Analytics
Learner-centered pedagogy highlights active learning and formative feedback. Instructors often incentivize learners to engage in formative assessment activities by crediting their completion and score in the final grade, a pedagogical practice that is very relevant to MOOCs as well. However, previous studies have shown that many MOOC learners exploit anonymity to abuse the formative feedback, which is critical in the learning process, to earn points without effort. Unfortunately, limiting feedback and access to decrease cheating is counter-pedagogical and reduces the openness of MOOCs. We aimed to identify and analyze a MOOC assessment strategy that balances this tension between learner-centered pedagogy, incentive design, and the reliability of assessment. In this study, we evaluated an assessment model that MITx Biology introduced in a MOOC to reduce cheating, with respect to its effect on two aspects of learner behavior: the amount of cheating and learners’ engagement in formative course activities. The contribution of the paper is twofold. First, this work provides MOOC designers with an analytically verified MOOC assessment model that reduces cheating without compromising learner engagement in formative assessments. Second, this study provides a learning analytics methodology to approximate the effect of such an intervention.
Presentation 7C3 (20 min): “Macro MOOC Learning Analytics: Exploring Trends Across Global and Regional Providers” (Short research paper)
Keywords: MOOCs, Learning Analytics, Multi-platform Analytics Collaboration, Large-scale Analytics, Cultural Factors
Massive Open Online Courses (MOOCs) have opened new educational possibilities for learners around the world. Numerous providers have emerged, usually with different targets (geographic regions, topics, or languages), but most research and attention has concentrated on the global providers and on studies with limited generalizability. In this work we apply a multi-platform approach, generating a joint and comparable analysis with data from millions of learners and more than ten MOOC providers that have partnered to conduct this study. This allows us to generate learning analytics trends at a macro level across various MOOC providers, towards understanding which MOOC trends are globally universal and which are context-dependent. The analysis reports preliminary results on the differences and similarities of trends based on learners’ country of origin, level of education, gender, and age across global and regional MOOC providers. This study exemplifies the potential of macro learning analytics in MOOCs to understand the ecosystem and inform the whole community, while calling for more large-scale studies in learning analytics through partnerships among researchers and institutions.
Session 7D: Linking with Self-regulated Learning Theory, Location: (Room HZ 11)
Presentation 7D1 (30 min): “Self-Regulated Learning and Learning Analytics in Online Learning Environments: A Review of Empirical Research” (Full research paper)
Keywords: self-regulated learning, learning analytics, literature review
Self-regulated learning (SRL) can predict academic performance, yet it is difficult for learners, and the ability to self-regulate learning becomes even more important in emerging online learning settings. Learning analytics (LA), which can improve learning practice by transforming the ways we support learning, is critical to supporting learners in developing their SRL. This study is based on an analysis of 54 empirical LA papers on SRL in online learning contexts published between 2011 and 2019. The research question is: What is the current state of the applications of learning analytics to measure and support students’ SRL in online learning environments? The focus is on SRL aspects, methods, forms of SRL support, evidence for LA, and types of online learning settings. Zimmerman’s (2002) model was used to examine SRL phases. The evidence about LA was examined in relation to four propositions: whether LA i) improve learning outcomes, ii) improve learning support and teaching, iii) are deployed widely, and iv) are used ethically. Results showed that most studies focused on SRL aspects from the forethought and performance phases, with much less attention to reflection. We found little evidence that LA i) improve learning outcomes (20% of studies) or ii) improve learning support and teaching (22%). LA were also found iii) not to be widely deployed, and iv) few studies (15%) approached the research ethically. Overall, the findings show that LA research has been conducted mainly to measure rather than to support SRL. Thus, there is a critical need to further exploit LA support mechanisms in order to ultimately foster student SRL in online learning environments.
Presentation 7D2 (20 min): “Building a Unified Data Platform to Support Metacognition and Self Regulated Learning” (Practitioner report)
Keywords: Learning Analytics, Metacognition, Self-regulated Learning, Student-Facing Learning Visualizations
To capitalize on the promise of personalized learning analytics, our applications must rely on a comprehensive view of student learning in data. This becomes especially true as student-facing analytics evolve from descriptive analytics into experiences that scaffold reflection. Ironically, as the number of tools used by students grows, the data ecosystem becomes more fragmented, challenging our ability to build valuable analytics. In short, to build personalized analytics one must first solve a key problem: from many tools, build a comprehensive, unified portrait of the learner in their learning environment. With such a foundation in place, it becomes possible, we believe, to build effective, analytics-driven student experiences. In this report, we describe an effort to build a unified data ecosystem and a personalized learning application in parallel.
Presentation 7D3 (30 min): “What College Students Say, and What They Do: Aligning Self-Regulated Learning Theory with Behavioral Logs” (Full research paper)
Keywords: LMS, self-regulated learning, self-reports, trace data
A central concern in learning analytics specifically, and educational research more generally, is the alignment of robust, coherent measures with well-developed conceptual and theoretical frameworks of teaching and learning. Capturing and representing processes of learning remains an ongoing challenge in all areas of educational inquiry, and raises substantive considerations about the nature of learning, knowledge, assessment, and measurement that have been continuously refined across areas of education and pedagogical practice. Learning analytics, as a still-developing method of inquiry, has yet to substantively navigate the alignment of measurement, capture, and representation of learning to theoretical frameworks, despite being used to address practical concerns such as identifying at-risk students. This study addresses these concerns by comparing behavioral measurements from learning management systems with established measurements of components of learning as understood through self-regulated learning frameworks. Using several prominent and robustly supported self-report survey measures designed to identify dimensions of self-regulated learning, as well as typical behavioral features extracted from a learning management system, we conducted descriptive and exploratory analyses of the relational structures of these data. With the exception of learners’ self-reported time management strategies, the current results indicate that behavioral measures were not well correlated with survey measurements. Possibilities and recommendations for learning analytics as measurement of self-regulated learning are discussed.
Session 7E: Learning Analytics Past and Future, Location: (Room HZ 12)
Presentation 7E1 (30 min): “Let’s Shine Together! A Comparative Study between Learning Analytics and Educational Data Mining” (Full research paper)
Keywords: Learning Analytics, Educational Data Mining, Hierarchical Topic Detection, Language Modeling
Learning Analytics and Knowledge (LAK) and Educational Data Mining (EDM) are two of the most popular venues for researchers and practitioners to report and disseminate discoveries in data-intensive research on technology-enhanced education. After about a decade of development, it is time to scrutinize and compare these two venues. By doing so, we expect to give relevant stakeholders a better understanding of the past development of LAK and EDM and to provide suggestions for their future development. Specifically, we conducted an extensive comparative analysis of LAK and EDM from four perspectives: (i) the topics investigated; (ii) community development; (iii) community diversity; and (iv) research impact. Furthermore, we applied one of the most widely used language modeling techniques, Word2Vec, to capture words frequently used by researchers to describe future work building upon suggestions made in the published papers, so as to shed light on potential directions for future research.
Presentation 7E2 (20 min): “Learning Analytics Challenges: Trade-offs, Methodology, Scalability” (Short research paper)
Keywords: trade-offs, scalability, methodology, challenge
Ryan Baker presented in a LAK 2019 keynote a list of six grand challenges for learning analytics research. The challenges are specified as problems with clearly defined success criteria. Education is, however, a domain full of ill-defined problems. I argue that learning analytics research should reflect this nature of the education domain and focus on less clearly defined, but practically essential issues. As an illustration, I discuss three important challenges of this type: addressing inherent trade-offs in learning environments, the clarification of methodological issues, and the scalability of system development.
Presentation 7E3 (30 min): “From childhood to maturity: Are we there yet?: Mapping the intellectual progress in learning analytics during the past decade” (Full research paper)
Keywords: Co-word analysis, bibliometrics, conceptual evolution, learning analytics
This study aims to identify the conceptual structure and thematic progress of Learning Analytics (evolution) and to elaborate on backbone and emerging topics in the field (maturity) from 2011 to September 2019. To address this objective, the paper employs hierarchical clustering, strategic diagrams, and network analysis to construct the intellectual map of the Learning Analytics community and to visualize the thematic landscape of the field, using co-word analysis. Overall, 459 papers from the proceedings of the Learning Analytics and Knowledge (LAK) conference and 168 articles published in the Journal of Learning Analytics (JLA), with their respective 3,092 author-assigned keywords and 4,051 machine-extracted key phrases, were included in the analyses. The results indicate that the community has focused significantly on areas like Massive Open Online Courses and visualizations; Learning Management Systems, assessment, and self-regulated learning are also core topics, while topics like natural language processing and orchestration are emerging. The analysis highlights the shift of research interest throughout the past decade and the rise of new topics, providing evidence that the field is expanding. Limitations of the approach and future work plans conclude the paper.
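Co-word analysis of this kind starts from keyword co-occurrence counts across papers, which become the edge weights of the thematic network that clustering and strategic diagrams are then built on. A small sketch with toy keywords (not the study’s data):

```python
from collections import Counter
from itertools import combinations

def coword_counts(keyword_lists):
    """Edge weights of a co-word network: how often each pair of
    keywords is assigned to the same paper."""
    counts = Counter()
    for keywords in keyword_lists:
        # Sort so each unordered pair is counted under one canonical key.
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
    return counts
```

The resulting counts can be fed directly into a graph library or normalized into an association-strength matrix for clustering.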
Early afternoon sessions (1:00 PM)
Session 8A: Learning Design, Location: (Room HZ 2)
Presentation 8A1 (30 min): “From Theory to Action: Developing and Evaluating Learning Analytics for Learning Design” (Full research paper)
Keywords: learning analytics for learning design, learning design, design-based research, teacher-facing learning analytics, theory of learning
The effectiveness of using learning analytics for learning design primarily depends upon two concepts: grounding and alignment. This is the primary conjecture of the study described in this paper. In our design-based research study, we design, test, and evaluate teacher-facing learning analytics for an online inquiry science unit on global climate change. We design our learning analytics in accordance with the constructivism-based pedagogical framework called Knowledge Integration, and the principles of Learning Analytics Implementation Design. Our methodology for the design process draws upon principles of the Orchestrating for Learning Analytics framework to engage key stakeholders (teachers, researchers, and developers). The resulting learning analytics were aligned to unit activities that engaged students in key aspects of the knowledge integration process. They provided teachers with actionable insight into their students’ understanding at critical juncture points in the learning process. We demonstrate the efficacy of the learning analytics in supporting the optimization of the unit’s learning design. We conclude by synthesizing the principles that guided our design process into a framework for developing and evaluating learning analytics for learning design.
Presentation 8A2 (30 min): “Disciplinary Differences in Blended Learning Design: A Network Analytic Study” (Full research paper)
Keywords: faculty, learning activity types, epistemic network analysis
Learning design research has predominantly relied upon survey- and interview-based methodologies, both of which are subject to limitations of social desirability and recall. An alternative approach is offered in this manuscript, whereby physical and online learning activity data is analysed using Epistemic Network Analysis. Using a sample of 6,040 course offerings from 10 faculties across a four-year period (2016-2019), the utility of networks for understanding learning design is illustrated. Specifically, through the adoption of a network analytic approach, the following was found: universities are clearly committed to blended learning, but there are considerable differences both between and within disciplines.
Session 8B: Text Analytics, Location: (Room HZ 8)
Presentation 8B1 (20 min): “#Confused and Beyond: Detecting Confusion in Course Forums using Student Annotated Hashtags” (Short research paper)
Keywords: confusion detection, affective learning, student forums
Confusion is a barrier to learning, contributing to loss of motivation and to disengagement with course materials. However, detecting students’ confusion in large-scale courses is expensive in time and resources. This paper provides a new approach for confusion detection in online forums that harnesses the power of students’ self-reported affective states (hashtags). It provides a definition of confusion, based on students’ hashtags in their posts, that is shown to align with teachers’ judgement. We use this definition to inform the design of an automated classifier for confusion detection that remains robust even when no self-reported hashtags are present in the test set. We demonstrate this approach in a large-scale Biology course using the Nota Bene annotation platform. This work lays the foundation for better tools to help teachers detect and alleviate confusion in online courses.
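Self-reported hashtags lend themselves to weak labelling: a post can be tentatively marked as confused if it carries any tag from a confusion set, and those labels then supervise a classifier. A minimal sketch (the tag set here is hypothetical, not the paper’s definition):

```python
import re

# Hypothetical confusion tag set -- the paper derives its own from student posts.
CONFUSION_TAGS = {"#confused", "#question", "#help"}

def is_confused(post):
    """Weakly label a forum post as confused if it carries any
    student-annotated confusion hashtag (case-insensitive)."""
    tags = {t.lower() for t in re.findall(r"#\w+", post)}
    return bool(tags & CONFUSION_TAGS)
```

Labels produced this way can train a text classifier that generalizes to posts carrying no hashtags at all, which is the robustness property the abstract emphasizes.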
Presentation 8B2 (30 min): “How and How Well Do Students Reflect?: Multi-Dimensional Automated Reflection Assessment in Health Professions Education” (Full research paper)
Keywords: Reflection, Reflection Assessment, Health Professions Education, Classification, Natural Language Processing, Content Analysis
Reflection is a critical part of health professions education that supports students in becoming thoughtful practitioners. Assessing student reflection at scale can provide feedback in support of this process. However, this remains a challenge due to the demanding nature of manual assessment and the common use of overly simplified, unidimensional criteria of quality, which do not provide meaningful direction for improvement. The current study addressed this longstanding issue through the development and validation of a multi-dimensional automated assessment scheme that uses linguistic models to classify reflections by overall quality (depth) and the presence of six constituent elements denoting quality (description, analysis, feeling, perspective, evaluation, and outcome). In total, 1,500 reflections from 369 dental students were manually coded to establish ground truth. Classifiers for each of the six elements were trained and tested based on linguistic features extracted using the LIWC tool, applying both single-label and multi-label classification approaches. Classifiers for depth were built both directly from linguistic features and based on the presence of the six elements. Results showed that linguistic modeling can be used to reliably detect the presence of reflection elements and depth. However, the depth classifier relied heavily on the cognitive elements (description, analysis, and evaluation) rather than the other elements. These findings indicate the feasibility of implementing multidimensional automated assessment in health professions education and the need to reconsider how quality of reflection is conceptualized.
Presentation 8B3 (30 min): “Towards Automatic Cross-Language Classification of Cognitive Presence in Online Discussions” (Full research paper)
Keywords: Community of Inquiry Model, Cross-Language Classification, Content Analytics, Online Discussion, Optimization
This paper presents an approach to automatic cross-language classification of messages in online discussions according to the categories of cognitive presence defined in the Community of Inquiry framework. The proposed approach uses language-independent features based on established linguistic frameworks (i.e., LIWC and Coh-Metrix). This paper also provides a new method for data balancing, using state-of-the-art over- and under-sampling algorithms, to improve the training step. The best-performing classifier using the proposed data balancing method obtained a Cohen’s kappa of 0.53, a 65.62% improvement over the method without any data balancing. Moreover, this study presents theoretical insights into the nature of cognitive presence by examining the classification features that were most relevant for distinguishing between the different categories. These findings suggest future directions for enhancing automatic cross-language classification in educational settings.
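The simplest member of the balancing family the paper draws on is random over-sampling: duplicating minority-class examples until class counts match the majority class. A generic sketch (not the specific state-of-the-art algorithms used in the study):

```python
import random
from collections import Counter, defaultdict

def random_oversample(X, y, seed=0):
    """Balance classes by duplicating minority-class examples at random
    until every class matches the majority-class count."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, label in zip(X, y):
        by_class[label].append(x)
    target = max(len(items) for items in by_class.values())
    Xb, yb = [], []
    for label, items in by_class.items():
        # Keep all originals, then pad with random duplicates.
        Xb += items + [rng.choice(items) for _ in range(target - len(items))]
        yb += [label] * target
    return Xb, yb
```

More sophisticated methods (e.g., synthetic sampling) interpolate new minority examples instead of duplicating, but the training-set effect, equalized class frequencies, is the same.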
Session 8C: Mathematics Learning, Location: (Room HZ 9)
Presentation 8C1 (30 min): “The Automated Grading of Student Open Responses in Mathematics” (Full research paper)
Keywords: open responses, automatic grading, natural language processing
The use of computer-based systems in classrooms has provided teachers with new opportunities for delivering content to students, supplementing instruction, and assessing student knowledge and comprehension. Among the largest benefits of these systems is their ability to provide students with feedback on their work and to report student performance and progress to their teacher. While computer-based systems can automatically assess student answers to a range of question types, a limitation faced by many systems concerns open-ended problems. Many systems are either unable to support open-ended problems, relying on the teacher to grade them manually, or avoid such question types entirely. Due to recent advancements in natural language processing methods, the automation of essay grading has made notable strides. However, much of this research has pertained to domains outside of mathematics, where teachers can use open-ended problems to assess students’ understanding of mathematical concepts beyond what is possible with other types of problems. This research explores the viability and challenges of developing automated graders of open-ended student responses in mathematics. We further explore how the scale of available data impacts model performance. Focusing on content delivered through the ASSISTments online learning platform, we present a set of analyses pertaining to the development and evaluation of models to predict teacher-assigned grades for student open responses.
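A common baseline for this task scores a new response with the teacher-assigned grade of its most similar previously graded response, using bag-of-words cosine similarity. This is only an illustrative nearest-neighbour sketch, not the models developed in the paper:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def grade(response, scored_examples):
    """Predict a grade as the teacher-assigned grade of the most similar
    previously scored response (1-nearest-neighbour over word bags)."""
    bag = Counter(response.lower().split())
    best = max(scored_examples,
               key=lambda ex: cosine(bag, Counter(ex[0].lower().split())))
    return best[1]
```

Such a baseline also makes the data-scale question concrete: with more graded examples per problem, the nearest neighbour is more likely to be a genuinely equivalent answer.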
Presentation 8C2 (30 min): “Peeking through the Classroom Window: A Detailed Data-Driven Analysis on the Usage of a Curriculum Integrated Math Game in Authentic Classrooms” (Full research paper)
Keywords: Curriculum Integrated Games, Educational Games, Classroom Integration, Game Play Session, Teaching Practices
We present a data-driven analysis that provides generalized insights into how a curriculum-integrated educational math game is used throughout the year in authentic primary school classrooms. Our study relates observations from a field study of Spatial Temporal Math (ST Math) usage to findings mined from ST Math students’ sequential game-play data. We identified features that vary across game-play sessions and modeled their relationship with students’ session performance. We also derived data-informed suggestions that may give teachers insights into how to design classroom game-play sessions to facilitate more effective learning.
Presentation 8C3 (30 min): “Data-informed Curriculum Sequences for a Curriculum-Integrated Game” (Full research paper)
Keywords: Serious Game Analytics, Educational Games, Curricular Sequencing, Retries, Predictive Analysis
In this paper, we perform a novel predictive analysis of a curriculum-integrated math game, 123-MATH, to suggest a partial ordering for the game’s curriculum sequence. We analyzed the sequence of 123-MATH objectives played by elementary school students in 5 U.S. districts and grouped each objective into difficult and easy categories according to how many retries are needed for students to master it. We observed that retries on some objectives are high in one district and low in another district where the objectives are played in a different order. Motivated by this observation, we investigated what makes an effective curriculum sequence. We inferred a new partially ordered sequence by replicating a prior study to find predictive relationships between 15 objectives played in different sequences by 3,328 students from 5 districts. Based on the predictive abilities of objectives in these districts, we found 17 suggested objective orderings. After deriving these orderings, we confirmed their validity by evaluating the impact of the suggested sequence on changes in rates of retries and corresponding performance. We observed that when the objectives are played in the suggested sequence, we record a drastic reduction in retries, implying that these objectives are easier for students. This indicates that objectives that come earlier can provide prerequisite knowledge for later objectives. We believe that data-informed sequences, such as the ones we suggest, may improve the efficiency of instruction and increase content learning and performance.
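The difficult/easy grouping described above can be sketched as a median split over per-objective mean retry counts. This is an illustrative reconstruction under that assumption; the paper’s exact criterion may differ:

```python
from statistics import median

def split_by_retries(objective_retries):
    """Label each objective 'difficult' or 'easy' via a median split on
    its mean retry count across students.
    objective_retries: {objective: [retries per student]}"""
    means = {obj: sum(r) / len(r) for obj, r in objective_retries.items()}
    cut = median(means.values())
    return {obj: "difficult" if m > cut else "easy" for obj, m in means.items()}
```

Applying such a split per district is what makes the cross-district comparison possible: the same objective can land in different categories depending on where it appears in the local sequence.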
Session 8D: Predictive Analytics, Location: (Room HZ 11)
Presentation 8D1 (30 min): “Predicting Student Performance in Interactive Online Question Pools Using Mouse Interaction Features (Full research paper)
Keywords: student performance prediction, question pool, mouse movement trajectory, heterogeneous information network
Modeling student learning and predicting performance is a well-established task in online learning and is crucial to personalized education, which recommends different learning resources to different students based on their needs. Interactive online question pools (e.g., educational game platforms), an important component of online education, have become increasingly popular in recent years. However, most existing work on student performance prediction targets online learning platforms with a well-structured curriculum, a predefined question order, and accurate knowledge tags provided by domain experts. It remains unclear how to conduct student performance prediction in interactive online question pools without such well-organized question orders or expert knowledge annotations. In this paper, we propose a novel approach to boost student performance prediction in interactive online question pools by further considering student interaction features and the similarity between questions. Specifically, we introduce new features (e.g., think time, first attempt, and first drag) based on student mouse movement trajectories to delineate students’ problem-solving details. In addition, a heterogeneous information network is applied to integrate students’ historical problem-solving information on similar questions, enhancing performance prediction on a new question. We evaluate the proposed approach on a dataset from a real-world interactive question pool using four typical machine learning models. The results show that our approach achieves much higher accuracy for student performance prediction in interactive online question pools than the traditional way of using only statistical features (e.g., students’ historical question scores) in three out of four models. We further discuss the performance consistency of our approach across different prediction models and question classes, as well as the importance of the proposed interaction features.
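The named interaction features (think time, first attempt, first drag) could be derived from a timestamped event stream along these lines; the event schema here is a hypothetical simplification, not the paper’s actual logging format.

```python
def extract_interaction_features(events):
    """Derive simple interaction features from a timestamped event stream.

    events: list of (timestamp_seconds, event_type) tuples, where event_type
    is e.g. "question_shown", "click", or "drag". Feature names follow the
    abstract; the event schema is an assumed simplification.
    """
    shown = next(t for t, e in events if e == "question_shown")
    first_attempt = next((t for t, e in events if e in ("click", "drag")), None)
    first_drag = next((t for t, e in events if e == "drag"), None)
    return {
        # Think time: delay between seeing the question and first acting on it.
        "think_time": (first_attempt - shown) if first_attempt is not None else None,
        "first_attempt": first_attempt,
        "first_drag": first_drag,
    }

events = [(0.0, "question_shown"), (2.5, "click"), (4.0, "drag")]
print(extract_interaction_features(events))
# {'think_time': 2.5, 'first_attempt': 2.5, 'first_drag': 4.0}
```

Features like these would then be fed, alongside historical statistics, into the downstream prediction models.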
Presentation 8D2 (30 min): “Quantifying Data Sensitivity: Precise Demonstration of Care when Building Student Prediction Models (Full research paper)
Keywords: Prediction models, data sensitivity, data privacy, student models, data collection parsimony, ethics, empirical Bayes
Until recently, an assumption within the predictive modelling community has been that collecting more student data is always better. But in reaction to recent high-profile data privacy scandals, many educators, scholars, students, and administrators have been questioning the ethics of such a strategy. There are growing suggestions that only the minimum amount of data needed for the function a prediction serves should be collected. Yet machine learning algorithms are primarily judged on metrics derived from prediction accuracy, or on whether they meet probabilistic criteria for significance. They are not routinely judged on whether they use the minimum number of the least sensitive features, preserving what we name here as data collection parsimony. We believe the ability to assess data collection parsimony would be a valuable addition to the suite of evaluations for any prediction strategy. To that end, this paper provides an introduction to data collection parsimony, describes a novel method for quantifying the concept using empirical Bayes estimates, and then tests the metric on real-world data. Both theoretical and empirical benefits and limitations of this method are discussed. We conclude that, for the purpose of model building, this metric is superior to others in several ways, but there are some hurdles to effective implementation.
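The empirical Bayes idea the abstract invokes can be illustrated with a minimal beta-binomial-style shrinkage sketch: per-group rates are pulled toward a pooled rate that serves as the empirical prior. This only shows the "borrowing strength" mechanism; the paper’s actual parsimony metric is more involved, and `prior_strength` here is a hypothetical pseudo-count.

```python
def empirical_bayes_shrink(successes, trials, prior_strength=10):
    """Shrink per-group success rates toward the pooled rate.

    successes, trials: parallel lists of per-group counts.
    The pooled rate acts as the empirical prior; `prior_strength` is
    a hypothetical pseudo-count controlling the amount of shrinkage.
    """
    pooled = sum(successes) / sum(trials)
    alpha = pooled * prior_strength            # prior pseudo-successes
    return [(s + alpha) / (n + prior_strength) for s, n in zip(successes, trials)]

# A group with few trials (1/2) is pulled strongly toward the pooled
# rate (~0.40), while a well-observed group (40/100) barely moves.
print(empirical_bayes_shrink([1, 40], [2, 100]))
```

Estimates stabilized this way can then be compared across feature subsets when judging how much predictive value each (sensitive) feature actually contributes.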
Presentation 8D3 (20 min): “Considerations for amending a whole-institution early-alert system (Practitioner report)
Keywords: Learning analytics, Early warning, Student success
[Institution name] has had a whole-institution learning-analytics-based alerting system since 2014-15. These alerts were based on 14 consecutive days of non-engagement during term time, and were designed to be marked indicators of risk. Years of data have confirmed that students who received alerts were significantly less likely to progress to the next year of their studies or achieve high grades than their peers. In September 2018 a new version of the learning analytics platform was released which allowed more flexible alerting options. This paper outlines the data- and ethics-driven decision making required to agree new institutional alerting parameters.
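The 14-consecutive-day non-engagement rule described in the report could be checked along these lines; this is a hypothetical reimplementation for illustration, not the institution’s actual platform logic.

```python
from datetime import date, timedelta

def flag_non_engagement(active_days, term_start, term_end, window=14):
    """Flag a student who has `window` consecutive term days with no activity.

    active_days: set of dates on which the student engaged with the platform.
    The 14-day default mirrors the rule in the report; the function itself
    is an assumed reimplementation.
    """
    gap = 0
    day = term_start
    while day <= term_end:
        gap = 0 if day in active_days else gap + 1
        if gap >= window:
            return True  # alert: two weeks of non-engagement
        day += timedelta(days=1)
    return False

# A student active only on day one triggers an alert by mid-January.
print(flag_non_engagement({date(2020, 1, 1)}, date(2020, 1, 1), date(2020, 2, 1)))
```

A more flexible platform would expose `window` (and perhaps per-course activity sources) as the tunable parameters the report’s decision-making process had to agree on.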