Here you can find different artifacts produced in the context of the projects Text2HBM and BehavE.

Data Annotation Tutorial

Summary: During the UbiComp / ISWC 2023 conference, our group presented a tutorial on data annotation and, more specifically, on how to create a data annotation protocol. During the event we explained different techniques for data annotation and different methodologies for collecting ground truth data. Additionally, we discussed different sources of bias in annotation tasks, demonstrated how to measure agreement between annotators, and discussed best practices for creating annotation guidelines. We also covered recent topics such as annotation using Large Language Models (LLMs) by going through several case studies from the literature. The participants also performed annotations using some of the state-of-the-art tools for data annotation.

All materials from the tutorial can be found on GitHub.
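As an illustration of measuring agreement between annotators, the following is a minimal sketch of Cohen's kappa for two annotators. The summary does not say which agreement measure the tutorial used, so the choice of Cohen's kappa is an assumption. The function name `cohen_kappa` and the example labels are hypothetical and are not taken from the tutorial materials.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    Hypothetical helper for illustration; not part of the tutorial materials.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")

    n = len(labels_a)

    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability that both pick the same label
    # if each annotator labelled independently with their own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    # Assumption: if both annotators only ever used one identical label,
    # kappa is undefined (0/0); this sketch reports 1.0 in that case.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1 - expected)


# Made-up example: two annotators label four segments.
annotator_1 = ["walk", "walk", "sit", "sit"]
annotator_2 = ["walk", "sit", "sit", "sit"]
print(cohen_kappa(annotator_1, annotator_2))  # prints 0.5
```

In this example the annotators agree on three of four items (observed agreement 0.75), while their label frequencies give a chance agreement of 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.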

eDEM-Connect Ontology

Summary: The eDEM-CONNECT Ontology incorporates domain knowledge about the various types of agitated behaviour observed in people with dementia (PwD), as well as the dyadic relationship between PwD and their informal caregivers. Furthermore, it offers a structured framework for non-pharmacological interventions, enabling caregivers to effectively manage and mitigate the bidirectional effects of agitation in PwD. The ontology has been jointly developed by researchers in the BehavE and eDEM-Connect projects. Among other uses, it has served as a codebook for named entity recognition tasks on informal dementia-related textual data, as well as a knowledge base for a situation-aware chatbot. In the context of the BehavE project, it has been used as a domain-specific knowledge base for generating domain models.

The ontology is publicly available on BioPortal.

A paper describing how it has been used as a codebook for annotation can be found here.

A paper describing how it has been used as a knowledge base for a chatbot can be found here.

Semantic Annotation for the KTA Dataset

Summary: As we encountered problems in the annotation of the KTA dataset (described below), we re-annotated the dataset and semantically validated the new annotation. The annotation provides information about actions, objects, and locations and is validated to be causally correct. The annotation is available on GitHub. Together with the annotation, the data used for activity recognition on the dataset is available in the same repository.

Here you can find the benchmark paper investigating different classification algorithms for activity recognition on the dataset.

Textual Descriptions Dataset

Summary: Recent research in behaviour understanding through language grounding has shown that it is possible to automatically generate behaviour models from textual instructions. These models usually have a goal-oriented structure and are modelled with different formalisms from the planning domain, such as the Planning Domain Definition Language. One major remaining problem is that there are no benchmark datasets for comparing the different model generation approaches, as each approach is usually evaluated on a domain-specific application. To allow the objective comparison of different methods for model generation from textual instructions, in this report we introduce a dataset consisting of 83 textual instructions in English, their refinement in a more structured form, as well as manually developed plans for each of the instructions. The dataset is publicly available to the community.

Here you can find the link to the technical report: Towards Evaluating Plan Generation Approaches with Instructional Texts

Here you can find the dataset, which is publicly available.

Kitchen Task Assessment Dataset

Summary: With the demographic change towards an ageing population, the number of people suffering from neurodegenerative diseases such as dementia is increasing. As the ratio of young to elderly people shifts towards the elderly, it becomes important to develop intelligent technologies for supporting the elderly in their everyday activities. Such intelligent technologies usually rely on training data in order to learn models for recognising problematic behaviour. One problem these systems face is that there are not many datasets containing training data for people with dementia. What is more, many of the existing datasets are not publicly available due to privacy concerns. To address these problems, we present a sensor dataset for the kitchen task assessment containing normal and erroneous behaviour due to dementia. The dataset is recorded by actors who follow instructions describing normal and erroneous behaviour caused by the progression of dementia. Furthermore, we present a semantic annotation scheme which allows reasoning not only about the observed behaviour but also about the causes of the errors.

Here you can find the link to the sensor dataset, including sensor data and semantic annotation.

The video data is available on request. If you are interested in the video data, please contact kristina.yordanova (at) or peter.eschholz (at)

A paper describing the dataset and first evaluation results has been accepted for publication in the PerCom Workshop Proceedings.

The dataset and the first evaluation results have been certified in the PerCom Artifacts Track.

Semantic Annotation for the CMU-MMAC Dataset

Summary: Providing ground truth is essential for activity recognition for three reasons: to apply methods of supervised learning, to provide context information for knowledge-based methods, and to quantify the recognition performance. Semantic annotation extends simple symbolic labelling by assigning semantic meaning to the label, enabling further reasoning. We create semantic annotation for three of the five sub-datasets in the CMU grand challenge dataset, which is often cited but, due to missing and incomplete annotation, almost never used. The CMU-MMAC consists of five sub-datasets (Brownie, Sandwich, Eggs, Salad, Pizza). Each of them contains recorded sensor data from one food preparation task. The dataset contains data from 55 subjects, where each of them participates in several sub-experiments. While executing the assigned task, the subjects were recorded with five cameras and multiple sensors.

The produced annotation is publicly available to enable further use of the CMU grand challenge dataset. The annotation of three of the five sub-datasets (Brownie, Sandwich, and Eggs) can be downloaded here.

A workshop paper describing the annotation process can be found here and an extended journal paper can be found here.