
CAWA is ONCOSCREEN’s companion mobile application for citizens or patients. It supports the collection, visualization, and transmission of their data. It is based on the mobile application of the HEALTHENTIA Platform, which is CE-marked medical device software. The platform is therefore certified for data collection, remote patient monitoring, and the delivery of advice on matters that are neither life-threatening nor time-critical.
Turning the HEALTHENTIA mobile application into ONCOSCREEN’s CAWA involves three steps: rebranding the application to match the project’s branding, creating new widgets for the collection and display of screening information, and integrating the mobile application and its data into the ONCOSCREEN project. The integration of this data is managed by a “Kafka client” that orchestrates communication between the mobile app, the backend of the HEALTHENTIA Platform, and the rest of the ONCOSCREEN components. Understanding how this integration works begins with understanding Kafka.
Kafka is a system designed to handle and transfer large amounts of data between the different components of a software system. It is best used when the components need to communicate in near real time but do not have to wait for each other to respond. Kafka works like a message bus: data is picked up, stored, and delivered to each component when that component is ready to receive it. In doing so, it coordinates the movement of data between components in a reliable and scalable way.

Here is a message-in-a-bottle analogy for Kafka: a lake (the broker) moves bottles (messages) of different colors (topics) around, while fishermen (clients) drop bottles (produce messages) or pick up bottles (consume messages). The analogy shows how Kafka carries messages in separate topics.
In the ONCOSCREEN context, the broker is a central component of the ONCOSCREEN platform. It manages communication between the other components that need to exchange information. Components communicate in pairs: one sends a message and the other receives it. Each component taking part in an exchange creates a Kafka client, a software entity that talks to the broker behind the scenes. There are many such clients, one per component. The information exchanged by clients takes the form of “Kafka messages”. These messages are organized into topics, and every pair of communicating components (one sending, the other receiving) defines its own topics.
The Kafka client of the component that initiates the exchange produces a message into the topic, while the client of the receiving component consumes the message from the topic. The message itself usually serves only as a notification: once notified, the receiving component retrieves the actual data from the data lake.
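The produce and consume roles described above can be illustrated with a minimal sketch. It does not use a real Kafka library or broker: `InMemoryBroker`, the topic name `cawa-to-ingestion`, and the consumer name `ingestion` are hypothetical stand-ins chosen for illustration, and the real ONCOSCREEN topic names are not given in this text.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy stand-in for a Kafka broker (an assumption, not the project's code).

    It keeps one ordered log of messages per topic and remembers, for each
    consumer, how far into each log that consumer has already read.
    """

    def __init__(self):
        self._logs = defaultdict(list)    # topic -> ordered list of messages
        self._offsets = defaultdict(int)  # (topic, consumer) -> next index to read

    def produce(self, topic, message):
        """Append a message to the end of the topic's log."""
        self._logs[topic].append(message)

    def consume(self, topic, consumer_id):
        """Return the messages this consumer has not yet seen on the topic."""
        key = (topic, consumer_id)
        log = self._logs[topic]
        new_messages = log[self._offsets[key]:]
        self._offsets[key] = len(log)  # remember the new read position
        return new_messages


broker = InMemoryBroker()

# The sending component produces two notifications while the receiver is busy.
broker.produce("cawa-to-ingestion", {"event": "new_data"})
broker.produce("cawa-to-ingestion", {"event": "more_data"})

# The receiving component picks them up when it is ready, in order.
first_batch = broker.consume("cawa-to-ingestion", "ingestion")
second_batch = broker.consume("cawa-to-ingestion", "ingestion")  # nothing new
```

The messages stay in the broker until the receiver asks for them, which is the "stored and delivered when ready" behavior described earlier; a second read returns nothing because the consumer's position has moved past both messages.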
The CAWA application is integrated with the rest of the ONCOSCREEN components via its own client, which communicates with the project’s broker. The CAWA client both produces and consumes messages on behalf of the CAWA application. It produces messages to notify other ONCOSCREEN components about a new piece of information, and it consumes messages produced by other ONCOSCREEN components to learn when their information becomes available.
Sending information from CAWA to the data lake, the centralized data repository of ONCOSCREEN (and from there to any other component), is managed by an ONCOSCREEN component called Ingestion. The CAWA client notifies Ingestion that new data is available. This data is generated by the citizens or patients using the CAWA application. It can include answers to questionnaires, data entered manually through the user interface (UI) of the different CAWA widgets, or data measured automatically by devices.
The notifications sent by CAWA to Ingestion can be either synchronous (i.e., following a regular schedule) or asynchronous (i.e., sent irregularly). A synchronous notification is used for data that is not needed as soon as it is entered and can therefore be ingested periodically. An asynchronous notification is used for data that must be sent as soon as it is entered in CAWA, such as the manual entry of results from the ONCO-CRISPR screening tool.
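The difference between the two kinds of notification can be sketched as follows. This is only an illustration under stated assumptions: the class `NotificationScheduler`, the data type labels, and the rule that decides which types are urgent are all hypothetical, and the text does not describe how CAWA implements the schedule.

```python
# Hypothetical label for data that must be announced as soon as it is entered.
IMMEDIATE_TYPES = {"onco_crispr_result"}


class NotificationScheduler:
    """Decides whether new data is announced right away or on the next schedule."""

    def __init__(self, send):
        self._send = send   # callable that produces one notification message
        self._pending = []  # data types waiting for the next scheduled run

    def on_new_data(self, data_type):
        if data_type in IMMEDIATE_TYPES:
            # Asynchronous case: notify Ingestion immediately.
            self._send([data_type])
        elif data_type not in self._pending:
            # Synchronous case: hold the type until the regular schedule fires.
            self._pending.append(data_type)

    def on_schedule_tick(self):
        """Called at regular intervals; announces everything collected so far."""
        if self._pending:
            self._send(list(self._pending))
            self._pending.clear()


sent = []  # stands in for messages produced to the broker
scheduler = NotificationScheduler(sent.append)

scheduler.on_new_data("questionnaire_answers")  # waits for the schedule
scheduler.on_new_data("onco_crispr_result")     # announced at once
scheduler.on_new_data("device_measurements")    # waits for the schedule
scheduler.on_schedule_tick()                    # periodic notification
```

In this sketch the urgent entry produces its own notification first, and the two non-urgent types are grouped into a single notification when the schedule fires.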
The messages that the CAWA client produces for Ingestion do not carry the data itself. Rather, they inform the Ingestion component that new data exists and what types of data there are. Each message lists all the data sources that need to be queried for new data. Since CAWA sends its data to the HEALTHENTIA Platform, these sources are the calls to the external Application Programming Interface (API) of the HEALTHENTIA Platform that should be used to get the new data. The Ingestion component reports the progress of data storage in the data lake by producing a message that is consumed by the CAWA client.
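A notification of this kind, and the acknowledgement that answers it, might look like the sketch below. The field names (`producer`, `since`, `sources`, `stored`, `pending`) and the API paths are placeholders invented for illustration; the actual ONCOSCREEN message schema and the HEALTHENTIA API calls are not specified in this text.

```python
import json
from datetime import datetime, timezone


def build_notification(sources, since):
    """Build a message that says where new data can be fetched, not the data itself.

    `sources` is a list of (data_type, api_call) pairs; both are placeholders.
    """
    return json.dumps({
        "producer": "cawa",
        "since": since.isoformat(),
        "sources": [{"data_type": t, "api_call": c} for t, c in sources],
    })


def build_acknowledgement(notification_json, stored_types):
    """Build the reply reporting which requested data types are already stored."""
    request = json.loads(notification_json)
    requested = [s["data_type"] for s in request["sources"]]
    return json.dumps({
        "producer": "ingestion",
        "stored": [t for t in requested if t in stored_types],
        "pending": [t for t in requested if t not in stored_types],
    })


notification = build_notification(
    [
        ("questionnaire_answers", "/placeholder/questionnaires"),
        ("device_measurements", "/placeholder/measurements"),
    ],
    since=datetime(2024, 1, 1, tzinfo=timezone.utc),
)

# Ingestion has stored one of the two types so far and says so.
acknowledgement = build_acknowledgement(notification, {"questionnaire_answers"})
```

Keeping the payload out of the message keeps messages small and leaves a single place, the HEALTHENTIA API and then the data lake, where the data itself is read.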
The information that arrives asynchronously at the CAWA application comes from both the ONCOSCREEN screening tools and the risk assessment tool. Data from the screening tools reaches CAWA in three ways: directly, when a citizen or patient enters it through CAWA’s UI (for results produced by the ONCO-CRISPR tool); automatically, when the screening tool that takes the measurements provides it (such as ONCO-VOC); or indirectly, when a clinician enters the screening information (from the ONCO-CTC or ONCO-NMR tools) into ONCO-CLIDE, the UI that clinicians use, which in turn informs CAWA. The automatic and indirect methods are implemented by the CAWA client consuming the relevant messages, which tell it what to look for in the ONCOSCREEN data lake. The client then gets the data from the data lake, and once this data is passed to the CAWA application, it becomes available in the application’s biomarker widget.
The second type of data transferred to CAWA is risk assessment data. The risk is assessed by ONCO-RISTE, and the result is transferred to CAWA in the same way as the screening tools’ data. ONCO-RISTE produces the message that notifies CAWA via its Kafka client, whereupon the client gets the risk value from the data lake.
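The notify-then-fetch pattern used for both screening results and risk values can be sketched as below. Everything named here is an assumption for illustration: the data lake is modeled as a plain dictionary, the message fields `kind` and `data_lake_key` are hypothetical, and the stored values are made-up placeholders rather than project data.

```python
def handle_message(message, data_lake, app_state):
    """Consume one notification and pull the referenced record from the data lake.

    Returns True if the record was found and handed to the application.
    """
    record = data_lake.get(message["data_lake_key"])
    if record is None:
        # The notification arrived but the data is not (yet) in the data lake.
        return False
    # Make the data known to the application, keyed by what it represents.
    app_state[message["kind"]] = record
    return True


# A dictionary stands in for the ONCOSCREEN data lake (placeholder contents).
data_lake = {
    "screening/example-participant": {"tool": "ONCO-VOC", "result": "placeholder"},
    "risk/example-participant": {"tool": "ONCO-RISTE", "risk": "placeholder"},
}

app_state = {}  # what the CAWA application currently knows

screening_ok = handle_message(
    {"kind": "biomarker", "data_lake_key": "screening/example-participant"},
    data_lake,
    app_state,
)
risk_ok = handle_message(
    {"kind": "risk", "data_lake_key": "risk/example-participant"},
    data_lake,
    app_state,
)
missing_ok = handle_message(
    {"kind": "risk", "data_lake_key": "risk/unknown-participant"},
    data_lake,
    app_state,
)
```

The same handler serves both cases because the message only says what to look for; the sketch leaves open how a real client would retry when the referenced record is missing.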
The use of Kafka between CAWA and many other ONCOSCREEN components facilitates their integration by orchestrating the communication between them. Kafka messages always announce that data is available, while the data lake provides the centralized flow of the data itself. In this way CAWA delivers its two main functions: collecting the data generated by the citizens or patients who use the application, and delivering to them new data generated elsewhere in ONCOSCREEN. The users of CAWA provide information that the rest of the ONCOSCREEN components need in order to function, and they receive back the results from these components, which enhances their understanding of colorectal cancer.