1 Introduction

Team collaboration and meetings are an essential part of any software development organisation, but they are challenging to organise and manage. Often, meetings do not follow guidelines or run into issues that affect their efficiency and productivity, or delay decision-making [12]. Sometimes, the guidelines themselves burden the development teams and need to be adapted. The Post-Rolling Refinement Model (PRIME), an in-house designed Scaled Agile framework proposed by Austrian Post [14], aims to reduce this burden.

In 2019, Microsoft sparked the debate on how Artificial Intelligence (AI) could further improve meetings by automating tasks and retrieving information before, during, and after them [10]. That could, among other benefits, improve the flow, save time, increase productivity, or reduce frustration during said meetings [10]. There is a growing body of white and grey literature that recognises the role AI could play in reducing the organisational burden on the participants, ensuring that meetings are conducted in a more organised and structured manner, or providing insights to improve future meetings. However, little academic attention has been paid to the role of AI interventions in software development meetings, and, to the best of the authors' knowledge, no research has systematically investigated the support provided by an AI meeting assistant and its interactions with practitioners.

We are adopting action research to explore the use of AI in meetings at Austrian Post Group IT, an international postal, logistics and service provider described in detail in Sect. 3. We focus on two of their regular meetings: standard Daily Scrums and feature refinement and planning meetings, part of the PRIME framework [14]. The study explores how the use of AI affects the practitioners’ meeting experience. By doing so we aim to answer three questions:

RQ1: How can AI assist in identifying potential problems and risks in Agile meetings, and provide actionable and useful recommendations?

RQ2: To what extent do the AI meeting assistants generate sensible recommendations in the context of real meetings in real time?

RQ3: How do users perceive the AI meeting assistants in terms of user experience, and what impact do they have on overall performance?

Section 2 provides a brief overview of the relevant academic and grey literature on AI for software development tasks. Then, the paper details the methodology used in this study in Sect. 3. Section 4 analyses the interventions and evaluation surveys undertaken during this study, and reasons about the design decisions that led to the final AI meeting assistants. Finally, Sect. 5 points out the lessons learnt and how they could be transferred to similar contexts, and Sect. 6 highlights important implications for future practice.

2 Related Work

2.1 Adapting Agile Practices to Companies

Many organisations have already shifted from traditional working processes to Agile. Nevertheless, not all Agile teams follow guidelines thoroughly [7, 12]. This is because they might feel that following all Scrum rules is too time-consuming, that some of the rules are irrelevant, or that their cost even outweighs the benefits [14].

Mortada et al. discuss several poor practices in Scrum teams. For example, the Daily Scrum meeting is supposed to last only 15 min, but it often runs longer than that [12]. Other examples include not estimating user stories, not using the correct structure when writing user stories in the backlog, not having a product backlog, not defining the sprint goal at the beginning of the sprint, and not ending the sprint with a demonstration of the desired deliverable [12].

Similar problems might also be present in Scaled Agile setups. Previous research has highlighted the role of confusion about roles and responsibilities in creating unnecessary overhead [14]. For example, senior engineers and architects might want more time to understand technical details, which requires separate, smaller feature estimation meetings [14]. At the same time, developers often do not receive feedback from the right stakeholders at the end of the sprint [12].

For these reasons, some companies have proposed modifications and ad-hoc adaptations to the guidelines. For instance, Austrian Post proposes the Post-Rolling Refinement Model (PRIME) [14]. PRIME aims to remove a lot of the bureaucratic overhead, pushes decisions to the Scrum teams, and reduces the number of meetings and the number of mandatory participants [14].

2.2 Generative AI in Software Engineering

Generative AI (GenAI) is a type of AI that can generate different types of content, such as images, text, audio, and 3D models, based on the input it receives. Large Language Models (LLMs), such as GPT-4, are currently being used in a wide range of fields, including medicine, economics, software development, academia, and business. GenAI can also support software engineering [13]. Managers could use GenAI tools to get recommendations for decision-making, or to automate some interactions with customers, e.g., using virtual assistants integrated with customer service tools [15]. An organisation's data could also be used as input to AI tools for the purposes of creating tables, analysing statistics, generating models, and monitoring workflows [10].

Microsoft breaks down AI interventions into those occurring before, during, and after team collaboration sessions and meetings, and provides insights into how organisations can use AI to retrieve or generate relevant information and resources [10]. However, there are challenges associated with integrating AI and Agile methodologies, such as the need for specialised technical expertise, and integrating AI into Agile software development processes requires careful consideration of the context [6]. Moreover, the most critical challenges are related to human factors, e.g., ensuring that developers have the skills to work with AI and align expectations effectively [4, 8]. As a result, the trade-off between creativity, human oversight, and cyber-security is a critical factor to consider.

2.3 Prompt Engineering

Prompt engineering is defined as a set of techniques to improve the inputs or instructions that a user provides to an AI model to obtain the desired outputs [13]. Depending on the context in which the AI model is used, designing appropriate prompts is very important for obtaining more accurate results; it is therefore suggested to narrow down the prompts and avoid overly general queries [5]. Mastropaolo et al. also studied the influence of varying natural language descriptions in Copilot prompts and showed that paraphrasing leads to different quality levels in the generations [9]. There are different techniques for prompt engineering, including role-prompting, the use of triple quotes to separate instructions from content, and generating several responses and keeping the best one [2].
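
As an illustration only (not a prompt from this study), the following sketch combines the techniques above: a role prompt, triple quotes to separate the instructions from the content to be analysed, and sampling several candidate generations. All wording is hypothetical, and the generate function is a placeholder for any chat-completion call.

# Illustration of the prompt-engineering techniques named above; all wording and
# names are hypothetical, and `generate` stands in for an arbitrary LLM call.
meeting_notes = "Login bug discussed; no owner assigned; sprint goal still unclear."

# Role-prompting: give the model a narrow persona and task.
system_prompt = (
    "You are an assistant for Agile software development meetings. "
    "List at most three risks in the notes and one actionable recommendation for each."
)

# Triple quotes separate the instructions from the data the model should analyse.
user_prompt = f'Meeting notes:\n"""\n{meeting_notes}\n"""'

def generate(system: str, user: str) -> str:
    ...  # placeholder for a chat-completion call (e.g., Azure OpenAI GPT-4)

# Trying several times: request multiple candidates and keep the most suitable one.
candidates = [generate(system_prompt, user_prompt) for _ in range(3)]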

Balancing relevant information retrieval (in this case, generating recommendations) against not overloading the participants is known to be difficult [1]. These two concerns can be balanced in the prompt.

3 Research Design

The study aims to understand how practitioners use and perceive the use of AI in meetings, a relatively new phenomenon [10]. We adopted action research to observe both technical and social aspects of AI usage through interventions in Austrian Post Group IT’s online meetings. The emphasis of this study is not the artefacts produced (provided as Supplementary Materials), but rather the motivations behind design choices, detailed in Sect. 4. Before that, Sect. 3.1 provides some context, Sect. 3.2 describes the expectations for both assistants, and Sect. 3.3 explains how surveys and observations were used to draw conclusions and improve the artefact.

3.1 Context

Austrian Post is an international postal, logistics and service provider operating in the markets of Austria, where it plays a critical role in the country's infrastructure, eight other countries in Central and Eastern Europe, and Turkey. The development teams, which are part of Group IT, use broadly accepted frameworks, such as Scrum, as well as an in-house designed Scaled Agile framework named PRIME [14]. More than 520 employees work at Austrian Post Group IT, out of which approximately 300 have a similar role to the participants, who belong to 3 of the 9 development teams at the Digital Logistics Platform.

An action team, consisting of both practitioners and researchers, was appointed to be responsible for planning, executing, and evaluating the research. The selected practitioners were involved in planning and executing actions, besides observing and providing contextualised feedback after each of the interventions. Moreover, they evaluated the final outcome drawing on their deep knowledge of the context, the Scrum and PRIME frameworks, and the practitioners' way of working.

In addition, a reference group of practitioners, responsible for giving advice and feedback to the action team, and a management team, which is planned to govern the institutionalisation of the proposed changes, were also key to conducting the present study. The goal of these two groups is to evaluate the benefits of AI meeting assistants, as reported in this study, and to explore potential directions for further work aligned with Austrian Post's strategic goals, e.g., automating repetitive preparation tasks, supporting less experienced developers, and creating useful summaries for those who could not attend a meeting.

3.2 Action

We propose two AI assistants to be used before, during, and after two Agile software development meetings, as shown in Table 1. The assistants are instantiated by prompting Azure OpenAI Studio's GPT-4 LLM. The prompts were designed iteratively: first listing the current challenges of each meeting, then refining the prompts and testing them without sharing the generations (silent demos), and finally using the AI assistants with participants, under observation.
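
For readers who wish to reproduce the setup programmatically, the following minimal sketch shows how an assistant of this kind could be instantiated through the Azure OpenAI service using the official Python SDK, rather than through the Azure OpenAI Studio interface used in the study. The endpoint, API key, API version, deployment name, and prompt text are placeholders, not the values used at Austrian Post.

# Minimal sketch (assumptions: a GPT-4 deployment named "gpt-4" and placeholder
# credentials); the study itself used the Azure OpenAI Studio interface.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key="<your-api-key>",                                   # placeholder
    api_version="2024-02-01",
)

assistant_prompt = (
    "You are an assistant for Agile software development meetings. "
    "Help the team prepare, conduct, and follow up on their meetings."
)

response = client.chat.completions.create(
    model="gpt-4",  # name of the GPT-4 deployment in Azure OpenAI
    temperature=0.2,
    messages=[
        {"role": "system", "content": assistant_prompt},
        {"role": "user", "content": "Summarise the open risks for the next iteration."},
    ],
)
print(response.choices[0].message.content)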

The Agile Release Train Coach Assistant. An AI assistant was designed, with the help of the reference team, to help the Agile Release Train Coach (a servant leader to the train who supports teams in delivering value) prepare and conduct PRIME meetings by helping refine and plan the next PRIME iteration [14].

Table 1. AI meeting assistant actions

The Agile Release Train Coach assistant is instantiated using a prompt and three spreadsheet files. The files contain information about (i) the features and related child User Stories in the PRIME feature board, (ii) the average velocity of the development teams per sprint, and (iii) the average velocity of the Agile Release Train [14]. These files need to be provided to the assistant given that no real-time connection to Azure DevOps is yet available to the LLM. The files are automatically embedded by the Azure AI Search platform so that they can be accessed by the LLM [11]. Using the embedded files, the assistant provides valuable insights to (a simplified sketch of one such check, expressed as deterministic code, follows the list):

  1. Limit the risk of teams over-committing (using the team's capacity and velocity).

  2. Identify features with no effort value.

  3. Identify features placed in the incorrect backlog (based on the iteration path).

  4. Check unplanned integration testing efforts.

  5. Identify features where a child user story exists for a team that is not tagged in the feature.

  6. Help plan large features (based on effort points).

  7. Highlight non-estimated and incorrectly estimated features.

  8. Limit the risk of the Agile Release Train over-committing (based on capacity).
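
To make the first check above concrete, the following simplified sketch expresses it as deterministic code over the same kind of spreadsheet data. In the study, the check is performed by the assistant itself over the embedded files; the file names, column names, and iteration label below are hypothetical.

# Simplified, hypothetical version of check 1 ("limit the risk of teams
# over-committing"): compare each team's committed effort for the next PRIME
# iteration against its average velocity per sprint. File and column names are
# illustrative only.
import pandas as pd

features = pd.read_excel("prime_feature_board.xlsx")   # columns: team, iteration, effort_points
velocity = pd.read_excel("team_velocity.xlsx")         # columns: team, avg_velocity_per_sprint

committed = (
    features[features["iteration"] == "PRIME-next"]
    .groupby("team")["effort_points"]
    .sum()
    .rename("committed_effort")
)

report = velocity.set_index("team").join(committed).fillna(0)
report["over_committed"] = report["committed_effort"] > report["avg_velocity_per_sprint"]
print(report[report["over_committed"]])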

The information contained in the files was manually gathered from Azure DevOps and anonymised by the reference team in two-hour-long sessions before each of the PRIME meetings described in Sect. 2. In these sessions, the prompt to instantiate the Agile Release Train Coach assistant was improved and the validity of its insights was checked. It is important to note that not only was the AI assistant faster than the reference team at analysing the data, but it also made fewer mistakes than the humans. Moreover, as the prompt design improved, as discussed in Sect. 4, the time to generate and check the insights went down to 30 min.

Scrum Team Assistant Tool for the Daily Scrum. Based on the insights of the action and reference teams, a second assistant was designed to provide real-time recommendations on the meeting progress, and insights on adherence to the official Scrum guidelines right after the Daily Scrum.

The Scrum Team Assistant Tool is instantiated using a prompt and the latest version of the official Scrum Guide, since there is a risk of the LLM retrieving a deprecated version. The prompt gives the general context of the intervention and instructs the LLM on how to act depending on the user inputs. To generate manageable, to-the-point recommendations and align with the expectations of practitioners, the assistant is asked to generate recommendations of up to 10 words in a "friendly" tone. The prompt concludes with general instructions for the LLM to run the assistant. Once instantiated using the prompt in the Supplementary Materials (an indicative sketch follows the list below), the AI assistant generates different recommendations when:

  • An individual talks about topics not related to the team's work.

  • The work that the individual mentions is not visualised in the sprint backlog.

  • An individual engages in a detailed discussion about a specific topic.

  • The impediment that an individual raises is not visualised.

  • An individual is interrupted by an external circumstance.

  • An individual does not have any task in an "updated state" in the backlog.

  • An individual needs to create a ticket for a specific team.

  • Any other problem occurs during the Daily Scrum.
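
The prompt itself is provided as Supplementary Materials; the following is only an indicative reconstruction of how the constraints described above (role, friendly tone, recommendations of at most 10 words, the trigger conditions, and a post-meeting summary) can be combined with the official Scrum Guide as grounding text. The file path is a placeholder.

# Indicative reconstruction only; the actual prompt is in the Supplementary
# Materials. The Scrum Guide file path is a placeholder.
from pathlib import Path

scrum_guide = Path("scrum_guide_2020.txt").read_text(encoding="utf-8")

system_prompt = f"""
You are a Scrum Team Assistant observing a Daily Scrum of a software development team.
Base every recommendation strictly on the Scrum Guide quoted below and ignore any
older versions you may know of.

During the meeting, whenever the user reports one of the situations listed in these
instructions (off-topic talk, work not visualised in the sprint backlog, overly
detailed discussions, unvisualised impediments, external interruptions, tasks
without an updated state, tickets to be created for another team, or any other
problem), reply with ONE friendly recommendation of at most 10 words.

After the meeting, when asked for a summary, list the problems observed, the
tickets created, and tips to improve the next Daily Scrum.

Scrum Guide:
\"\"\"
{scrum_guide}
\"\"\"
"""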

The real-time generated recommendations are shared with the team using disappearing pop-up messages, as represented in Fig. 1 (one plausible rendering mechanism is sketched after the figure). Right after the meeting, as shown in Table 1, the assistant provides a summary of the problems the team faced during their Daily Scrum, lists all created tickets, and shares tips on how to improve the next Daily Scrum via the meeting chat.

Fig. 1. AI recommendations, shown using OBS, during a Daily Scrum meeting.
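
The paper does not detail how the pop-ups in Fig. 1 are rendered; one minimal way to obtain disappearing messages through OBS, assuming a text source configured to read its content from a file, is sketched below. The file path and display duration are assumptions.

# Minimal sketch, assuming an OBS text source set to "Read from file" and pointed
# at OVERLAY_FILE: writing a recommendation shows it on the shared screen, and
# clearing the file a few seconds later makes the pop-up disappear.
import threading
from pathlib import Path

OVERLAY_FILE = Path("obs_overlay.txt")   # hypothetical path wired to the OBS text source
DISPLAY_SECONDS = 8

def show_recommendation(text: str) -> None:
    OVERLAY_FILE.write_text(text, encoding="utf-8")
    threading.Timer(
        DISPLAY_SECONDS,
        lambda: OVERLAY_FILE.write_text("", encoding="utf-8"),
    ).start()

show_recommendation("Consider taking this discussion to a follow-up meeting.")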

3.3 Data Collection and Analysis

The observations, surveys, and discussions held with practitioners helped capture the nuances of the practitioners’ opinions and provide rich data for analysis.

Surveys to Understand Team Composition. Prior to the interventions, we sent a survey to the Scrum teams in the Digital Logistics Platform to capture the practitioners' opinions on AI. The participants' responses (39 in total) helped us to select three teams willing to participate in the interventions. The three selected teams, as shown in Fig. 2, had a similar composition in terms of their feelings about AI meeting assistants. Readers who want to replicate the study can use the survey questions, provided as Supplementary Materials.

Fig. 2. Team composition, based on their answers to the initial survey question "How would you feel about using AI support in the meeting?"

Observations Before and During the Actions. The researchers in the action team participated in the online meetings over a period of time to become accustomed to the teams and to understand their routines and needs [16]. During the observations of the meetings, the researchers did not intervene, even though the meeting attendees were aware of their presence. In each of the interventions and silent demos for the two assistants, the action team took notes about the AI-practitioner interaction. Being part of the environment also helped interpret the true meaning of the answers obtained before and after each of the interventions [16].

How the Conclusions Were Drawn. The survey answers, together with the feedback from the practitioners after the interventions and the notes from the observations, were used to improve the design of the prompts, as discussed in Sect. 4.

4 Execution and Results

This section first presents the specific challenges identified during the initial observations, prior to the interventions, and suggests how AI could assist practitioners (RQ1). The participants' views, collected after each intervention, helped improve how well the AI assistants support reaching the meeting goals and adhering to the guidelines (RQ2). Finally, this section discusses how the practitioners felt about the AI assistants and the implications for their way of working (RQ3). Themes emerging from the participants' responses are also highlighted here, and their transferability to other contexts is discussed in Sect. 5.

4.1 Preparations with the Agile Release Train Coach Assistant

At the beginning of each PRIME iteration, information about the features for the next PRIME needs to be prepared to allow for discussions during the meeting. Doing this is, according to one of the practitioners, a "boring process" that takes considerable time from a number of people. As a result, and similarly to the three groups represented in Fig. 2, most of the participants in the PRIME interventions (76%) also reported feeling excited or curious about testing AI assistants.

To help with preparations, the Agile Release Train Coach assistant provides insights meant to support this work, as reported in Sect. 3.2. These insights are described to the LLM in the prompt, provided as Supplementary Material, which was iteratively designed over the three action iterations in Table 2.

Table 2. Record of all notable changes made to the Agile Release Train Coach assistant

In the first intervention, the pre-generated insights were tabulated and shared with the practitioners, who required many clarifications on the values in the tables (e.g., how effort points were used to calculate the average team velocity), but overall agreed on the usefulness of the Agile Release Train Coach assistant. In this first try, however, a third of the generations were wrong because of an outdated input data point caused by a human mistake. When the errors were spotted, a participant reminded their team that "these things were done manually before" and called the AI "very handy" even if it makes minor mistakes.

Right after the meeting, participants were given the chance to provide feedback through an anonymous online survey and reported some potential misinterpretations of the AI generations. For instance, one participant stated that "a team might over-estimate but the train might still be under the [effort] threshold" and another requested clarifying the relationship between "teams' capacity, Train velocity, and user story point estimation." Their feedback, together with comments and notes taken during the intervention, led to changes in the prompt. First, "risks" were renamed to "benefits" to align with the goal: helping practitioners by informing their discussions, rather than unilaterally sending warnings. Moreover, the mathematical operations were rephrased and clarified, and two benefits were added following the recommendations from the Lead Product Manager, responsible for the products of the train and owner of the train backlog [14], on how to identify features with unrealistic estimations. Finally, the prompt was modified to generate slide templates.

Table 3. Record of all notable changes made to the Scrum Team Assistant Tool

During the second intervention, the slides created with the help of the Agile Release Train Coach assistant were used to present the insights described in Sect. 3.2. Again, some practitioners questioned the calculations and logic behind them, e.g., whether “only features with estimated efforts are used.” Another practitioner, after hearing the details about how potential over-commitment is computed by the assistant, asked: “does it mean that we have to change our way of working?” These comments led us to further refine the definitions within the prompt in order to have clearer AI generations for the questioned benefits.

Some problems arose during the second intervention regarding the mathematical computations due to an inconsistency in the anonymisation process: there was a character mismatch in the teams' names between the spreadsheets that are used alongside the prompt. As discussed in Sect. 5, the LLM used has trouble with mathematical operations, which therefore need to be defined in a very precise way. Because of this, the benefits were reformulated further to reach the final, unambiguous version of the prompt.
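
A lesson from this incident is that such mismatches can be removed deterministically before the files reach the assistant. The following hypothetical pre-processing step (file and column names are illustrative, not the ones used in the study) normalises team names consistently across the spreadsheets.

# Hypothetical pre-processing: normalise team names across all spreadsheets so the
# assistant's (precisely defined) calculations join on identical keys. File and
# column names are illustrative only.
import unicodedata
import pandas as pd

def normalise_name(name: str) -> str:
    # Collapse Unicode variants, trim whitespace, and ignore case differences.
    return unicodedata.normalize("NFKC", str(name)).strip().casefold()

for path in ("prime_feature_board.xlsx", "team_velocity.xlsx", "train_velocity.xlsx"):
    df = pd.read_excel(path)
    df["team"] = df["team"].map(normalise_name)
    df.to_excel(path, index=False)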

In the third and last intervention, all insights were well received and a single negative comment arose: the data was a couple of hours old. Given the LLM’s performance limitations and the lack of connection to Azure DevOps data, the data needed to be prepared the morning before the intervention. Throughout all the interventions, participants repeatedly pointed out the potential of “looking at [the recommendations] live.” However, having the Agile Release Train Coach assistant working with real-time data is outside of the scope of this study and left for future work, as discussed in Sect. 5.

4.2 Real-Time Assistance by the Scrum Team Assistant Tool

We also propose, as presented in Table 1, AI assistance during and after Daily Scrum meetings. To prepare the interventions, the action team observed the practitioners in their online meetings prior to taking action. These observations, together with the insights from the reference team, helped design the recommendations listed in Sect. 3.2. With this information, the Scrum Team Assistant Tool was designed to help participants follow the official Scrum guidelines.

In order to refine the prompt used to set up the Scrum Team Assistant Tool, three teams helped us perform four design iterations, as reflected in Table 3. The composition of the three teams is similar to Team 1 when it comes to preconceptions about AI, as can be seen in Fig. 2. Even though team members reported differences in their preferred way of working, we treat the feedback received from each team as applicable to the others.

In the first intervention, the AI recommendations were shared via the Microsoft Teams chat, and the members of Team 1 found the amount of messages overwhelming and "rather distracting." The first message caught the participants' attention; however, they did not seem to mind the subsequent warnings: one participant even stated they were just "random messages," and another complained that they were "warnings, warnings, and more warnings!"

In the survey sent after the meeting, participants reported that the generations were aggressive and "missing empathy," and 2 out of the 7 participants reported feeling annoyed by the AI. Even so, more than half of the respondents (4 out of 7) reported liking being warned when the team engaged in overly detailed discussions and being notified when the Scrum Team Assistant Tool estimated that the Daily Scrum would take longer than 15 min. It is important to note that, in the survey prior to the interventions, only 24% of participants said their team does not usually finish their Daily Scrum within the 15-minute time-box. This contradicts the findings of Mortada et al., who reported that 53% of Daily Scrum events do not finish within 15 min [12].

Using the feedback, the prompt was changed to generate shorter messages and to produce "recommendations" rather than "warnings," which changed the tone of the generations (i.e., friendlier, more tactful). After the first intervention, we also improved the interface to make it less distracting: we introduced the disappearing pop-up messages seen in Fig. 1. The new prompt and interface for the Scrum Team Assistant Tool were tested with Teams 2 and 3 in the subsequent interventions and were well received. After seeing the changes, Team 1 was also more willing to have an AI assistant than after the first intervention.

Before the second intervention, the prompt was extended to generate a summary of the recommendations at the end of the meeting. All prompt improvements were tested with Team 2; the participants overall liked the experience and said the AI recommendations were non-intrusive. No problems were observed during the meeting or reported afterwards; however, one of the practitioners reported minimising the view with the pop-up messages (so they were not readable).

The third intervention, with Team 3, was similarly successful except for two specific issues. On the one hand, a participant stated that some of the generated recommendations about "keeping the discussion on the topic felt partially incorrect." This was interpreted, with the help of the reference team, as a team-level preference; while the LLM makes recommendations to strictly follow the official Scrum guidelines, different teams had preferences as to what to allow in their Daily Scrum meetings (e.g., Team 1 accepts social talk, as long as the meeting stays within 15 min). On the other hand, a participant suggested that the "Scrum Master should keep an eye on these things" and have a final say on what AI recommendations are shared with the team.

In the last intervention, the participants agreed that the Scrum Team Assistant Tool “provided helpful live messages, both positive and negative.” However, similar to a participant in Team 3 that reported “feeling observed during the meeting,” one of the participants explained that “it feels unnatural” to have “something inhuman forcing [...] a specific pattern on us.” In general, across interventions, the participants appreciated the summary of the recommendations received at the end, and participants stated they “would like to have this summary for all other meetings” and suggested it would be helpful to expand it with “what was done well and if it has improved since last time.” These comments and their implications for future work are further discussed in Sect. 5.

5 Discussion of the Implications

Several reports have shown the potential of using AI to enhance various Software Engineering tasks. As mentioned in Sect. 2, prior studies have noted the importance of appropriate interfaces and human oversight when integrating AI in Software Engineering [3]. However, very little was found in the literature on the question of how to design AI assistants for meetings and integrate them in real industrial setups. The lessons learnt in this study are discussed below, and how they could be transferred to other contexts is discussed in Sect. 5.1.

RQ1 sought to determine how to create AI assistants to help identify and address potential problems and risks in agile meetings, and suggest improvements. By observing different teams during Daily Scrums and PRIME meetings [14], challenging areas where AI could assist human practitioners were identified. Then, different interventions, in Tables 2 and 3, were used to design the Agile Release Train Coach assistant and the Scrum Team Assistant Tool.

RQ2 focused on the design process leading to sensible AI recommendations to be used before, during, and after the meetings. Therefore, a number of design iterations, described in Sect. 4, were used to determine the effect of different prompting strategies on the performance of the LLM. Overall, both assistants are able to generate accurate and contextualised insights, surpassing the expectations of some of the participants.

RQ3 focused on the social aspect of AI meeting assistants. In the first iterations, as described in Sect. 4, the proposed action was not well received, and multiple iterations were needed to design balanced solutions. The results are in agreement with recent studies that highlight the importance of appropriate interaction strategies in promoting trust in AI tools [3, 4]. It is interesting to note, though, how the participants' perception of AI depended on how seamlessly it was integrated into the meetings, and how its insistence and intrusiveness negatively impacted the participants' perception of the assistants: from useful to imposing.

5.1 Recommendations for Company and Teams: Readiness Assessment

The emphasis of this study is not on the artefacts produced but rather on the procedural aspects of utilising these tools and the transfer of the lessons learnt to other industrial settings. This section moves on to present a set of actionable recommendations for other companies on how to apply the results presented here and how to conduct similar studies in the future. These insights are gathered in the Readiness Assessment of Human-AI Collaboration for Agile Meetings form, provided as Supplementary Materials, to guide interested companies and teams.

Customised Team-AI Interactions: The findings of this study highlight the need to adapt the AI assistants to each team, which needs to be studied prior to considering integration. Data about the teams’ needs, expectations, and preferred Agile practices should be gathered to customise the AI assistants. This technology, however, should be imposed neither on teams nor on team members. Therefore, the practitioners’ feelings on AI must also be assessed beforehand: if there is opposition, receiving LLM-specific training might help. Still, after integration, feedback should keep being gathered, as presented in this study.

Teams looking to integrate AI assistants to enhance Agile team collaboration sessions should assess their readiness using these questions:

☐ Have you assessed what your team's challenges are regarding Agile meetings?

☐ Have you evaluated the team-specific Agile practices and adoption maturity?

☐ Have you gathered data about practitioners' feelings about AI assistants?

☐ Does the team have the knowledge to integrate AI assistant(s) in meetings?

☐ Does your team see any benefits of integrating AI assistant(s) into meetings?

☐ Can the team provide iterative feedback to improve the AI assistant?

AI Assistant Design: During the design phase, a difficult question arose: should AI assistants be a support for everyone in the team or only for a specific role? Companies (or teams) looking to integrate AI assistants need to reflect on what their goals are and design them appropriately. The authors' recommendation is not to attempt to cover all possible functionalities, but rather to use different modules, or agents, that connect only when needed. Moreover, each of the modules should provide input and pointers to the practitioners without contradicting Agile values. Once integrated, the AI assistant's design should be rethought and improved based on the periodically received feedback. Companies (or teams) should assess their early AI assistant design using these questions:

☐ Does the design of the AI assistant contradict any Agile principles?

☐ Is the AI assistant designed to support the Agile team or just a specific role?

☐ Is the AI assistant designed to enhance or to substitute a practitioner's role?

☐ Is the AI assistant design modular and scalable?

Investments and Compliance: Before AI assistant integration, companies should consider whether the required expertise is available and whether the investment can be made. This includes economic resources, but also reflecting on the long-term impact of using these technologies (e.g., on societal and environmental well-being [3]). Then, data protection and security concerns must be addressed (e.g., GDPR, user permissions, etc.). For this, as discussed in Sect. 4, we recommend ensuring that the LLM has secure access to up-to-date data (e.g., deprecated versions of online documents were often retrieved during our tests). Companies should therefore assess their readiness by asking:

☐ Does the company have the expertise to integrate AI assistants?

☐ Is the company willing to invest resources into creating AI assistants?

☐ Is the integration of the AI assistants compliant with the company's code of conduct, and its ethics and sustainability strategies?

☐ Does the company have available interfaces to connect AI assistants to other internal systems and guarantee the retrieval of up-to-date data?

☐ Can AI assistants be configured to only use reliable external data sources?

☐ Is the integration of AI assistants compliant with the company's AI strategy, security standards, and privacy policies?

5.2 Threats to Validity

We must remind the reader that the claims are based on the actions conducted with the help of willing participants. The selection of the subjects, based on their willingness to explore AI as an assistant in meetings, might have introduced a bias. Moreover, evaluation apprehension may have caused these subjects to behave differently when observed, and to feel more inclined to positively evaluate the AI assistants and pay more attention to their adherence to the Scrum framework while under observation. Moreover, the meetings were conducted in English to accommodate the action team. However, the practitioners often showed more fluency when discussing issues in their native language. This might have caused delays or affected the normal conduct of the meetings.

The generalisability of this work might also be affected by the diverse data protection policies and available resources at each industrial setup. For instance, this study focuses on Azure OpenAI Services: alternatives have not been explored because of privacy concerns and case company constraints. Moreover, the provided prompts might not directly generalise to other setups for different reasons. However, the emphasis of this study is not on the prompts themselves, but rather on the motivations behind design choices and the process to gather feedback after each iteration. It is also worth noting that some events (e.g., the popularisation of Microsoft Copilot) during the time this study was conducted might have affected both the practitioners' and the management team's perception of the proposed AI meeting assistants. Similar and unforeseeable events might continue to shape their opinion of these promising tools.

6 Conclusion

The main goal of this study was to determine whether AI can assist practitioners in Agile meetings by identifying potential problems and risks. Moreover, the usefulness of the generated insights, and how they were perceived by the participants in our action research interventions, was also studied.

While the results of the paper, discussed in Sect. 4, demonstrate the ability of AI assistants to generate useful recommendations in real meetings, they also highlight the need to pay close attention to the user experience of the practitioners who directly interact with AI. Although the current study is based on a relatively limited sample of participants, all from Austrian Post, this work offers valuable insights into the expectations of participants regarding AI assistance and into the strategies to integrate AI assistants before, during, and after meetings.

This study provides the first comprehensive assessment of how AI meeting assistants can be integrated in a real Agile setting, taking into account the perceptions of the practitioners in the design process. The findings, the authors hope, will be useful to guide the integration of AI into different team collaboration sessions and meetings in different setups.

6.1 Future Work

Further research might explore the adaptation of the AI assistants to the specific needs of the teams they will assist. For instance, teams might have preferences when it comes to adherence to the official Scrum guidelines or be at different stages of Agile adoption, in which case AI could help Scrum Masters specifically. AI could also suggest ad-hoc best practices or solutions for specific issues within a team. Moreover, the multi-modal capabilities of newer models could be explored in the future to visualise information and inform practitioners more efficiently, as practitioners seem to prefer graphical representations.

Similarly, further studies can be carried out to understand how AI can be integrated into other meetings, standard or ad-hoc, in other industrial setups. For instance, the reference team believes that practitioners would benefit from AI assistants in longer, resource-intensive meetings, planning meetings, or review meetings, where the official guidelines might not suit some teams well. Moreover, AI could provide overviews across meetings and teams to give practitioners a view of the team, the train, and the overall development progress.

Future work could also focus, as suggested by multiple participants in this study, on generating meeting summaries for different purposes, such as informing missing participants about what transpired in a meeting. However, in order to provide relevant summaries and recommendations, the LLMs would need context such as automatic meeting transcripts and access to Azure DevOps. This data would make the recommendations more relevant and not outdated; however, further development, outside the scope of this paper, is required.

Finally, we believe a thorough assessment checklist should be crafted for companies and teams to understand their readiness to integrate AI-assistants in their Agile meetings. Then, and only then, should human-AI collaboration start.