Multimodal Agentic AI for Wheat Management Enhancement

by Selena Song

Wheat is one of the major global crops and a pillar of food systems worldwide, making up 20 percent of global calorie intake \cite{erenstein2022global}. Increasing food production is a key challenge for global food security, but against the backdrop of climate change it is critical that yield increases are achieved through sustainable approaches across the production pipeline.

The purpose of this project is to deliver a multi-modal wheat management assistant that helps optimise wheat management decisions for increased yield. The literature highlights the growing use of artificial intelligence for assistance within the general agriculture domain, and the growing potential of models capable of processing both images and natural language. Yet, despite rapid development in multi-modal models, existing models are still mainly general-purpose and lack the domain specialisation required for wheat management tasks. Furthermore, few multi-modal, wheat-focused datasets exist. These challenges motivate the construction of high-quality, domain-specific datasets and systems for wheat management.

To address these challenges, a bespoke multi-modal dataset containing high-quality wheat images and textual prompts is first constructed. This dataset underpins the next step, the domain adaptation of general-purpose vision-language models (VLMs) to wheat-specific tasks. The adapted model is benchmarked against various counterparts using an LLM-as-a-judge method across several metrics, such as accuracy. Finally, the domain-adapted model is embedded inside a multi-agent framework designed to extend domain scope and scalability. A user-friendly interface is developed to ensure the system is accessible to non-technical users such as farmers and agronomists.

The findings indicate that domain-adapted models outperform general-purpose models across the specified tasks. Furthermore, the multi-agent system produces more nuanced responses than the domain-adapted VLM alone. However, responses occasionally contain partially true information, which highlights the need for an integrated database to ensure that reliable and up-to-date information is retrieved. The system can be further enhanced by incorporating more specialised tools for the agentic system to use.