Landslide Image Analysis and Disaster Risk Assessment Using Multimodal AI
In recent years, climate change has increased the frequency of natural disasters worldwide. Among them, landslides are particularly hazardous because they drastically alter terrain, raising the risk of secondary disasters. This makes rapid, expert-level assessment crucial. However, on-site evaluations typically require specialist knowledge, and the number of professionals available for such tasks is limited. To address this gap, the present study proposes a multimodal AI system that combines large language models (LLMs) with image recognition technology to analyze landslide disaster scenes with expert-level precision, automatically generating natural-language insights from images.

What sets this research apart is its ambition not only to classify images but also to output detailed descriptions, infer causes, and predict future risks, all in natural language. To achieve this, two different architectural approaches were developed. The first is the “VQA-LLM hybrid,” which uses a Visual Question Answering (VQA) model to extract key observations from an image and then passes that information to a large language model (Alpaca-13B) for comprehensive analysis. The second is an “MLLM (Multimodal Large Language Model),” which processes visual and textual inputs together in a single end-to-end model. This system integrates a CLIP-based image encoder (ViT-L/14) with LLaMA2-13B-Chat to output natural-language explanations directly.
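The two-stage flow of the VQA-LLM hybrid can be sketched as follows. This is a minimal illustration of the pipeline shape only: the probe questions, the stub `run_vqa` function, and the prompt wording are assumptions for demonstration, not the paper's actual implementation.

```python
# Hypothetical sketch of the VQA-LLM hybrid: a VQA model answers probe
# questions about the image, and the answers are assembled into a prompt
# for the downstream LLM (Alpaca-13B in the study).

PROBE_QUESTIONS = [
    "What type of mass movement is visible?",
    "What is the condition of the slope surface?",
    "Are roads, rivers, or buildings affected?",
]

def run_vqa(image_path: str, question: str) -> str:
    """Stand-in for a VQA model call. A real system would encode the
    image and return the model's answer; here we return a placeholder
    so the pipeline is runnable end to end."""
    return f"<answer to: {question}>"

def build_llm_prompt(image_path: str) -> str:
    """Collect VQA observations and assemble the prompt passed to the
    LLM for the final structured analysis."""
    observations = "\n".join(
        f"- Q: {q}\n  A: {run_vqa(image_path, q)}" for q in PROBE_QUESTIONS
    )
    return (
        "You are a landslide expert. Based on the observations below, "
        "describe: Disaster Type, Cause, Observations, Future Risk.\n\n"
        f"Observations:\n{observations}"
    )

prompt = build_llm_prompt("site_001.jpg")
```

The key design point is the structured intermediate representation: the LLM never sees pixels, only the VQA model's textual observations, which later helps explain the hybrid's strength in risk reasoning.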

Training data were collected from aerial images of 68 landslide sites across Japan. Eight domain experts, each with over 30 years of field experience, verbally explained the contents of each image without relying on contextual knowledge outside the visuals. Their commentary was transcribed using a speech recognition system and translated into English summaries using GPT-3.5. GPT-4 was then employed to reformat these summaries into a unified template consisting of four fields: Disaster Type, Cause, Observations, and Future Risk. This structured format made the data suitable for AI training. To address data scarcity, paraphrasing was used as a data augmentation method, effectively doubling the dataset to 136 samples.
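The unified template can be expressed as a simple serialization step. The four field names come from the text above; the sample record and the serialization format are illustrative assumptions, not the study's actual data.

```python
# Illustrative sketch of the four-field template that GPT-4 reformatted
# the expert summaries into. The field names are from the study; the
# record content below is invented for demonstration.

REQUIRED_FIELDS = ("Disaster Type", "Cause", "Observations", "Future Risk")

def to_training_text(record: dict) -> str:
    """Serialize one expert annotation into the fixed template,
    raising if any of the four fields is missing."""
    missing = [f for f in REQUIRED_FIELDS if f not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return "\n".join(f"{field}: {record[field]}" for field in REQUIRED_FIELDS)

sample = {
    "Disaster Type": "Deep-seated landslide",
    "Cause": "Prolonged heavy rainfall weakening the slope",
    "Observations": "Main scarp at the crown; debris blocking the river channel",
    "Future Risk": "Possible outburst flooding if the debris dam fails",
}
print(to_training_text(sample))
```

Enforcing a fixed field order and rejecting incomplete records is what makes free-form expert commentary usable as consistent supervision for fine-tuning.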
Performance evaluation employed both traditional metrics (BLEU, ROUGE, METEOR, SPICE) and a GPT-4-based semantic similarity assessment. Additionally, domain experts with an average of 25 years of experience conducted a blind evaluation of selected outputs. The results showed that the MLLM approach performed better in identifying disaster types and observations, while the VQA-LLM hybrid outperformed in predicting future risks. This is likely because the VQA-LLM architecture provides a structured intermediate representation, aiding reasoning in complex assessments.
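The traditional metrics listed above all score n-gram overlap between a generated description and a reference. As a toy illustration of the idea, here is a minimal unigram-overlap F1 in the spirit of ROUGE-1; this is not the evaluation code used in the study.

```python
# Minimal unigram-overlap score (ROUGE-1 style F1) to illustrate how
# overlap-based metrics compare a generated text against a reference.
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """F1 over unigram counts shared by candidate and reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1(
    "debris flow blocking the river channel",
    "a debris flow is blocking the river",
)
```

Because such surface-overlap metrics can miss paraphrases that are semantically correct, the study complements them with a GPT-4-based semantic similarity judgment and a blind expert evaluation.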

Because the MLLM can learn directly from visual-text pairs, it showed strong adaptability and improved performance when trained with a larger model (13B) and an augmented dataset. These findings suggest that even in data-scarce settings, effective knowledge transfer is achievable with the right architecture and training methods. However, challenges remain: the dataset is limited to Japanese case studies, and scaling it for global applicability will require collaboration with international organizations like USGS and NASA. Future work may also involve integrating this system with other AI agents, such as weather forecasting and evacuation planning tools, to build a comprehensive disaster response ecosystem.
This research represents a significant step toward digitizing expert knowledge and enabling automated systems to provide high-quality assessments in disaster situations. It demonstrates that with thoughtful system design and structured knowledge processing, AI can perform expert-level reasoning, even with limited training data. For students interested in AI, this study offers a powerful example of how emerging technologies can be applied to real-world problems like disaster management. As AI continues to evolve, its fusion with civil infrastructure holds immense potential, and this research lays important groundwork for that future.
Publication
Areerob, Kittitouch, et al. “Multimodal artificial intelligence approaches using large language models for expert-level landslide image analysis.” Computer-Aided Civil and Infrastructure Engineering (2025).
@article{areerob2025multimodal,
  title={Multimodal artificial intelligence approaches using large language models for expert-level landslide image analysis},
  author={Areerob, Kittitouch and Nguyen, Van-Quang and Li, Xianfeng and Inadomi, Shogo and Shimada, Toru and Kanasaki, Hiroyuki and Wang, Zhijie and Suganuma, Masanori and Nagatani, Keiji and Chun, Pang-jo and others},
  journal={Computer-Aided Civil and Infrastructure Engineering},
  year={2025},
  publisher={Wiley Online Library}
}