Nicholas Moratelli

I am a final-year Ph.D. candidate in Artificial Intelligence Engineering at the AImageLab research group under the supervision of Professor Rita Cucchiara.

My research focuses on advancing the field of Artificial Intelligence through the development of Large Vision Language Models, Vision-and-Language Foundation Models, and Retrieval-Augmented Generation. I've published at top conferences and journals such as CVPR, ICLR, ACL, BMVC, and IJCV. I recently joined Amazon Science in Cambridge, UK, as an Applied Scientist Intern.

When I'm not coding or writing papers, you can probably find me at a conference somewhere in the world, coffee in hand.

Publications

CVPR 2025

Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering

F. Cocchi*, N. Moratelli*, M. Cornia, L. Baraldi, R. Cucchiara

In Conference on Computer Vision and Pattern Recognition, 2025

ICLR 2025

Causal Graphical Models for Vision-Language Compositional Understanding

F. Parascandolo, N. Moratelli, E. Sangineto, L. Baraldi, R. Cucchiara

In International Conference on Learning Representations, 2025

BMVC 2024
Oral Presentation

Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization

N. Moratelli*, D. Caffagni*, M. Cornia, L. Baraldi, R. Cucchiara

In British Machine Vision Conference, 2024

CVPR Workshops 2024

Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs

D. Caffagni*, F. Cocchi*, N. Moratelli*, S. Sarto*, M. Cornia, L. Baraldi, R. Cucchiara

In Conference on Computer Vision and Pattern Recognition Workshops, 2024

ACL 2024

The Revolution of Multimodal Large Language Models: A Survey

D. Caffagni*, F. Cocchi*, L. Barsellotti, N. Moratelli*, S. Sarto*, L. Baraldi*, L. Baraldi, M. Cornia, R. Cucchiara

In Findings of the Association for Computational Linguistics, 2024

ICPR 2024
Oral Presentation

Fluent and Accurate Image Captioning with a Self-Trained Reward Model

N. Moratelli, M. Cornia, L. Baraldi, R. Cucchiara

In International Conference on Pattern Recognition, 2024

IJCV 2025

Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training

S. Sarto*, N. Moratelli*, M. Cornia, L. Baraldi, R. Cucchiara

In International Journal of Computer Vision, 2025

BMVC 2025

Mitigating Hallucinations in Multimodal LLMs via Object-aware Preference Optimization

A. Compagnoni, D. Caffagni, N. Moratelli, M. Cornia, L. Baraldi, R. Cucchiara

In British Machine Vision Conference, 2025

ICCV Workshops 2025

LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning

F. Cocchi*, N. Moratelli*, D. Caffagni*, S. Sarto*, M. Cornia, L. Baraldi, R. Cucchiara

In International Conference on Computer Vision Workshops, 2025

ECCV Workshops 2024

Personalizing Multimodal Large Language Models for Image Captioning: An Experimental Analysis

D. Bucciarelli, N. Moratelli, M. Cornia, L. Baraldi, R. Cucchiara

In European Conference on Computer Vision Workshops, 2024

Sensors 2023

Fashion-Oriented Image Captioning with External Knowledge Retrieval and Fully Attentive Gates

N. Moratelli*, M. Barraco*, D. Morelli, M. Cornia, L. Baraldi, R. Cucchiara

In Sensors (MDPI), 2023

Intelligent Systems 2024

Are Learnable Prompts the Right Way of Prompting? Adapting Vision-and-Language Models with Memory Optimization

N. Moratelli, M. Barraco, M. Cornia, L. Baraldi, R. Cucchiara

In IEEE Intelligent Systems, 2024

Projects

Multimodal LLM

LLaVA-MORE

A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning.

Get In Touch

Interested in collaboration?

I'm always open to discussing research ideas, potential collaborations, or opportunities to apply AI in innovative ways.

Email Me