Reading Note: "Safety at Scale: A Comprehensive Survey of Large Model Safety"

Ma et al. "Safety at Scale: A Comprehensive Survey of Large Model Safety". arXiv preprint arXiv:2502.05206 (2025).

Intro

Scope: Vision Foundation Models (VFMs), Large Language Models (LLMs), Vision-Language Pre-training (VLP) models, Vision-Language Models (VLMs), Diffusion Models (DMs), and large-model-based Agents.

Contributions:

  • Proposing a comprehensive taxonomy of 10 attack types: adversarial, data poisoning, backdoor, jailbreak, prompt injection, energy-latency, membership inference, model extraction, data extraction, and agent-specific attacks.

  • Reviewing defense strategies and summarizing commonly used datasets and benchmarks.

  • Identifying and discussing open challenges: Comprehensive safety evaluations, scalable and effective defense mechanisms, and sustainable data practices.

The left part of the paper's overview figure (not reproduced here) shows the quarterly trend in the number of safety research papers published across the different model types, with totals of 20, 24, 123, and 223 papers for 2021 through 2024.
