OmniTrust: A Platform for Trustworthiness Evaluation of Generative Foundation Models
Introduction
"Ignoring AI safety could lead to human extinction. The risks posed by AI are as urgent as climate change, if not more."
– Geoffrey Hinton, 2018 Turing Award laureate, at the Paris Summit, February 2025

OmniTrust is a comprehensive security platform designed to evaluate the trustworthiness of large language models (LLMs) and multimodal models. Supporting both white-box and black-box LLMs, OmniTrust offers an out-of-the-box toolkit that is intuitive and easy to use—allowing developers and researchers to run complex security evaluations with a single line of code.
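As a rough illustration of that workflow, the sketch below shows what a one-line evaluation call might look like. The `omnitrust` package name, the `evaluate` entry point, and every argument are hypothetical placeholders, not the platform's confirmed API:

```python
# Hypothetical sketch of a one-line OmniTrust evaluation; the package name,
# function, and arguments are illustrative assumptions, not the real API.
from omnitrust import evaluate  # assumed import path

# Run the Safety module against a black-box model on an assumed dataset name,
# returning a report object with per-attack success rates.
report = evaluate(module="safety", model="gpt-4o", dataset="advbench")
print(report.summary())  # assumed reporting interface
```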
At its core, OmniTrust comprises six powerful modules that cover key areas of model integrity and safety:
- Safety Module: Evaluates robustness against a wide array of jailbreak techniques, including AutoDAN, CipherChat, and more, enabling both attack and defense evaluation on LLMs and multimodal models.
- Privacy Module: Focuses on fine-tuned models to detect data extraction, membership inference, and prompt stealing attacks, ensuring compliance with privacy standards (a minimal membership-inference sketch follows this list).
- Detectability Module: Provides watermark embedding, detection, and evaluation, safeguarding against tampering and ensuring content integrity (see the z-score sketch after this list).
- Truthfulness Module: Evaluates knowledge consistency and reasoning accuracy across tasks such as Q&A, code generation, and cross-modal content generation, mitigating the risk of misinformation.
- Fidelity Module: Protects LLM judges from prompt injection attacks, ensuring consistent and fair scoring in judge-based tasks.
- Fairness Module: Analyzes and mitigates bias in LLMs, providing insights into potential issues such as egocentric or attentional bias, and offering solutions for effective calibration.
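To make the Privacy module's terminology concrete, here is a minimal, generic sketch of the classic loss-threshold membership inference attack (Yeom et al., 2018). It is a textbook baseline operating on made-up per-example losses, not OmniTrust's actual implementation:

```python
import statistics

def loss_threshold_mia(candidate_losses, reference_losses):
    """Classic loss-threshold membership inference (Yeom et al., 2018):
    examples whose loss falls well below what is typical for known
    non-members are flagged as likely training-set members."""
    # Calibrate the threshold on known non-member losses; mean minus one
    # standard deviation is one simple, common choice.
    threshold = (statistics.mean(reference_losses)
                 - statistics.stdev(reference_losses))
    return [loss < threshold for loss in candidate_losses]

# Toy usage with invented losses (lower loss suggests memorization).
non_members = [2.9, 3.1, 3.4, 2.8, 3.0, 3.3]
candidates = [0.7, 3.2, 1.1, 2.95]
print(loss_threshold_mia(candidates, non_members))  # [True, False, True, False]
```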
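Likewise, for the Detectability module, the following sketch shows the green-list z-score test used in recent LLM watermarking work (e.g., Kirchenbauer et al., 2023). The hash-seeded green list and whitespace tokenization are simplifying assumptions for illustration, not OmniTrust's actual detector:

```python
import hashlib
import math

GAMMA = 0.5  # assumed fraction of the vocabulary on the green list

def is_green(prev_token: str, token: str) -> bool:
    """Pseudo-randomly assign `token` to the green list, seeded on the
    previous token so the split is reproducible at detection time."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] / 255.0 < GAMMA

def watermark_z_score(text: str) -> float:
    """One-proportion z-test: watermarked text over-samples green tokens,
    so the green count should exceed GAMMA * T by several sigma."""
    tokens = text.split()  # simplifying assumption: whitespace tokens
    t = len(tokens) - 1    # number of scored (prev, current) pairs
    green = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    return (green - GAMMA * t) / math.sqrt(t * GAMMA * (1 - GAMMA))

# Unwatermarked text should score near 0; watermarked generations
# typically land several standard deviations above it.
print(round(watermark_z_score("the quick brown fox jumps over the lazy dog"), 2))
```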
OmniTrust ships with a wide range of datasets tailored to each module, along with a full suite of metrics for assessing a model's overall trustworthiness. The platform's simplicity is key: a single line of code lets users run any module, choose models, load datasets, and generate detailed, actionable reports, making it a powerful and user-friendly solution for securing and evaluating generative foundation models.
