TrustAIRLab

TrustAIRLab (Trustworthy AI Research Lab) is a research lab dedicated to trustworthy machine learning, with a focus on safety, privacy, and security. It aims to:

- offer high-quality libraries that make algorithms easier to reproduce
- benchmark existing attacks and defenses on machine learning models
- build a solid foundation for trustworthy AI research and development

Popular repositories

JailbreakRadar (Public)

Python · 74 stars · 5 forks

VoiceJailbreakAttack (Public)

Code for Voice Jailbreak Attacks Against GPT-4o.

Python · 31 stars · 1 fork

JailbreakLLMs (Public)

A dataset consisting of 6,387 ChatGPT prompts from Reddit, Discord, websites, and open-source datasets (including 666 jailbreak prompts).

11 stars

ZeroFake (Public)

Python · 11 stars · 1 fork

Conversation_Reconstruction_Attack (Public)

This is the public code repository for the paper "Reconstruct Your Previous Conversations! Comprehensively Investigating Privacy Leakage Risks in Conversations with GPT Models".

Python · 9 stars · 1 fork

SecurityNet (Public)

JavaScript · 8 stars

Repositories

Showing 10 of 24 repositories

JailbreakRadar (Public)

Python · 74 stars · 5 forks · 0 issues · 0 pull requests · Updated Jun 2, 2025

AIGT_on_Social_Media (Public)
[ACL2025] Official repository for "Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social Media"

Python · 3 stars · 1 fork · 0 issues · 0 pull requests · Updated May 29, 2025

Conversation_Reconstruction_Attack (Public)
This is the public code repository for the paper "Reconstruct Your Previous Conversations! Comprehensively Investigating Privacy Leakage Risks in Conversations with GPT Models".

Python · 9 stars · 1 fork · 0 issues · 0 pull requests · Updated May 21, 2025

T-GPS (Public)

Python · Apache-2.0 license · 2 stars · 0 forks · 0 issues · 0 pull requests · Updated May 12, 2025

GPTracker (Public)
[S&P'25] GPTracker: A Large-Scale Measurement of Misused GPTs

Python · GPL-3.0 license · 6 stars · 0 forks · 0 issues · 0 pull requests · Updated Apr 2, 2025

HateBench (Public)
[USENIX'25] HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns

Apache-2.0 license · 7 stars · 2 forks · 0 issues · 0 pull requests · Updated Mar 1, 2025

synthetic_artifact_auditing (Public)
[Usenix Security 2025] Synthetic Artifact Auditing: Tracing LLM-Generated Synthetic Data Usage in Downstream Applications

Python · Apache-2.0 license · 3 stars · 0 forks · 0 issues · 0 pull requests · Updated Jan 29, 2025

proactive_unsafe_generation (Public)
[Usenix Security 2025] On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts

Python · Apache-2.0 license · 2 stars · 0 forks · 1 issue · 0 pull requests · Updated Jan 29, 2025

Hateful_Memes_in_VLM (Public)

Apache-2.0 license · 0 stars · 0 forks · 0 issues · 0 pull requests · Updated Jan 28, 2025

ModSCAN (Public)
The official public repository for the paper "ModSCAN: Measuring Stereotypical Bias in Large Vision-Language Models from Vision and Language Modalities" (https://arxiv.org/abs/2410.06967).

Python · MIT license · 2 stars · 1 fork · 0 issues · 0 pull requests · Updated Jan 8, 2025
