zbeeb's Collections
Shared Unsafe Directions

updated 3 days ago

Do Language Models Share Unsafe Directions in Activation Space?


  • zbeeb/safe

    Updated Dec 15, 2025 • 41

  • zbeeb/unsafe

    Viewer • Updated Dec 15, 2025 • 200 • 17

  • zbeeb/Benign

    Updated Dec 15, 2025 • 8

  • zbeeb/pythia-Activations

    Updated Dec 16, 2025