DeepSeek is a family of Large Language Models (LLMs) that uses Chain of Thought (CoT) reasoning, solving problems through an explicit step-by-step reasoning process rather than giving direct answers.
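To illustrate the distinction, the minimal sketch below contrasts a direct-answer prompt with a CoT-style prompt. The `build_prompt` helper and the example question are hypothetical illustrations, not code from the study; a CoT-enabled model would generate the intermediate reasoning itself.

```python
# Illustrative sketch (hypothetical helper): contrasting a direct-answer
# prompt with a Chain-of-Thought prompt. A CoT-enabled model emits its
# intermediate reasoning before the final answer, whereas a traditional
# LLM typically returns the answer alone.

def build_prompt(question: str, chain_of_thought: bool) -> str:
    """Format a question, optionally eliciting step-by-step reasoning."""
    if chain_of_thought:
        return f"{question}\nLet's think step by step."  # elicit visible reasoning
    return f"{question}\nAnswer directly."               # elicit the answer only

question = "A train travels 120 km in 1.5 hours. What is its average speed?"

print(build_prompt(question, chain_of_thought=False))
# -> model returns the answer alone, e.g. "80 km/h"

print(build_prompt(question, chain_of_thought=True))
# -> model returns visible working, e.g.
#    "Speed = distance / time = 120 / 1.5 = 80 km/h"
```

It is this visible intermediate reasoning, helpful for benign questions, that the Bristol analysis found can also surface harmful detail even when the final answer is refused.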
Analysis by the Bristol Cyber Security Group reveals that while CoT-enabled models refuse harmful requests at a higher rate than traditional LLMs, their transparent reasoning process can unintentionally expose harmful information that traditional LLMs might not explicitly reveal.
This study, led by Zhiyuan Xu, provides critical insights into the safety challenges of CoT reasoning models and emphasizes the urgent need for enhanced safeguards. As AI continues to evolve, ensuring responsible deployment and continuous refinement of security measures will be paramount.
Read the full University of Bristol news item
Paper: ‘The dark deep side of DeepSeek: Fine-tuning attacks against the safety alignment of CoT-enabled models’ by Zhiyuan Xu, Dr Sana Belguith and Dr Joe Gardiner, published on arXiv.