Meta’s LlamaV2 7B LLM Suffers from Susceptibility to DoS and Data Leakage
Large language models are subject to anomalous and malicious inputs, as well as unexpected behavior. They lie and make mistakes. The question is not if, but when and how. DeepKeep recently conducted an extensive evaluation of Meta’s LlamaV2 7B LLM, finding:
1. The LlamaV2 7B model is highly susceptible to both direct and indirect Prompt Injection (PI) attacks, with a majority of test attacks succeeding when exposing the model to context containing injected prompts.
2. The model is vulnerable to Adversarial Jailbreak attacks, which provoke responses that violate ethical guidelines. Tests reveal a significant reduction in the model's refusal rate under such attack scenarios.
3. LlamaV2 7B is highly susceptible to Denial-of-Service (DoS) attacks, with prompts containing transformations like replacement of words, characters and switching order leading to excessive token generation over a third of the time.
4. The model demonstrated a high propensity for data leakage across diverse datasets, including finance, health, and generic PII.
5. The model has a significant tendency to hallucinate, challenging its reliability.
6. The model often opts out of answering questions related to sensitive topics like gender and age, suggesting it was trained to avoid potentially sensitive conversations rather than engage with them in an unbiased manner.
DeepKeep’s evaluation of data leakage and PII management demonstrates the model's struggle to balance user privacy with the utility of information provided, showing tendencies for data leakage.
On the other hand, Meta’s LlamaV2 7B LLM shows a remarkable ability to identify and decline harmful content. However, our investigation into hallucinations indicate a significant tendency to fabricate responses, challenging its reliability.
Overall, the LlamaV2 7B model showcases its strengths in task performance and ethical commitment, with areas for improvement in handling complex transformations, addressing bias, and enhancing security against sophisticated threats. This analysis highlights the need for ongoing refinement to optimize LlamaV2 7B model’s effectiveness, ethical integrity, and security posture in the face of evolving challenges.
DeepKeep is a model-agnostic, multi-layer platform, safeguarding AI with AI-native security and trustworthiness from the R&D phase of machine learning models through to deployment, covering risk assessment, prevention, detection and protection.
Go to https://www.deepkeep.ai/llamav2-7b-analysis for more details.