DeepSeek's B-Side: Hallucinations, Privacy, and "Ghost Features"

In the fierce competition of China's AI sector, DeepSeek quickly captured market share and became a rising star thanks to its striking price-performance ratio and strong results.

Compared with very large models such as OpenAI's GPT-4, DeepSeek was trained at a remarkably low cost, yet its performance is no weaker, and on several key tasks it even surpasses these industry giants.

DeepSeek's rise is no accident. According to Tianyancha, DeepSeek was founded in 2023. In less than two years, with optimized algorithms and efficient use of hardware resources, it has shown strong potential in processing speed, resource consumption, and computing output, and it has quickly become a hot commodity in both the consumer and commercial markets.

Despite these significant technical breakthroughs, DeepSeek still faces some thorny challenges, especially the "hallucination" problem and privacy protection, which may become major obstacles to its future development.

1、Hallucination: DeepSeek "defeats" DeepSeek

You may have had a dream in which you sensed you were sleepwalking, yet still believed you were in reality. This distorted perception is the brain's "hallucination".

The same goes for DeepSeek. When it "hallucinates", the content it generates contains errors: it looks real, but once you dig deeper you find that it does not match the facts. It is like the strange sounds you may have heard while half asleep, which are not real but feel so convincing that you mistake them for the truth.

Content generated by DeepSeek can be the same: plausible and logically consistent on the surface, yet at odds with the real world.

DeepSeek's "hallucination" is, in a sense, DeepSeek defeating DeepSeek.

▲Image source: DeepSeek User Agreement

Once a hallucination occurs, it can be disastrous for tasks that require highly precise data and rigorous logic, such as medical treatment or legal analysis. DeepSeek's hallucination is like an inescapable illusion of the human brain: a thorny, "congenital" problem that its team may not be able to fully solve. If it is ignored, the cost to users will be unpredictable and dangerous, especially in tasks that call for precise judgment and careful decisions.

Why does DeepSeek hallucinate? There are roughly three reasons:

First, the training data is "polluted."

DeepSeek's training data includes large amounts of text and other multimodal data collected from the Internet. Because the sources are so varied, quality and accuracy are hard to guarantee, and the data may include output from other models or content from unreliable sources. DeepSeek can learn from this incorrect data during training and then hallucinate when it generates content.

Second, the model structure has limitations.

DeepSeek's architecture relies on next-token prediction. This probability-based generation mechanism cannot always handle complex contexts, especially in tasks that require deep reasoning and contextual understanding, and it is prone to logical inconsistencies or wrong results.
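To make the idea of probability-based next-token prediction concrete, here is a minimal sketch. It is not DeepSeek's code: the candidate words and their scores are hypothetical numbers chosen for illustration, and a real model scores tens of thousands of tokens with a neural network. The point is only that sampling from a probability distribution will sometimes pick a fluent but incorrect continuation.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(candidates, logits, rng, temperature=1.0):
    """Pick one candidate token at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]

# Hypothetical scores for the word following "The capital of Australia is".
candidates = ["Canberra", "Sydney", "Melbourne"]
logits = [2.0, 1.5, 0.5]

probs = softmax(logits)
rng = random.Random(0)
draws = [sample_next_token(candidates, logits, rng) for _ in range(1000)]
wrong_share = sum(token != "Canberra" for token in draws) / len(draws)

print(dict(zip(candidates, (round(p, 2) for p in probs))))
print(f"fluent but wrong continuations: {wrong_share:.0%}")
```

In this toy setup the correct answer is the most likely token, yet the wrong candidates still hold a large share of the probability mass, so a sampled answer can read smoothly and still be false.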

Third, the model lacks an understanding of environment and culture.

Most large AI models, DeepSeek included, excel at data processing and pattern recognition, but they lack a deep understanding of the real environment, society and culture, and common sense.

This makes reasoning errors likely, especially in tasks that require nuanced emotional understanding, awareness of cultural differences, or ethical judgment, because the model's "knowledge" rests only on pattern recognition over data rather than on human common sense and judgment. Together, these factors cause DeepSeek to hallucinate in certain scenarios and fail to provide truthful, accurate answers or content.

Most large language models hallucinate to some degree. But DeepSeek is increasingly applied in professional fields such as law and medicine, where the tolerance for error is very low. So even while DeepSeek is in the limelight, its hallucination problem stands out more than that of other large models and will trouble a growing number of users.

2、Privacy, DeepSeek's technical challenge

Another problem for DeepSeek is how to strike a balance between privacy protection and technological innovation.

Especially in finance, healthcare, education, and autonomous driving, a data leak exposes private information to theft by hackers. Once that data falls into the wrong hands or spreads widely on social platforms, the blow to individuals and enterprises alike can be fatal.

As DeepSeek moves quickly into these highly sensitive fields, where data privacy needs are acute, its privacy and data security practices have drawn attention from all quarters.

▲Image source: DeepSeek Privacy Policy

Beyond data collection, data processing, and cross-platform cooperation, DeepSeek faces several other privacy and data security risks:

First, the "black box" lacks transparency

DeepSeek is a complex deep learning model, and its decision-making process is often a "black box": we do not fully understand how the model generates results or processes data. Because data processing paths and specific decisions cannot be traced, the risk that data is misused or compromised increases. When user privacy and sensitive data are involved, this lack of transparency makes data privacy hard to protect.

Second, the model relies too heavily on a large number of unvalidated external inputs

According to Tianyancha and other media reports, DeepSeek, like any large language model, inevitably relies on large volumes of external inputs and training data from many sources, some of which may not have been rigorously verified.

Without adequate filtering and cleaning, a model can inadvertently disclose sensitive information when generating content. For example, it may memorize private or sensitive user data from the training set and later output that information to other users, causing a privacy leak.
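As a rough illustration of what "filtering and cleaning" can mean in practice, the sketch below redacts e-mail addresses and phone-like numbers from text before it would be used for training. This is an assumption-laden toy, not DeepSeek's pipeline: the two regular expressions and the `scrub` helper are hypothetical, and real systems need much broader coverage (names, ID numbers, addresses) plus human review.

```python
import re

# Illustrative patterns only; real pipelines need far broader coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\b\d{3}[-\s]?\d{4}[-\s]?\d{4}\b")

def scrub(text):
    """Replace e-mail addresses and 11-digit phone numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

sample = "Contact Zhang at zhang.wei@example.com or 138-1234-5678."
cleaned = scrub(sample)
print(cleaned)  # Contact Zhang at [EMAIL] or [PHONE].
```

Redaction of this kind reduces what a model can memorize, but it cannot catch sensitive details that do not follow a fixed pattern.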

Third, encryption and access control are insufficient

Although DeepSeek has strengthened encryption in many scenarios, in some applications and data interactions, especially API calls and data transmission, the encryption is not strong enough or access control is not strict enough. This leaves the model and user data vulnerable in transit, where hackers or unauthorized users could access, steal, or tamper with them.
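One common way to detect tampering and reject unauthorized API calls is to sign each request with a shared secret. The sketch below uses Python's standard `hmac` module. It does not describe DeepSeek's actual API: the secret and request body are hypothetical. Signing protects integrity and authenticity only, so confidentiality in transit still depends on transport encryption such as TLS.

```python
import hashlib
import hmac

def sign(secret: bytes, body: bytes) -> str:
    """Return a hex HMAC-SHA256 signature of the request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def is_authentic(secret: bytes, body: bytes, signature: str) -> bool:
    """Check a signature using a constant-time comparison."""
    return hmac.compare_digest(sign(secret, body), signature)

secret = b"demo-shared-secret"  # hypothetical; never hard-code real keys
body = b'{"prompt": "hello"}'
signature = sign(secret, body)

print(is_authentic(secret, body, signature))                       # True
print(is_authentic(secret, b'{"prompt": "tampered"}', signature))  # False
```

The constant-time comparison matters because an ordinary string comparison can leak, through timing, how many leading characters of a guessed signature were correct.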

In addition, real-time monitoring and early warning for data breaches are lacking

During deployment and operation, DeepSeek lacks sufficient real-time monitoring and early-warning mechanisms for data breaches, so the system is more likely to miss attacks or anomalies and fail to respond in time. For example, an attacker could exploit a system vulnerability to steal a large amount of data without being noticed. Once data has leaked, it is hard to repair or trace in time, and harder still for users to recover the losses caused by the privacy breach.
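A very simple form of early warning is to compare each new measurement of outbound data volume with its recent history and raise an alert on a sharp spike. The sketch below assumes per-minute egress figures for a single API key. The numbers, the `find_anomalies` helper, and the thresholds are all hypothetical, and production monitoring would combine many more signals.

```python
from statistics import mean, stdev

def find_anomalies(volumes, window=5, threshold=3.0):
    """Return indexes whose volume exceeds the trailing mean by `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(volumes)):
        history = volumes[i - window:i]
        baseline, spread = mean(history), stdev(history)
        # Guard against a nearly flat history, where stdev is close to zero.
        limit = baseline + threshold * max(spread, 1.0)
        if volumes[i] > limit:
            flagged.append(i)
    return flagged

# Hypothetical megabytes sent per minute by one API key.
egress_mb = [10, 12, 11, 9, 10, 11, 10, 480, 12, 11]
alerts = find_anomalies(egress_mb)
print(alerts)  # [7]
```

One limitation of this approach is that a spike inflates the baseline for the next few readings, so a second exfiltration burst right after the first could go unflagged.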

Finally, private data may leak through external services

When DeepSeek is integrated with third-party services, its own security measures may be strong, but if the external services and interfaces are less secure, they create a risk of data leakage. For example, a model may inadvertently leak data through interfaces or interactions with external services, especially where strict security review is absent.

Together, these factors expose DeepSeek to privacy and data security risks that technology alone can hardly solve. When it is applied in finance, law, education, autonomous driving, and even medical care, these risks deserve close attention so that problems are prevented before they occur.

3、DeepSeek's "Ghost Features"

DeepSeek is said to be like a "ghost" in human society because the efficiency gains it brings inevitably come with negative effects. Like some mysterious force, this "ghost" trait can quickly raise productivity and processing power in specific areas, while lurking elsewhere to cause misdirection, loss of control, and hidden dangers.

Here are a few key reasons:

First, it lacks true understanding and judgment

Although DeepSeek can process large amounts of data and generate content for many tasks, it cannot truly understand and judge as humans do. It does not understand what it generates; it simply produces output based on patterns in its input. Hence its "hallucination" problem (for example, wrong reasoning or content that does not correspond to the facts), which can mislead users and undermine the reliance on correct knowledge in work and study.

Second, faulty reasoning and logical flaws cannot be completely avoided

Like the "charlatans" of human society, whose claims are often inaccurate and lack depth, DeepSeek, for all the advantage it draws from vast amounts of data, will still produce reasoning and suggestions that seem sound on the surface but are actually wrong, owing to the limits of its algorithm design. In complex tasks that require precise judgment, such mistakes can lead to poor decisions and unpredictable negative consequences.

Third, information overload and dependence

DeepSeek's efficient information processing greatly speeds up the acquisition and analysis of information, but that speed may also make people dependent, or even over-dependent, on it. According to Tianyancha and other media reports, people may gradually give up independent thinking and critical analysis, leaving them with only a superficial grasp of knowledge. This dependence may breed "intellectual laziness" in work and study and, in the long run, erode the capacity to innovate and think.

Fourth, great power but clear limits in professional applications

DeepSeek has demonstrated powerful processing capabilities in professional fields such as finance, law, education, autonomous driving, and healthcare. It can quickly analyze large amounts of data, provide decision support, and in some cases even improve efficiency and accuracy. However, it also has limitations: these fields are complex and heavily regulated, and AI reasoning is still far inferior to that of human experts.

Fifth, uncertainty and moral dilemmas

According to Tianyancha and media reports, DeepSeek can efficiently process massive amounts of data, but its decision-making process often lacks transparency, and the resulting uncertainty may lead to moral and ethical dilemmas.

People may rely on AI to make decisions, but without a clear ethical framework and review mechanism behind those decisions, the outcomes may conflict with society's values. For example, AI may fail to weigh the ethical consequences when deciding whether to terminate an employee or how to handle customer information.

To sum up, DeepSeek is like a "ghost" among the "three religions and nine schools" (people of every walk of life): it can give us a "buff" in some areas, but without human judgment and moral consideration, its "ghost" traits can also do people serious harm.

Overall, while DeepSeek has delivered breakthrough technical advances on many levels, it needs to pay the same attention to risk control as other large language models. How it balances technological innovation with risk control, and how it improves the explainability and stability of its systems, will be key to its long-term development.

Author | Lin Feixue

Editor | Hu Zhanjia

Operations | Chen Jiahui

Header image | DeepSeek official WeChat

Display | Zero-state LT (ID: LingTai_LT)
