Chinese government officials are testing large language models from artificial intelligence companies to ensure that their systems ‘embody fundamental socialist values’, the Financial Times reports.
The Cyberspace Administration of China (CAC), a powerful internet regulator, has required major tech companies and newly established AI firms, including ByteDance, Alibaba, Moonshot, and 01.AI, to take part in a mandatory government review of their AI models. Regulators are batch-testing the models’ responses to a long list of questions that primarily concern China’s political sensitivities and its president, Xi Jinping. The work is carried out by officials in the CAC’s local branches across the country and includes reviewing the models’ training data as well as other security processes.
Two decades after introducing the ‘Great Firewall’ to block foreign websites and other information deemed harmful by the ruling Communist Party, China is establishing the world’s strictest regulatory regime for managing artificial intelligence and the content it generates.
CAC has a ‘special team that does this, they came to our office and sat in our conference room to conduct the review,’ said an employee at an AI company based in Hangzhou, who wished to remain anonymous.
China’s demanding approval process has forced AI companies in the country to learn quickly how best to censor the large language models they develop, a task that numerous engineers and industry insiders describe as difficult and complicated because LLMs need to be trained on a large amount of English-language content.
‘Our core model is very, very unrestrained in its responses, so security filtering is extremely important,’ said an employee of one of the leading AI start-up companies in Beijing.
Content Filtering
Filtering begins with removing problematic information from the training data and building a database of sensitive keywords. Chinese operational guidelines for AI companies published in February state that AI companies and start-ups need to collect thousands of sensitive keywords and questions that violate ‘fundamental socialist values’ such as ‘inciting subversion of state power’ or ‘endangering national unity’. Sensitive keywords need to be updated weekly, the party has ordered.
The result is visible to users of Chinese AI chatbots, which decline to answer questions about sensitive topics. Most Chinese chatbots will not respond to questions such as what happened on June 4, 1989, the date of the Tiananmen Square massacre, or whether Xi looks like Winnie the Pooh, an internet meme.
