Another player heats up generative AI race as China introduces interim laws


JD.com launches its large language model ChatRhino, which it says hits 100 billion parameters, amid new regulations in China to manage generative AI services.

Andriy Onufriyenko/Getty Images

JD.com is heating up the artificial intelligence (AI) race in China with the release of its large language model, even as regulators move to introduce rules governing generative AI services.



China's second-largest online shopping platform, JD.com, said its ChatRhino (or yanxi in Chinese) has been customized to support several verticals such as logistics, retail, healthcare, and finance. The large language model comprises 70% general data and 30% "native intelligent" supply chain data, said the e-commerce player, which has a logistics arm as well as a healthcare business unit.


ChatRhino boasts a base of 100 billion parameters, up from the 10 billion-parameter benchmark clocked by its previous model Vega early last year. Vega had led the General Language Understanding Evaluation (GLUE) list, outpacing models from Microsoft and Facebook, said JD.com in a statement Thursday. 


GLUE measures and ranks natural language processing based on nine tasks, spanning a range of "dataset sizes, text genres, and degrees of difficulty." OpenAI's GPT-4 reportedly has more than 1 trillion parameters spread across eight models, though the company has not confirmed this. Its previous GPT-3 model has 175 billion parameters, while GPT-2 has 1.5 billion.

ChatRhino offers more than 100 training and inference optimization tools that JD.com says support domain-specific application development, letting clients more quickly build their own specialized models. The vendor claims a generative AI model for the healthcare industry, for instance, can be built in "minutes" with two algorithm engineers, compared to the traditional method that typically requires a week and at least 10 scientists. 

JD Health's own large language model, Jingyi Qianxun, is built on ChatRhino and has been trained on medical scenarios to automatically deploy services, including telemedicine, according to JD.com.  


E-commerce merchants also can tap ChatRhino to create a range of visuals, marketing posters, and product images, from one product image. The AI model can cut production cost for each visual asset by 90%, reducing the time needed to complete the task from a week to half a day, JD.com said. 

Interim laws to guide generative AI rollouts

The launch of ChatRhino comes the same week China introduced interim regulations to manage generative AI services in the country.

To kick in from Aug. 15, the new laws are necessary to ensure the healthy development of the technology and safeguard both national security and public interests, the Chinese government said. 


In a joint statement issued by various agencies, including the Cyberspace Administration of China (CAC) and the Ministry of Science and Technology, the government noted that while generative AI had created new economic and social development opportunities, it also brought challenges such as fake news and risks to data privacy and safety.

The interim legislation outlines various measures that aim to facilitate the sound development of the technology, while protecting national and public interests and legal rights of citizens and businesses, the statement noted. 

Generative AI developers, for instance, will have to ensure their pre-training and model optimization processes are carried out in compliance with the law. This includes using data from legitimate sources without infringing intellectual property rights. Should personal data be used, the individual's consent must be obtained or the use must otherwise comply with existing regulations.


Measures also have to be taken to improve the quality of training data, including its accuracy, objectivity, and diversity. 

Under the interim laws, generative AI service providers assume legal responsibility for the information generated and its security. They will need to sign service-level agreements with users of their service, thereby clarifying each party's rights and obligations.

When illegal content is uncovered, the service provider must take various measures such as preventing its transmission and rectifying its use in model training. The relevant authority must also be notified.

In addition, service providers would have to take necessary measures should a user engage in illegal activities using the generative AI service. These include restricting functions, suspending or terminating the service, maintaining relevant records, and reporting to the relevant authority.

Service providers that breach the new laws will face penalties laid out in China's existing relevant legislation, including the Network Security Law, Data Security Law, and Personal Information Protection Law. In instances where there are no provisions for violations, a warning will be issued alongside orders for corrections that must be fulfilled within a period of time. Failure to comply with such orders may result in a suspension of services.


China in April released a draft preview of the legislation, saying the development of generative AI technologies such as ChatGPT could lead to abuse if left unregulated. Separate legislation came into effect in January that laid out ground rules to prevent "deep synthesis" technology, including deepfakes and virtual reality, from being abused. Anyone using these services must label the images accordingly and refrain from tapping the technology for activities that breach local regulations.

In May, the Chinese government unveiled plans to build AI industrial hubs and tech platforms across the country to support research and development work. To date, development plans have been launched for 18 national AI pilot areas and 32 innovation platforms, including in Beijing and Tianjin.

Apart from JD.com, local players such as Tencent and Alibaba also have announced efforts to offer or integrate generative AI into their products. Alibaba Cloud in April unveiled its large language AI platform, called Tongyi Qianwen, which is currently available to customers in China for beta testing and as an API to developers. The Chinese cloud vendor also introduced a partnership program to drive the development of AI applications for verticals, including finance and petrochemicals. 


These authors are suing OpenAI and Meta for copyright infringement now

This could catalyze stricter regulation on using copyrighted work to train AI models.

Written by Maria Diaz, Staff Writer on July 10, 2023

Sarah Silverman speaks on May 05, 2022 in New York City.

Cindy Ord/Getty Images for Variety

Sarah Silverman joined forces with fellow authors Richard Kadrey and Christopher Golden to sue Meta and OpenAI in dual claims of copyright infringement.

The suits are separate, each against one of the companies. The authors claim they never consented to the use of their copyrighted books as training material for the large language models (LLMs) behind OpenAI's ChatGPT and Meta's LLaMa.


An LLM is a type of artificial intelligence algorithm trained using massive amounts of information from books and texts from the internet to learn language patterns, grammar, and context until it can generate human-like text and have chat interactions with users. 


According to the lawsuits, the models "remix the copyrighted works of thousands of book authors -- and many others -- without consent, compensation, or credit." 

Copyright infringement has been one of the many concerns of AI skeptics since ChatGPT became widely available in November, triggering the generative AI boom and questions about how AI will affect the creativity and copyright process.


The lawsuits claim the LLMs were trained on illegally acquired materials, such as those found on "shadow library" websites. According to the OpenAI suit:

"The OpenAI Books2 dataset can be estimated to contain about 294,000 titles. The only 'internet-based books corpora' that have ever offered that much material are notorious 'shadow library' websites like Library Genesis (aka LibGen), Z-Library (aka B-ok), Sci-Hub, and Bibliotik. The books aggregated by these websites have also been available in bulk via torrent systems."


The Meta suit makes similar claims, pointing to the sources from which the book training data was gathered. It divides them in two: the first is Project Gutenberg, an online archive of books that are out of copyright, and the second is the "Books3 section of ThePile", a dataset available on the popular AI project hosting site Hugging Face that appears to represent all of Bibliotik, mentioned above.


The plaintiffs are represented by lawyers Joseph Saveri and Matthew Butterick, who also represent authors Mona Awad and Paul Tremblay in a lawsuit filed in June against OpenAI over copyright infringement.
