
Notice Detail
Title: PIPC Offers Guideline on Processing Publicly Available Data for AI Development and Services
Date: 2024.07.18
Attachment: PIPC Offers Guideline on Processing Publicly Available Data for AI Development and Services_revised.pdf
Page URL: https://www.pipc.go.kr/eng/user/ltn/new/noticeDetail.do?bbsId=BBSMSTR_000000000001&nttId=2591

Press Release

PIPC Offers Guideline on Processing Publicly Available Data for AI Development and Services 

- PIPC unveils a “Guideline on Processing Publicly Available Data for AI Development and Services” 

- The guidance material will iron out legal uncertainties for businesses and enhance privacy protection for the people

 

(This is an unofficial translation of a press release, originally prepared in Korean.) 

 

The country’s data protection authority has published a guideline on safely processing publicly available data from the Internet for the development of generative Artificial Intelligence (AI) models.

 

The Personal Information Protection Commission (“PIPC”) has unveiled a “Guideline on Processing Publicly Available Data for AI Development and Services” to enable the lawful and safe processing of publicly available data, a key ingredient for AI development and training.

 

Publicly available data is data accessible to anyone via the Internet, and it serves as a key ingredient of training data for generative AI models such as ChatGPT. Generative AI models are trained on datasets built by extracting (web scraping) data from multiple sources, including Common Crawl, Wikipedia, and other websites.
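The scraping step described above can be illustrated with a minimal sketch using only the Python standard library. This is illustrative, not how production pipelines work: large-scale training pipelines typically consume pre-crawled dumps such as Common Crawl rather than fetching pages one by one, and the `TextExtractor` class and sample HTML here are assumptions made for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # >0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Return the visible text of an HTML page as a single string."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)


# Hypothetical sample page standing in for a scraped public web page.
page = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><p>Public notice text.</p></body></html>"
)
print(extract_text(page))  # Public notice text.
```

Text extracted this way would then feed into the dataset-building stage, where the safeguards discussed below (detection, erasure, de-identification) apply.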

 

Due to its nature, publicly available data may include addresses, unique identification information (UII), credit card numbers, and other personal data, increasing privacy risks to individuals. However, the legal basis for processing publicly available data under the PIPA has been insufficient and unclear.
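The kinds of identifiers mentioned above can, at the simplest level, be found and redacted with pattern matching before text enters a training corpus. The sketch below uses deliberately simple placeholder patterns for a Korean RRN-style number and a 16-digit card number; a real system would use validated detectors, and these regexes are assumptions made for illustration.

```python
import re

# Hypothetical placeholder patterns (illustrative only):
# an RRN-style identifier (6 digits, hyphen, 7 digits) and a 16-digit card number.
RRN_RE = re.compile(r"\b\d{6}-\d{7}\b")
CARD_RE = re.compile(r"\b(?:\d{4}[- ]){3}\d{4}\b")


def redact(text: str) -> str:
    """Replace detected identifiers with placeholder tokens."""
    text = RRN_RE.sub("[RRN]", text)
    text = CARD_RE.sub("[CARD]", text)
    return text


sample = "Contact: 900101-1234567, card 1234-5678-9012-3456"
print(redact(sample))  # Contact: [RRN], card [CARD]
```

Redaction of this kind corresponds to the “erasure, de-identification” safeguard discussed later in this release.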

 

In the context of mass processing of publicly available data for AI training, obtaining consent from each data subject or concluding a relevant contract is impracticable under the current legal framework. Moreover, AI training has brought significant shifts in data processing mechanisms, making it challenging to interpret and apply the existing safeguards stipulated in the data privacy law to these new mechanisms.

 

Against this backdrop, the PIPC has prepared a guideline that clarifies a legal basis for collecting and utilizing publicly available data and suggests applicable guardrails for AI developers and service providers. The guideline will serve as a compass for businesses, helping them minimize privacy-related risks, iron out legal uncertainties, and drive innovative growth in the era of AI.

 

After announcing its “Policy Direction for Safe Usage of Personal Data in the Age of AI” in August 2023, the PIPC held discussions on publishing a guideline with the “Public-Private Policy Advisory Council for AI Privacy”, comprised of 30 AI experts organized into three subcommittees, and also solicited opinions from various stakeholders in academia, industry, and civil society.

 

The European Union, the United States, and other major countries are striving to strike a balance between AI-driven innovation and safety. Data protection authorities in these countries are in the process of setting privacy norms and standards for the processing of publicly available data in AI and other fields. Given these global trends, the PIPC is focusing on establishing internationally interoperable standards.

 

1. Applying the Concept of Legitimate Interests

 

To begin with, the guideline clarifies that publicly available data can be utilized for AI training and the development of AI services on the basis of the “legitimate interests” provision set forth in Article 15 of the PIPA, provided that the legitimate interests of the personal data controller clearly override the rights of the data subjects.

 

In order to apply the concept of legitimate interests, personal data processors, including AI developers and service providers, are required to meet three requirements: legitimacy of the purpose; necessity of the data processing; and an assessment of the associated interests of personal data processors and data subjects. The PIPC also set out the content of the three requirements and applicable scenarios.

 

Legitimacy of the purpose:

Ensuring data processors have the legitimate interests to process personal data 

- Clarifying the legitimate interests by specifying the intended purposes of the development of AI models 

  (e.g. LLMs and other AI models to support medical diagnosis, carry out credit rating, generate, classify, and translate texts)  

 

Necessity of data processing: 

Ensuring that the collection and use of publicly available data is necessary, adequate, and appropriate

- Excluding irrelevant data such as an individual’s income and property for developing AI models to support medical diagnosis 

 

Assessment of associated interests between personal data processors and data subjects:

Ensuring that the legitimate interests of personal data processors clearly override the rights of data subjects 

- Measures to ensure safety in order to prevent infringement on the data subjects’ rights 

- Measures to ensure that personal data processors’ interests override those of data subjects by devising and implementing plans to uphold data subjects’ rights 

 

In this regard, establishing a standard to interpret and apply the concept of legitimate interests can promote international interoperability with global norms such as the EU’s General Data Protection Regulation (GDPR), and discussions on AI safety norms. 

 

2. Suggesting Applicable Guardrails and Ways to Safeguard Data Subjects’ Rights 

 

The guideline released by the PIPC presents technical and organizational safeguards for AI business operators relying on legitimate interests to process publicly available data, along with ways to respect the data subjects’ rights at the same time.

 

Technical safeguards: 

● Examining the sources of the training datasets collected 

● Taking measures to prevent personal data breaches (erasure, de-identification) 

● Safe storage and management of personal data 

● Adding additional safeguards through fine-tuning 

● Applying prompt and output filtering functions 

● Removing the influence of targeted training data points from the training data (Machine Unlearning) 
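The “prompt and output filtering” safeguard in the list above can be illustrated with a deliberately minimal sketch: a filter that refuses prompts matching known personal-data categories. Production filters typically use trained classifiers rather than keyword matching, and the `BLOCKED_TERMS` list is an assumption made for this example.

```python
# Hypothetical blocked categories (illustrative only).
BLOCKED_TERMS = (
    "resident registration number",
    "credit card number",
    "home address",
)


def filter_prompt(prompt: str) -> tuple[bool, str]:
    """Return (allowed, message): refuse prompts requesting personal data."""
    lowered = prompt.lower()
    for term in BLOCKED_TERMS:
        if term in lowered:
            return False, f"Request refused: asks for personal data ({term})."
    return True, prompt


allowed, message = filter_prompt("What is John's credit card number?")
print(allowed, message)  # False Request refused: asks for personal data (credit card number).
```

A symmetrical filter can be applied to model outputs before they are returned to the user, for example by running the redaction step shown earlier over generated text.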

 

Administrative and Organizational Safeguards: 

● Establishing criteria for collecting and using training datasets and incorporating them into privacy policies 

● Considering carrying out a Privacy Impact Assessment (PIA) 

● Operating an AI privacy red team 

● Implementing safeguards tailored to the AI model and to the development and deployment of AI services (e.g. open source, API) 

 

Respecting the Data Subjects’ Rights: 

● Disclosing the status of publicly available data collection and its main sources in the privacy policy and other documents 

● Upholding the data subjects’ rights, including devising measures enabling them to exercise their rights to erasure and suspension of processing in the event of data leakage during AI training and the deployment of AI-enabled services 

 

The PIPC allows AI business operators to adopt and implement detailed safeguards in a flexible manner given the rapid technological advancements of AI. In this vein, AI businesses do not need to implement every safeguard stipulated in the guideline; rather, they may choose the options best suited to each business, considering intended functions, side effects such as performance degradation and bias, and the maturity of AI technologies. 

 

The published guidance material also elaborates on the technical and administrative safeguards implemented by major Large Language Model (LLM) service providers, identified through preemptive inspections of AI services conducted in March 2024, to help LLM-powered businesses find the best combination of safeguards to follow suit.

 

The PIPC shared the results of its preemptive inspections of several AI services and resolved to issue recommendations on their processing of personal data in March 2024. Under the recommendations, the PIPC periodically detects URLs that expose identifiable information, such as RRNs (Resident Registration Numbers), and provides the results to AI developers and service providers, who are advised to remove those URLs from their training datasets.
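The URL-removal process described above amounts to filtering training records against a regulator-provided exposure list. The sketch below shows the idea with a hypothetical record schema and URLs (both are assumptions made for illustration).

```python
def filter_dataset(records, blocked_urls):
    """Drop training records whose source URL appears on the exposure list."""
    blocked = set(blocked_urls)
    return [r for r in records if r["url"] not in blocked]


# Hypothetical training records and exposure list.
records = [
    {"url": "https://example.com/a", "text": "safe page"},
    {"url": "https://example.com/leak", "text": "page exposing an RRN"},
]
kept = filter_dataset(records, ["https://example.com/leak"])
print(len(kept))  # 1
```

Using a set for the blocklist keeps each membership check constant-time, which matters when the exposure list and the corpus are both large.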

 

3. Fostering the Roles of AI Businesses for the Development and Usage of Trustworthy AI 

 

Last but not least, the guideline stresses the important roles of AI businesses and Chief Privacy Officers (CPOs) in processing datasets for the development of AI models. The guideline recommends that AI-powered businesses voluntarily form and operate a dedicated team for AI privacy, fostering the roles of CPOs to assess compliance with the requirements set out in the guideline and to document and retain the basis for their assessments. The guideline also calls for business operators to periodically monitor risk factors, including significant changes in technology and concerns over data breaches, and advises AI-based businesses to rapidly devise and implement relief and remedial measures.

 

The guideline will be kept updated to reflect legislation and amendments to the PIPA, technological advancements in AI, and regulatory overhauls by overseas data protection authorities.

 

The supervisory authority is also set to concretize the lawful basis and criteria for processing users’ personal data, another key ingredient of AI training datasets, by soliciting opinions from academia, industry, and civil society. 

 

Korea’s data protection authority also plans to communicate with AI-powered businesses through support schemes that promote innovation, including the Prior Adequacy Review Scheme, the Regulatory Sandbox, and the Personal Information Safety Zone, while keeping an eye on technological advancements and market conditions. Building upon these best practices and experiences, the PIPC is set to overhaul the PIPA for the era of AI.

 

Professor Byoung Pil Kim of KAIST, who has engaged in discussions on the guideline and serves as the head of the Advisory Council’s Subcommittee on Criteria for Data Processing, said that “it is part of our endeavors to meet halfway between protecting personal data and encouraging AI-driven innovation. This will be a great guidance material for the development and usage of trustworthy AI.” He stressed that “given the breakneck advancements in AI technologies, please keep in mind that the guideline will be further updated to reflect the shifts in the global privacy landscape.” 

 

Head of LG AI Research Kyunghoon Bae, also as a co-chairperson of the Advisory Council, stated that “the guideline is part of our great strides and first step toward promoting technological advancements in AI and personal data protection.” He also mentioned that “the guideline provides a lawful basis to safely process personal data from publicly available data to mitigate legal uncertainties in developing AI technologies. The guideline will serve as a bedrock for businesses to enjoy the benefits brought by AI technologies in a safe data processing environment trusted by the people.” 

 

Chairperson Haksoo Ko stated that “Clarification is not sufficient as to how to ensure legality and safety in using publicly available data for AI model training, even though AI technology is advancing at an exponential rate.” He said that “we hope this guideline helps businesses set examples for leveraging AI and data in a reliable manner, and that the best practices established over time will continue to be added to this guidance material."

 

* A PDF file, formatted for better readability, is attached

  
