Generative Artificial Intelligence Models Leak Private Data

Adoption of ChatGPT has over the past few months since release of of 4th generation version has greatly increased and, right  now, more than 100 million users are signed up to the program. 

This has been made possible by the platform's aggregation of over  300 billion items of text and other data, scraped from online sources like   articles, posts, websites, journals  and books.

Although OpenAI has developed and trained the ChatGPT model to operate within parameters intended to deliver useful ouput, analysts of the model say that this data is gathered without discrimation between fact and fiction, copyright status, or data privacy. 

Now, researchers from Northwestern University have published a study in which they explain how they could use keywords to trick ChatGPT into and releasing training data that was not meant to be disclosed.

Although OpenAI has taken steps to protect privacy, everyday chats and postings leave a massive pool of data and much of it is personal which is not intended for widespread distribution. Generative AI platforms, such as ChatGPT, are built by data scientists through a training process where the program in its initial, unformed state, is subjected to billions of bytes of text, some of it from public Internet sources and some from published books. 

The fundamental function of training is to make the program reproduce anything that is given acces to, using essentially a compression technique. A program, once trained, could reproduce the training data, based upon only a very small amount data being submitted as an enquiry, prompting the relevant response. 

The researchers said that they were able to extract over 10,000 unique verbatim memorised training examples using only $200 worth of queries to ChatGPT, adding- “Our extrapolation to larger budgets suggests that dedicated adversaries could extract far more data.” Indeed, they found that they could obtain names, phone numbers, and addresses of individuals and companies by feeding ChatGPT absurd commands that forced a malfunction.

For example, the researchers requested that ChatGPT repeat the word “poem” ad infinitum, which forced the model to reach beyond its training procedures and “fall back on its original language modelling objective” and tap into restricted details in its training data. They also reached a similar result by requesting infinite repetition of the word “company,” and managed to retrieve the email address and phone number of an American law firm.

In response to potential unauthorised data disclosures, some companies have placed restrictions on employee usage of large language models earlier this year.  Rising concerns about data breaches caused OpenAI to add a feature that turns off chat history, adding a layer of protection to sensitive data. The problem is that such data is still retained for 30 days before being permanently deleted.

In conclusion, the researchers termed their findings “worrying” and said their report should serve as “a cautionary tale for those training future models,” warning that users “should not train and deploy LLMs for any privacy-sensitive applications without extreme safeguards.”

Northwestern Univ:   SearchEngine Journal:   I-HLS:    ZDNet:    TechXplore:    New Scientist:    Wired:    Wired:   

Science Direct:   Business Insider:     Image: DeepMind

You Might Also Read: 

Guidelines For AI Systems Development:

DIRECTORY OF SUPPLIERS - AI Security & Governance:

___________________________________________________________________________________________

If you like this website and use the comprehensive 6,500-plus service supplier Directory, you can get unrestricted access, including the exclusive in-depth Directors Report series, by signing up for a Premium Subscription.

  • Individual £5 per month or £50 per year. Sign Up
  • Multi-User, Corporate & Library Accounts Available on Request

Cyber Security Intelligence: Captured Organised & Accessible


« EU Agrees Regulations For Artificial Intelligence
Overcoming The Cybersecurity Challenge »

CyberSecurity Jobsite
Perimeter 81

Directory of Suppliers

Practice Labs

Practice Labs

Practice Labs is an IT competency hub, where live-lab environments give access to real equipment for hands-on practice of essential cybersecurity skills.

CSI Consulting Services

CSI Consulting Services

Get Advice From The Experts: * Training * Penetration Testing * Data Governance * GDPR Compliance. Connecting you to the best in the business.

North Infosec Testing (North IT)

North Infosec Testing (North IT)

North IT (North Infosec Testing) are an award-winning provider of web, software, and application penetration testing.

BackupVault

BackupVault

BackupVault is a leading provider of automatic cloud backup and critical data protection against ransomware, insider attacks and hackers for businesses and organisations worldwide.

XYPRO Technology

XYPRO Technology

XYPRO is the market leader in HPE Non-Stop Security, Risk Management and Compliance.

44CON

44CON

44CON is an Information Security Conference & Training event taking place in London. Designed to provide something for the business and technical Information Security professional.

Atlantic Council

Atlantic Council

The Atlantic Council's Cyber Statecraft Initiative focuses on international cooperation, competition, and conflict in cyberspace.

Cyber Security Research Centre - University of Cardiff

Cyber Security Research Centre - University of Cardiff

Cardiff University's Centre for Cyber Security Research is a leading UK academic research unit for cyber security analytics.

Assured Information Security (AIS)

Assured Information Security (AIS)

AIS is committed to providing our customers with critical information security products, services, and training. We support diverse needs throughout business and industry.

VU Security

VU Security

VU is a specialist in Cybersecurity software development with a focus on the prevention of fraud and identity theft.

Cynerio

Cynerio

Cynerio develops cybersecurity protections for medical devices, comparing network behavior with a database of medical workflows.

bluedog Security Monitoring

bluedog Security Monitoring

Sentinel from bluedog provides powerful and affordable internal network monitoring.

Adzuna

Adzuna

Adzuna is a search engine for job ads used by over 10 million visitors per month that aims to list every job everywhere, including thousands of vacancies in Cybersecurity.

SOSA

SOSA

SOSA facilitates new growth opportunities by connecting the dots between industry verticals and innovation ecosystems around the world.

ACET Solutions

ACET Solutions

ACET Solutions delivers a wide range of Automation, Cyber Security and Enterprise IT/OT Integration Solutions to industrial clients.

Credible Digital Security Pvt. Ltd. (CDSPL)

Credible Digital Security Pvt. Ltd. (CDSPL)

CDSPL is an innovative Cyber Security Services Company in India. We are committed to offering cyber security solutions for important sectors such as energy and utilities, healthcare, and more.

EtherAuthority

EtherAuthority

EtherAuthority's engineering team has been helping blockchain businesses to secure their smart contract based assets since 2018.

Quantum Security Services

Quantum Security Services

Quantum Security Services is a specialist information security firm providing a range of risk, compliance and technical security services.

Cyviation

Cyviation

Cyviation's mission is to mitigate ever-growing and menacing Cyber Security threats, focusing on aircraft, airlines and airports.

OxCyber

OxCyber

OxCyber's mission is to ignite and encourage cybersecurity and technology growth in the Thames Valley through meetings, webinars, in person events, workshops and mentorship programs.

Onum

Onum

Onum helps security and IT leaders focus on the data that's most important. Gain control of your data by cutting through the noise for deep insights in real time.