Generative Artificial Intelligence Models Leak Private Data

Adoption of ChatGPT has greatly increased in the months since the release of its fourth-generation version, and more than 100 million users are now signed up to the program.

This has been made possible by the platform's aggregation of over 300 billion items of text and other data, scraped from online sources like articles, posts, websites, journals and books.

Although OpenAI has developed and trained the ChatGPT model to operate within parameters intended to deliver useful output, analysts of the model say that this data is gathered indiscriminately, without regard to fact or fiction, copyright status, or data privacy.

Now, researchers from Northwestern University have published a study in which they explain how they could use keywords to trick ChatGPT into releasing training data that was not meant to be disclosed.

Although OpenAI has taken steps to protect privacy, everyday chats and postings leave a massive pool of data, much of it personal and not intended for widespread distribution. Generative AI platforms such as ChatGPT are built by data scientists through a training process in which the program, in its initial, unformed state, is fed billions of bytes of text, some of it from public Internet sources and some from published books.

The fundamental function of training is to make the program reproduce anything that it is given access to, using what is essentially a compression technique. Once trained, a program could reproduce its training data when only a very small amount of data is submitted as an enquiry, prompting the relevant response.
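The idea that a short enquiry can pull back a long stretch of training text can be illustrated with a toy model. The sketch below is an assumption-laden illustration only: it uses a simple word-lookup table, not the neural network behind ChatGPT, and the training text, names and contact details in it are invented for the example.

```python
from collections import defaultdict

def train(text, order=3):
    """Map each run of `order` words to the words seen following it."""
    words = text.split()
    table = defaultdict(list)
    for i in range(len(words) - order):
        table[tuple(words[i:i + order])].append(words[i + order])
    return table

def complete(table, prompt, order=3, max_words=20):
    """Extend the prompt with the first continuation seen in training."""
    out = prompt.split()
    for _ in range(max_words):
        followers = table.get(tuple(out[-order:]))
        if not followers:
            break  # nothing memorised for this context
        out.append(followers[0])
    return " ".join(out)

# Hypothetical training text containing an invented contact detail.
TRAINING_TEXT = (
    "For billing questions contact Jane Example at jane@example.com "
    "or call 555-0100 during office hours"
)

model = train(TRAINING_TEXT)
leaked = complete(model, "For billing questions")
print(leaked)
```

Here a three-word prompt is enough for the toy model to return the whole training sentence, contact details included, because the table has stored every continuation it saw.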

The researchers said that they were able to extract over 10,000 unique verbatim memorised training examples using only $200 worth of queries to ChatGPT, adding: “Our extrapolation to larger budgets suggests that dedicated adversaries could extract far more data.” Indeed, they found that they could obtain names, phone numbers, and addresses of individuals and companies by feeding ChatGPT absurd commands that forced a malfunction.
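One way to decide whether a piece of model output counts as "verbatim" is to look for a long run of words that also appears, unchanged, in a known reference text. The sketch below is a hypothetical check, not the researchers' method, which the article does not describe; the function names and the eight-word threshold are assumptions chosen for illustration.

```python
def longest_shared_run(output, reference):
    """Length, in words, of the longest word sequence found in both texts."""
    a, b = output.split(), reference.split()
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous word pair.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def looks_memorised(output, reference, min_words=8):
    """Flag output that shares a run of at least `min_words` words (assumed threshold)."""
    return longest_shared_run(output, reference) >= min_words

# Invented texts for the example.
reference = "the quick brown fox jumps over the lazy dog near the old river bank"
output = "poem poem the quick brown fox jumps over the lazy dog near said nobody"

print(longest_shared_run(output, reference), looks_memorised(output, reference))
```

A real check would need a very large reference collection and a faster index than this word-by-word comparison, which is only practical for short texts.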

For example, the researchers requested that ChatGPT repeat the word “poem” ad infinitum, which forced the model to reach beyond its training procedures and “fall back on its original language modelling objective” and tap into restricted details in its training data. They also reached a similar result by requesting infinite repetition of the word “company,” and managed to retrieve the email address and phone number of an American law firm.
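Output of this kind can be screened for contact details before it is stored or shown. The following is a minimal sketch using two regular expressions, assuming simple email addresses and US-style phone numbers; the patterns, the function name `find_contact_details` and the sample text are illustrative assumptions and would miss many real-world formats.

```python
import re

# Assumed patterns: simple email addresses and US-style phone numbers.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\(?\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}")

def find_contact_details(text):
    """Return the email addresses and phone numbers found in `text`."""
    return {
        "emails": EMAIL.findall(text),
        "phones": PHONE.findall(text),
    }

# Invented sample imitating a response that drifts away from the repeated word.
sample = (
    "company company company Contact our office at "
    "info@example.com or (555) 010-0199."
)

found = find_contact_details(sample)
print(found)
```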

In response to potential unauthorised data disclosures, some companies placed restrictions on employee usage of large language models earlier this year. Rising concerns about data breaches caused OpenAI to add a feature that turns off chat history, adding a layer of protection to sensitive data. The problem is that such data is still retained for 30 days before being permanently deleted.

In conclusion, the researchers termed their findings “worrying” and said their report should serve as “a cautionary tale for those training future models,” warning that users “should not train and deploy LLMs for any privacy-sensitive applications without extreme safeguards.”

Sources: Northwestern University, Search Engine Journal, I-HLS, ZDNet, TechXplore, New Scientist, Wired, Science Direct, Business Insider

Image: DeepMind
