A new community rule for language model disclosure
from taco_shale032@lemmy.ml to privacy@lemmy.ml on 14 Jun 09:58
https://lemmy.ml/post/48724623

I’m trying to look at this from a neutral point of view, which is why I believe enforcing disclosure when (AI) models are used would benefit the community.

I believe models can harm privacy when not used correctly, because they’re prone to outputting misleading or outright incorrect information due to “hallucinations”. And in my experience, this is the case more often than not with the projects I see.

I’m curious what others think about this. If you disagree, please let me know why.

#privacy


BlackJerseyGiant@lemmy.world on 14 Jun 12:00

Language is, in addition to being a basis for communication, a set of tools for thought. Each word can enable the contemplation of whole concepts. Try to explain or think about time without using the word time, for example. The AI we have is a map of our language, of some of the tools we use for thought. This map is being sold as the territory. AI is being sold as thought.

At a minimum, privacy requires understanding, feeling, and veracity. AI can provide none of these things, devoid of thought as it is, and as such has no place in this space.

FineCoatMummy@sh.itjust.works on 14 Jun 16:22

One way that LLMs harm privacy is through training on, well, everything the tech company can get its hands on. That can include your posts, and anything you disclosed IN those posts. Not to mention anything you typed into most of the big LLMs on the web.

Once that info is trained into the model, you can’t just go delete it! If it were a file on a disk, in theory you could remove it. OK, sometimes that’s hard in practice, but in theory you can. When it’s baked into model weights, that’s different. You can’t un-bake it from the model!

People have found that commercial LLMs will give back personal info about themselves. Their phone numbers. Where they work. Sometimes even health info, if somehow the model got trained on that! The model does not recall everything it was trained on 1-for-1. But that data does get represented in the model, and it can sometimes turn up later, accurate or not. LLMs are also good at analyzing unstructured data. So even if you never gave your name, if there are enough tidbits to collect, they can de-anonymize people. I read something about that. I will try to find the link and post it if I can.

I do not think LLMs are 100% bad. They have good uses, valid uses. But an ass ton of risks and drawbacks too! I’m not sure society is ready for it. Or ready for more and more social media being bot posts. And those bots becoming harder and harder to detect.

It’s possible to run some LLMs locally if you have a good GPU. That helps with SOME, not all, just some of the privacy issues. Doesn’t help with many of the other risks tho.

FineCoatMummy@sh.itjust.works on 14 Jun 16:33

I read something about that. I will try to find the link and post

Ha! Found it!

Large-scale online deanonymization with LLMs

We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline that uses LLMs to: (1) extract identity-relevant features, (2) search for candidate matches via semantic embeddings, and (3) reason over top candidates to verify matches and reduce false positives. Compared to classical deanonymization work (e.g., on the Netflix prize) that required structured data, our approach works directly on raw user content across arbitrary platforms.
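
To make step (2) of the quoted abstract a bit more concrete, here is a toy sketch in Python. It is not the paper's pipeline: plain word counts stand in for the semantic embeddings, no LLM is involved, and the profiles, names, and the `rank_candidates` helper are all made up for illustration. It only shows why a handful of scattered tidbits can be enough to link two pseudonymous accounts.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query_text, candidates):
    """Return (score, name) pairs, most similar candidate first."""
    query_vec = bag_of_words(query_text)
    scored = [
        (cosine(query_vec, bag_of_words(text)), name)
        for name, text in candidates.items()
    ]
    return sorted(scored, reverse=True)


# Entirely made-up example data (hypothetical people and places).
pseudonymous_post = "I keep bees in Tromso and repair old accordions on weekends"
other_profiles = {
    "profile_a": "Accordions repair shop owner in Tromso, also sells honey from own bees",
    "profile_b": "Marathon runner and tax accountant based in Lisbon",
    "profile_c": "I review mechanical keyboards and brew coffee",
}

ranking = rank_candidates(pseudonymous_post, other_profiles)
for score, name in ranking:
    print(f"{name}: {score:.3f}")
```

With this toy data, `profile_a` ranks first because it shares several rare details (bees, Tromso, accordions) with the post. A real attack, as the abstract describes it, would use learned embeddings and an LLM verification step on top, which is exactly why it scales to messy text.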
