Social Media Attribution For Artificially Generated Content, Part I/V

OpenAI artificially generated content

Before Anything Else

Continuing the theme of attribution begun in the previous series, “Social Media, Copyright, and Attribution, Convergence,” where we looked primarily at attribution for conventional sources of information and media, we now look more closely at attribution for artificial intelligence, i.e., artificially generated content. Artificially generated content is produced by the information technology innovation called artificial intelligence (AI), specifically large language models (LLMs). At the time of writing, nearly three dozen models were known to this author, including OpenAI’s ChatGPT 3.5 and ChatGPT 4.0; Google’s PaLM 2 (Pathways Language Model 2, or Bard); Microsoft’s Copilot, which implements ChatGPT 4.0 and is also integrated into Microsoft Bing; and Meta’s (Facebook’s) LLaMA (Large Language Model Meta AI). Attributing, for example, OpenAI artificially generated content is an important and complex issue that is still being debated by experts. There is no one-size-fits-all answer, but there are a few considerations.

What Are LLMs?

First, some background. LLMs are artificial neural networks (biologically, a neural network is a network or circuit of neurons, as in the brain; artificial neural networks imitate the biological ones) run on computerized systems and pre-trained via self-supervised and semi-supervised learning. An LLM can understand everyday language or pseudocode. This ability is facilitated by the use of immense amounts of data to establish, during training, parameters that define the system and determine (or limit) its performance. Because of the huge datasets and algorithms involved, LLMs consume extensive computational resources during operation.

Each LLM provider implements its models with its own dataset and algorithms, fine-tuned to its preference. The datasets are not equal in size or domain of knowledge, despite overlaps. The algorithms are the product of science and some art, and there are no governing regulators. Providers can keep their models abreast of the times, or they can train them on data up to a cutoff; ChatGPT 3.5, for example, is trained on information available only up to 2021, while ChatGPT 4.0 is more current. Providers can make their models available by license, or they can keep them proprietary.

For text-based systems, humans interact with an LLM by entering natural language or pseudocode in a chatbot (a chat box dedicated to interacting with the LLM) in a form the model can interpret and understand, a process known as prompt engineering. The model returns a text-based response, sometimes including other media (image or audio). The response is based on the pre-trained material, an enormous but finite dataset consisting largely of third parties’ intellectual effort, and it may be apropos to the human inquiry or it may be erroneous, a result known as hallucination.
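The prompt-and-response exchange described above can be sketched in a few lines of Python. This is a hypothetical illustration only, not any provider’s actual API: the `build_prompt` function and the role/content message format are assumptions modeled loosely on how chat-style LLM services commonly package a human inquiry before sending it to the model.

```python
# Hypothetical sketch of how a human inquiry is packaged for a chat-style
# LLM. Function name and message structure are illustrative assumptions,
# not a specific provider's API.

def build_prompt(system: str, user: str) -> list[dict]:
    """Package a system instruction and a human inquiry as the
    message list a chat model typically expects."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = build_prompt(
    "You are a helpful assistant.",
    "Summarize the doctrine of fair use in one sentence.",
)

# A provider's SDK would transmit `messages` to the model here; the model
# would then generate a text response from its pre-trained parameters.
print(messages[1]["content"])
```

The point of the sketch is that the “prompt” reaching the model is ordinary structured text; everything the model returns is generated from its fixed, pre-trained dataset, which is why the response can be apt or hallucinated.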

Next Time

Next time, we will have a brief look at the legal environment in which LLMs operate to enable an improved understanding of the need for attribution of artificially generated content.

–Richard Thomas

Next, Part II/V

