LLM Imperfections, Vicariousness, And Vicissitudes

Briefly, the providers of LLMs disclaim in their terms of service that their models are experimental and may produce errors. For example, OpenAI says “…Artificial intelligence and machine learning are rapidly evolving fields of study. We are constantly working to improve our Services to make them more accurate, reliable, safe, and beneficial. Given the probabilistic nature of machine learning, use of our Services may, in some situations, result in Output that does not accurately reflect real people, places, or facts.” Despite the warning, great hype and promise surround the technology; as a result, the public may have idealistic expectations. Consequently, we will appraise, by coup d’oeil, LLMs’ insufficiencies:
Training
· Different LLMs are trained differently and partly by humans; therefore, LLMs have systematic biases, which can be compounded by their self-training
· Models are not necessarily trained on the most current information but are constrained to the data/information available or accessible in their temporal domains; there is a lag between a model’s current state and current information. For example, ChatGPT 3.5, as noted in Part I, is trained primarily on Internet information up to 2021. Consequently, LLMs must be periodically retrained or fine-tuned, at a cost
· The LLM training process may not be cognizant of, or may ignore, copyright law and the rights attached to copyrighted works, as the EU Parliament has recognized, although an LLM, e.g., ChatGPT 3.5, does recognize copyrighted and public-domain works during operation/prompting
· Models can assist in primary research but lean heavily on the results of third-party researchers; their proper realm, when returning factual information, is “secondary research/inquiry,” or otherwise fiction
Parameters
· An LLM’s parameter size (the factors, acquired from training data, that an AI system uses to make predictions), aside from algorithm sophistication, positively determines model performance, i.e., processing of inputs, language understanding, generation of human-like text, common sense, and accuracy.
· Setting an LLM’s temperature hyperparameter, among others, determines the model’s probabilistic response, e.g., its choice of text response
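
To make the temperature point concrete, here is a minimal sketch, assuming the pre-1.0 OpenAI Python library; the API key placeholder, model name, prompt, and temperature values are illustrative assumptions, not part of this article:

    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder; supply your own key

    def ask(prompt, temperature):
        # Chat Completions call; temperature (roughly 0-2) controls how much
        # randomness is injected when the model samples its next token.
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
        )
        return response["choices"][0]["message"]["content"]

    # Low temperature favors the highest-probability tokens (more repeatable);
    # high temperature flattens the distribution (more varied, riskier wording).
    print(ask("Summarize the plot of Hamlet in one sentence.", temperature=0.0))
    print(ask("Summarize the plot of Hamlet in one sentence.", temperature=1.2))

Other sampling hyperparameters, such as top_p, play a similar role; in practice, usually only one of temperature or top_p is adjusted at a time.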
Prompting
· Prompting (prompt engineering) produces different results across models, inquiries (results are stochastic), and degrees of specificity; in a minuscule number of cases the output may not be unique. Note that Bard offers three text drafts per inquiry, and Copilot offers four image versions for a text-to-image prompt (a sketch follows the Bard example below)
· Large Language Model responses need to be corralled and fact-checked because, unless deliberately generating fiction, models can produce hallucinations (though some models automatically cite sources), as Bard showed:
Reuters was first to point out an error in Google’s advertisement for chatbot Bard, which debuted on Monday, about which satellite
first took pictures of a planet outside the Earth’s solar system…
…In the advertisement, Bard is given the prompt:
“What new discoveries from the James Webb Space Telescope (JWST) can I tell my 9-year old about?” Bard responds with a number of answers, including one suggesting the JWST was used to take the very first pictures of planets outside the Earth’s solar system, or exoplanets. The first pictures of exoplanets were, however, taken by the European Southern Observatory’s Very Large Telescope (VLT) in 2004, as confirmed by NASA.
–Coulter, Martin and Bensinger, Greg. “Alphabet shares dive after Google AI chatbot Bard flubs answer in ad.” Reuters, Feb 8, 2023, https://www.reuters.com/technology/google-ai-chatbot-bard-offers-inaccurate-information-company-ad-2023-02-08/.
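
The stochastic nature of prompting noted above can be observed directly by requesting several candidate completions of the same prompt. The sketch below again assumes the pre-1.0 OpenAI Python library; the n parameter (number of completions requested) is loosely analogous to Bard’s multiple drafts, and the prompt and settings are illustrative assumptions:

    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder; supply your own key

    # Request three candidate completions of one prompt (n=3). With a nonzero
    # temperature the drafts will usually differ, illustrating that results
    # are stochastic rather than fixed.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": "Explain prompt engineering in two sentences."}],
        n=3,
        temperature=0.9,
    )

    for i, choice in enumerate(response["choices"], start=1):
        print(f"Draft {i}: {choice['message']['content']}\n")
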
Search Engines
· Search engines, e.g., Google, have ongoing efforts to identify webpages with artificially generated content and to demote their search ranking if the content amounts to spam. Google Search’s guidance about AI-generated content states “Appropriate use of AI or automation is not against our guidelines. This means that it is not used to generate content primarily to manipulate search rankings, which is against our spam policies.”
LLMs’ insufficiencies are far-reaching. So, the consumer of LLM output must be wary of the technology’s limitations and quirks, especially its vicariousness, biases, deliriums, temporal domains, spam risk, and questionable use of copyrighted material. Thankfully, some control can be exercised through appropriate prompt engineering (see more on this in the addendum, Part V, Appropriate Prompt Engineering). Further, and perhaps, ascribing attribution may help search engines like Google assess/distinguish artificial content and determine publisher intent. Next time, though, we return to the main path in understanding attribution for artificially generated content: why attribution, when to attribute, and attribution formats.
–Richard Thomas
Previous, Part II/V
Next, Part IV/V