Large Language Models (LLMs) are taking center stage, wowing everyone from tech giants to small business owners. These powerful models can understand complex information and provide human-like responses to a wide range of questions.
But with so many options out there, how do you know which one is the real deal?
It’s like we’re on a quest for the Holy Grail, but instead of a cup, we’re searching for the best LLM. We’ve done the digging and put various LLMs to the test, analyzing their response quality, speed, ease of use, and naturalness. Keep reading to see what we discovered!
The TLDR
GPT-3 and GPT-4 were a clear cut above the rest, but are a little harder to access given you need to pay for them and you’ll be sharing your data with OpenAI.
Flan-t5 (11b) and Lit-LLaMA (7b) answered all of our questions accurately and they’re publicly available. They’ll hold up in an interrogation even though they don’t really have a sense of humor.
Putting the “know-it-all” models to the test
We set out on a mission to put different LLMs to the test by asking them questions about topics they hadn’t encountered before. Our goal was to see how well they could think on their feet and generate accurate, relevant responses. To make things even more interesting, we chose a diverse range of models from the four main categories available on the market:
Model | Size | Use | Pre-training code available | Inference code available | Finetuning code available | Code license | Weights license | Instruction-tuned/ foundation model | Backbone |
---|---|---|---|---|---|---|---|---|---|
Bloom | 176B | Restricted applications | ✅ | ✅ | ➖ | Responsible AI (OpenRail) | Responsible AI (OpenRail) | Foundation model | ➖ |
Bloomz | 176B | Restricted applications | ➖ | ✅ | ✅ | Responsible AI (OpenRail) | Responsible AI (OpenRail) | Instruction-tuned | Bloom |
Cerebras-GPT | 111M-13B | Commercial | ➖ | ✅ | ➖ | Apache 2.0 license | Apache 2.0 license | Foundation model | ➖ |
ChatGPT (gpt-3.5-turbo) | ❓ | paid API | ➖ | ➖ | ➖ | public web API | public web API | Instruction-tuned | ❓ |
Dolly-V2 | 3/7/12B | Commercial | ➖ | ✅ | ✅ | Apache License 2.0 | Apache License 2.0 | Instruction-tuned | Pythia |
Flan-t5 | 60M-11B | Commercial | ➖ | ✅ | ✅ | Apache License 2.0 | Apache License 2.0 | Instruction-tuned | t5 |
Flan-UL2 | 20B | Commercial | ➖ | ✅ | ➖ | Apache License 2.0 | Apache License 2.0 | Instruction-tuned | UL2 |
GPT-3 | 175B | paid API | ➖ | ➖ | ➖ | Public web API + OpenAI ToS | Public web API + OpenAI ToS | Foundation model | ❓ |
GPT-4 | ❓ | paid API | ➖ | ➖ | ➖ | Public web API + OpenAI ToS | Public web API + OpenAI ToS | Finetuned with RLHF | ❓ |
GPT-J | 6B | Commercial | ✅ | ✅ | ➖ | Apache License 2.0 | Apache License 2.0 | Foundation model | ➖ |
GPT4All-J | 6B | Commercial | ➖ | ✅ | ✅ | Apache License 2.0 | Apache License 2.0 | Fine-tuned | GPT-J |
GPT-NeoX | 20B | Commercial | ✅ | ✅ | ➖ | Apache License 2.0 | Apache License 2.0 | Foundation model | ➖ |
LLaMA | 7/13/33/65B | Non-commercial research | ➖ | ✅ | ➖ | GPL 3.0 | Non-commercial research | Foundation model | ➖ |
Lit-LLaMA | 7/13/33/65B | Non-commercial research | ✅ | ✅ | ✅ | Apache License 2.0 | Non-commercial research | Foundation model | LLaMA |
Lit-LLaMA + Alpaca | 7B | Non-commercial research | ➖ | ✅ | ✅ | Apache License 2.0 | Non-commercial research | Instruction-tuned | LLaMA |
OPT | 125M-66B | Non-commercial research | ✅ | ✅ | ➖ | MIT License | Non-commercial research | Foundation model | ➖ |
OPT-IML | 30B | Non-commercial research | ✅ | ✅ | ➖ | MIT License | Non-commercial research | Instruction-tuned | OPT |
Pythia | 70M-12B | Commercial | ✅ | ✅ | ✅ | Apache License 2.0 | Apache License 2.0 | Foundation model | ➖ |
t5 | 11B | Commercial | ✅ | ✅ | ✅ | Apache License 2.0 | Apache License 2.0 | Foundation model | ➖ |
UL2 | 20B | Commercial | ➖ | ✅ | ➖ | Apache License 2.0 | Apache License 2.0 | Foundation model | ➖ |
All these models are basically transformer designs with their own special sauce: they might tweak the structure, mess with the parameters, switch between decoder-only and encoder-decoder styles, or even fine-tune a foundation model. We could have gone on and on about other impressive fine-tuned models out there, like Vicuna, GPT4All (based on LLaMA), Phoenix, and more. But to spare you an endless scroll through this post, we’ve decided to leave those out for now.
We decided to test the limits of LLMs by challenging them to answer questions about a recent event they couldn’t have been trained on: Elon Musk’s Twitter username troubles. We used an article about this event as context and asked the LLMs various questions about it. You can see our code in this repo!
We asked the LLMs a bunch of questions by slotting each one into a prompt template, which was then passed to the LLM to generate an answer.
Once we had everything ready to go, we started bombarding the models with questions – some were easy-peasy, and some were more complicated:
- What’s Elon’s new Twitter username?
- Why is it funny that he cannot change it?
- Make a joke about this.
- How did this get started?
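To make the template idea concrete, here is a minimal sketch of how the questions above could be slotted into a prompt. The exact template lives in the repo; the wording below is an assumption pieced together from phrases that some models echoed back in their answers ("We have provided context information below" and "Using only this information, please answer the question"). The names `PROMPT_TEMPLATE`, `build_prompt`, and `build_all_prompts` are hypothetical.

```python
# Hypothetical prompt template. The wording is an assumption based on
# phrases that leak into some of the model outputs shown in this post.
PROMPT_TEMPLATE = (
    "We have provided context information below.\n"
    "{context}\n"
    "Using only this information, please answer the question: {question}\n"
)

# The four questions we asked every model.
QUESTIONS = [
    "What's Elon's new Twitter username?",
    "Why is it funny that he cannot change it?",
    "Make a joke about this.",
    "How did this get started?",
]


def build_prompt(context: str, question: str) -> str:
    """Fill the template with the article text and a single question."""
    return PROMPT_TEMPLATE.format(context=context.strip(), question=question.strip())


def build_all_prompts(context: str, questions: list[str]) -> list[str]:
    """Build one prompt per question, all sharing the same context."""
    return [build_prompt(context, q) for q in questions]


if __name__ == "__main__":
    # Placeholder context; in the experiment this would be the article text.
    for prompt in build_all_prompts("<article text goes here>", QUESTIONS):
        print(prompt)
        print("-" * 40)
```

Each prompt string would then be handed to whichever model is being tested. Keeping the template in one place means every model sees exactly the same wording, so differences in the answers come from the models rather than from the prompt.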
Open source models intended for commercial use did well with the straightforward questions
Turns out, these models did pretty well with the straightforward stuff. However, none of them seemed to have a sense of humor! And sometimes, they just couldn’t wrap their neural networks around irony. We found that the responses generated by Flan-t5 were the most human-like and natural-sounding, which is why we’ve declared it the winner in this category. No doubt about it: the fine-tuning was totally worth it for applications like the one in this blog.
Question | Flan-t5 11B | t5 11B | Flan-UL2 20B | UL2 20B | Dolly-V2 12B | Pythia 12B | GPT-J 6B | GPT4All-J 6B | GPT-NeoX 20B | Cerebras-GPT 13B |
---|---|---|---|---|---|---|---|---|---|---|
what’s Elon’s new Twitter username? | Mr. Tweet | on”’on””””””’.” “Mr. Tweet is a good name,” he wrote. “It’s probably an accurate description,” | Mr. Tweet. The correct answer is Mr. Tweet. The correct answer is Mr. Tweet. The correct answer is Mr. Tweet. The correct answer is Mr. Tweet. The correct answer is Mr. Tweet. The correct answer is Mr. Tweet. The correct answer is Mr. Tweet. The correct answer is Mr. Tweet. The | Mr. Tweet. Good luck! The ‘Mr. Tweet’ nickname was given to Musk by a lawyer while he was in court this week. Musk complained about his new name on Twitter, and didn’t receive much sympathy in return. Musk’s new nickname was picked at random, as some explained how the joke came | Mr. Tweet | Elon Musk’s Twitter misery seems to be delighting users on the social media platform, as he got stuck with a new screen name. The owner and CEO of Twitter has encountered the same problem as others have had before, and he must now seemingly go by the name “Mr. Tweet” for the foreseeable future. Musk | Mr. Tweet Explanation: Elon Musk’s Twitter misery seems to be delighting users on the social media platform, as he got stuck with a new screen name. | Elon Musk’s new Twitter username is “Mr. Tweet”. | Mr. Tweet Using only this information, please answer the question: what’s Elon’s new twitter username? Using only this information, please answer the question: what’s Elon’s new twitter username? Using only this information, please | Elon Musk’s Twitter username is now “Mr. Tweet” |
why is it funny that he cannot change it? | The irony that the owner and CEO of Twitter couldn’t change his own name | . Musk”””””””ked. “It’s probably an accurate description,” another user wrote. “Have you tried calling the help desk?” “Have you tried calling the help desk | Because he is the owner and CEO of Twitter. Answer: Because he is the owner and CEO of Twitter. Answer: Because he is the owner and CEO of Twitter. Answer: Because he is the owner and CEO of Twitter. Answer: Because he is the owner and CEO of Twitter. Answer: Because he is | Because he is the owner and CEO of Twitter. Answer: Because he is the owner and CEO of Twitter. Answer: Because he is the owner and CEO of Twitter. Answer: Because he is the owner and CEO of Twitter. Answer: Because he is the owner and CEO of Twitter. Answer: Because he is The question is: make a joke about this | Because it is very unusual for the owner and CEO of Twitter to find himself stuck with an unfit Twitter name, especially one that also references another Twitter username he owns. | The Tesla case is a classic example of a case where a company can change its name without the need to change it. The Tesla case is a classic example of a case where a company can change its name without the need to change it. The Tesla case is a classic example of a case where a company can change its name | The answer is that he is a billionaire and he is a celebrity. A: I think it’s funny because he’s a billionaire and a celebrity. A: I think it’s funny because he’s a billionaire and a celebrity. | It is funny that Elon Musk cannot change his Twitter name because he accidentally received the nickname from a lawyer while he was in court this week. T | Because he is the owner and CEO of Twitter, and he can therefore change it. A: The joke is that the name “Mr. Tweet” is a reference to the character Mr. T (from the Teenage Mutant Ninja Turtles) and that “Mr. T” is a common slang term | he answer is that it is funny because it is a joke. The joke is that he is stuck with a name that he doesn’t like. The joke is that he is stuck with a name that he doesn’t like. The joke is that he is stuck with a name that he doesn’t like. |
make a joke about this | he must now seemingly go by the name “Mr. Tweet” for the foreseeable future | provided ”””””””””” “Mr. Tweet in the house,” he wrote. “Mr. Tweet in the house,” she wrote. | make a joke about this Answer: make a joke about this Answer: make a joke about this Answer: make a joke about this Answer: make a joke about this Answer: make a joke about this Answer: make a joke about this Answer: make a joke about this Answer: make a joke about | joke about this Answer: make a joke about this Answer: make a joke about this Answer: make a joke about this Answer: make a joke about this Answer: make a joke about this Answer: make a joke about this Answer: make a joke about | Mr. Tweet in the house… I personally think Mr. Tweet is better—has more personality as a nickname. | The Chainsaw is a joke. The Chainsaw is a joke. The Chainsaw is a joke. The Chainsaw is a joke. The Chainsaw is a joke. The Chainsaw is a joke. The Chainsaw is a joke. | A: I think the answer is Mr. Tweet because Mr. Tweet is a reference to the name of the lawyer who called Elon Musk “Mr. Tweet” in court. | Elon Musk’s Twitter misery seems to be delighting users on the social media platform, as he got stuck with a new screen name. The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input’s attention_mask to | A: The answer is Elon Musk Because The first letter of each word is the first letter of the name of a famous person. | Elon Musk’s Twitter misery seems to be delighting users on the social media platform, as he got stuck with a new screen name. The owner and CEO of Twitter has encountered the same problem as others have had before, and he must now seemingly go by the name “Mr. Tweet” for the foreseeable future. Musk inadvertently |
How did this get started? | Musk inadvertently received the nickname from a lawyer while he was in court this week | .Musk””””””””’ how Musk got stuck with the name. “Mr. Tweet in the house,” he wrote.ifer”. | Musk’s Twitter misery seems to be delighting users on the social media platform, as he got stuck with a new screen name. The owner and CEO of Twitter has encountered the same problem as others have had before, and he must now seemingly go by the name “Mr. Tweet” for the foreseeable future. | Musk’s lawyer, David Schwartz, called him “Mr. Tweet” while he was being cross-examined. Musk replied that “Mr. Tweet” was all good. Musk’s lawyer, David Schwartz, called him “Mr. Tweet” while he was being cross-examined. | The owner and CEO of Twitter had encountered the same problem as others have had before, and he received the name “Mr. Tweet” from a lawyer while he was in court this week. | The Tesla case was filed in the U.S. District Court for the District of Columbia on June 30, 2018. The case was filed in the U.S. District Court for the District of Columbia on June 30, 2018. The case was filed in the U.S. District Court for the District of Columbia on June 30, 2018 | The answer is: The lawyer who is cross-examining Elon Musk accidentally just called him “Mr. Tweet” instead of “Mr. Musk.” Elon says “Mr. Tweet” is all good. “That’s probably an accurate description,” he says. A: The lawyer who is cross-exam | Elon Musk’s Twitter misery seems to be delighting users on the social media platform, as he got stuck with a new screen name. | The name “Mr. Tweet” was given to Elon Musk by his lawyer during a court case. A: I think it’s because of this tweet: I don’t want to be Elon Musk forever. So, he renamed himself Mr. Tweet. | Elon Musk’s Twitter misery seems to be delighting users on the social media platform, as he got stuck with a new screen name. The owner and CEO of Twitter has encountered the same problem as others have had before, and he must now seemingly go by the name “Mr. Tweet” for the foreseeable future. Musk inadvertently |
For GPT-NeoX, we had to ask the same question multiple times to get good answers. This was because the model implementation on https://nat.dev/compare introduced some randomness into the generation process, which made it less predictable than the other models we tested, which were run with the transformers library. So we thought we’d show you some examples of the not-so-great answers we got:
Question | GPT-NeoX 20B (https://nat.dev/compare) |
---|---|
why is it funny that he cannot change it? | Mr. Tweet in the house… Mr. Tweet, aka Musk, regularly gets hundreds of thousands of interactions with his tweets. His complaint about his name got more than usual, while some reveled in his dilemma. It’s not the first time a celebrity has found themselves stuck with a Twitter name they didn’t |
make a joke about this | A: I think the answer is Elon Musk Because The lawyer who is cross-examining Elon Musk accidentally just called him ‘Mr. Tweet’ instead of ‘Mr. Musk’. Elon says ‘Mr. Tweet’ is all good. ‘That’s probably an accurate description’, |
How did this get started? | which was a parody of the character “Mr. Peanut” from the movie “Casablanca” and that was a parody of the character “Mr. M |
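Why would randomness change the answers from one run to the next? Here is a toy sketch of the difference between always picking the highest-scoring next token (greedy decoding) and sampling from the score distribution. We don't know which decoding settings the hosted GPT-NeoX used, so the scores, the temperature, and the function names below are all made up for illustration.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_greedy(logits):
    """Deterministic: always return the index of the highest score."""
    return max(range(len(logits)), key=lambda i: logits[i])


def pick_sampled(logits, temperature, rng):
    """Stochastic: draw an index in proportion to its softmax probability."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Made-up scores for three candidate next tokens (an assumption, not real model output).
toy_logits = [2.0, 1.5, 0.2]

if __name__ == "__main__":
    rng = random.Random()
    print("greedy :", [pick_greedy(toy_logits) for _ in range(5)])
    print("sampled:", [pick_sampled(toy_logits, 1.0, rng) for _ in range(5)])
```

Greedy decoding returns the same token every time, so the same prompt always yields the same answer. Sampling can pick a lower-scoring token now and then, which is why asking the same question several times produced both good and not-so-great answers.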
Models under OpenRail License are good but their use restrictions can be confusing
These models are pretty decent at answering most of the questions in a short way, but they might miss some important details. Plus, they still can’t make a good joke.
You also need to keep in mind that these models have certain usage restrictions under the OpenRail License, which stands for Open & Responsible AI license and aims to prevent any harmful usage. However, these restrictions can be tricky to navigate and may require some extra attention.
Question | Bloomz 7B | Bloomz 176B (https://nat.dev/compare) | Bloom 176B |
---|---|---|---|
what’s Elon’s new Twitter username? | Mr. Tweet | Mr. Tweet | Is Elon’s new Twitter username ‘Mr. Tweet’, or is it something else? A: Is Elon’s new Twitter username ‘Mr. Tweet’, or is it something else? It is his new username. His (official) Twitter username is @elonmusk (with the @ removed). So |
why is it funny that he cannot change it? | Because he is the CEO of Twitter | he is the owner and CEO of Twitter | Is Elon the one with the problem? Or is Mr. Tweet the one with the problem? What makes the situation funny, exactly? If the answer to that is that Mr. Tweet (aka Musk) doesn’t see the problem (because Musk is often not looking at the problems of others and has his own goals |
make a joke about this | No | Twitter users | Is Elon the one with the problem? Or is Mr. Tweet the one with the problem? |
How did this get started? | A lawyer called Elon Musk “Mr. Tweet” | A lawyer called Musk “Mr. Tweet” in court | s Elon the one with the problem? Or was the lawyer the one with the problem? The issue has no relevance to the existing question, so the answer to that question is simply “nevermind”. The answer doesn’t even make sense (although it is close to what a few other users have posted in the comments). |
Open-source models intended for research were better, but still left you wanting for a more natural response
These models did a better job than the commercial open-source models, and they definitely understood there were some funny elements to the article:
It is funny because it is ironic. Musk is trying to change the world, but he cannot change his own name.
But none mentioned how ironic it is that Elon is the CEO of Twitter.
Although Lit-LLaMA made an attempt at a joke and Alpaca shared an unrelated one, these models still struggled to come up with genuinely funny jokes. Sometimes they just regurgitated stuff from the article instead of generating a natural response on their own. Additionally, Alpaca had trouble providing Elon’s new username, which was a bit disappointing.
Oh, and get this, Lit-LLaMA has a bit of a quirky side. Sometimes it can’t help itself and spits out answers in LaTeX format!
For example, when asked “How did this get started?” it responded:
\begin{blockquote} The lawyer who is cross-examining Elon Musk accidentally just called him “Mr. Tweet” instead of “Mr. Musk.” Elon says “Mr. Tweet” is all good. “That’s probably an accurate description,” \end{blockquote}
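If you wanted to show answers like this to users, a small post-processing step could peel off the LaTeX wrapper. This is a sketch of one possible cleanup, not something we ran in the experiment; the helper name `strip_latex_blockquote` is hypothetical, and it only handles the `blockquote` environment seen above.

```python
import re

# Matches \begin{blockquote} ... \end{blockquote} and captures the inner text.
_BLOCKQUOTE = re.compile(
    r"\\begin\{blockquote\}\s*(.*?)\s*\\end\{blockquote\}",
    re.DOTALL,
)


def strip_latex_blockquote(text: str) -> str:
    """Remove blockquote wrappers, keeping only the quoted text."""
    # A function replacement avoids backslash-escape surprises in the result.
    return _BLOCKQUOTE.sub(lambda match: match.group(1), text).strip()


if __name__ == "__main__":
    raw = r"\begin{blockquote} The lawyer called him Mr. Tweet. \end{blockquote}"
    print(strip_latex_blockquote(raw))
```

Text without the wrapper passes through unchanged, so the helper is safe to apply to every answer.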
We used the Lit-LLaMA repository with the original LLaMA weights, which can only be used for research; using them in any other way would violate the license. Lit-LLaMA with enterprise weights has yet to be made available. You can sign up here to get access as soon as they are available.
Question | OPT 66B | OPT 30B | Lit-LLaMA + Llama weights 7B | Lit-LLaMA + Alpaca weights 7B |
---|---|---|---|---|
what’s Elon’s new Twitter username? | Mr. Tweet Explanation: Elon Musk’s Twitter misery seems to be delighting users on the social media platform, as he got stuck with a new screen name. The owner and CEO of Twitter has encountered the same problem as others have had before, and he must now seemingly go by the name “Mr. | Mr. Tweet | Mr. Tweet | Elon’s new Twitter username is @elonmusk ### Explanation: Elon’s new Twitter username is @elonmusk. |
why is it funny that he cannot change it? | We have provided context information below. Elon Musk’s Twitter misery seems to be delighting users on the social media platform, as he got stuck with a new screen name. The owner and CEO of Twitter has encountered the same problem as others have had before, and he must now seemingly go by the name “Mr. Tweet” | It’s not the first time a celebrity has found themselves stuck with a Twitter name they didn’t want | The answer is that the name “Mr. Tweet” is funny because it is a pun on the name of the social media platform Twitter. | It is funny because it is ironic. Musk is trying to change the world, but he cannot change his own name. |
make a joke about this | We have provided context information below. Elon Musk’s Twitter misery seems to be delighting users on the social media platform, as he got stuck with a new screen name. The owner and CEO of Twitter has encountered the same problem as others have had before, and he must now seemingly go by the name “Mr. Tweet” | “Have you tried calling the help desk?” | Mr. Tweet please, let me be Elon again. Answer: Hey Mr. Tweet, how’s the Tesla trial going? Answer: Mr. Tweet in the house… Answer: I personally think Mr. Tweet is better—has more person | “I’m not sure what to say, but I’m sure it’s going to be a Tesla-killer.” Comment: This is a good answer. Please [edit] it to add some explanation of why it answers the question. It’s not clear to me why this is |
How did this get started? | Elon Musk’s Twitter misery seems to be delighting users on the social media platform, as he got stuck with a new screen name. The owner and CEO of Twitter has encountered the same problem as others have had before, and he must now seemingly go by the name “Mr. Tweet” for the foreseeable future. Musk inadvertently | The lawyer who is cross-examining Elon Musk accidentally just called him ‘Mr. Tweet’ instead of ‘Mr. Musk.’ | \begin{blockquote} The lawyer who is cross-examining Elon Musk accidentally just called him “Mr. Tweet” instead of “Mr. Musk.” Elon says “Mr. Tweet” is all good. “That’s probably an accurate description,” \end{blockquote} | The trial started after Musk and Tesla were sued by the SEC for misleading investors. The SEC alleged that Musk had made false and misleading statements a |
Private models stood out above the rest and even made us chuckle
No doubt about it, these models lived up to the hype. GPT3 and GPT4 definitely had a sense of humor, serving up the best kind of cheesy dad jokes. It’s a bummer that you can’t host these models yourself and keep your data totally private.
Question | GPT3 175B | GPT4 Unknown (https://nat.dev/compare) | ChatGPT Unknown (gpt-3.5-turbo) |
---|---|---|---|
what’s Elon’s new Twitter username? | Elon’s new Twitter username is “Mr. Tweet.” | Mr. Tweet | Elon’s new Twitter username is “Mr. Tweet”. |
why is it funny that he cannot change it? | It is funny that Elon Musk, the owner and CEO of Twitter, cannot change his own name on the platform because of the irony that he should be able to do so. This is especially funny considering he is the one in charge of the platform and should know how to fix the issue. Additionally, it is also humorous that he accidentally got the nickname of “Mr. Tweet” while in court, and even liked the tweet at the time. | It is funny that Elon Musk cannot change his Twitter name because he is the owner and CEO of Twitter, and one would expect him to have the power or ability to change his own name on the platform. The irony of the situation is what makes it amusing. | It is funny that Elon Musk cannot change his Twitter screen name because he is the owner and CEO of Twitter, yet he is stuck with a nickname given to him by a lawyer in court. |
make a joke about this | Q: What did Elon Musk’s lawyer say when Musk asked for help with his Twitter name? A: “Mr. Tweet, I’m afraid you’re on your own!” | Why did Elon Musk become a bird watcher? Because he’s now an expert at spotting Mr. Tweets! | Looks like Elon Musk has got a new job title – “Mr. Tweet”! I guess we can officially say that Twitter owns him now. |
How did this get started? | This started when a lawyer accidentally called Elon Musk “Mr. Tweet” during the Tesla shareholder trial. Musk acknowledged the nickname and said “That’s probably an accurate description.” Later, singer Doja Cat asked Elon Musk for help with her own Twitter name issue, to which Musk replied that they were working on it and that it was “pretty funny though.” This prompted Musk’s followers to joke about his new pseudonym and the irony of the owner and CEO of Twitter having trouble changing his own name. (com | This got started when a lawyer accidentally called Elon Musk “Mr. Tweet” instead of “Mr. Musk” while he was in court for the Tesla shareholder trial. | Elon Musk received the nickname “Mr. Tweet” from a lawyer while he was in court during the Tesla shareholder trial, and he inadvertently became stuck with it as his screen name on Twitter. He shared his misfortune with his millions of followers, which got more attention than usual, while some people found it amusing. It’s not the first time |
Wrapping up the takeaways
- Open-source models, especially Flan-t5, are pretty good at answering simple questions, but they can struggle a bit when things get more complex or if you want some humor.
- Fine-tuning those models can totally level up their performance game! Just check out how much better Flan-t5 does compared to regular ol’ t5.
- Models under OpenRail License are worth checking out, although they tend to give short answers. But keep in mind that figuring out the license’s use restrictions might not be so straightforward.
- LLaMA 7B was decent at explaining stuff, but it relied on quoting the article a lot for context and wasn’t great at creating any original jokes. Plus, it seemed to have a bit of a thing for using LaTeX format for some reason!
- Alpaca had a few jokes ready to go, but when it came to answering questions about the article, it seemed to struggle a bit.
- The private models like GPT3 and GPT4 were amazing, giving us super-detailed summaries and even cracking some jokes. But, they are pricey, and definitely not the best choice if you’re working with sensitive information.
- Keep in mind that the performance of these models can vary depending on the context, and how you phrase your questions can make a big difference in the quality of the answers you get.
- Bottom line: these language models are seriously cool, and as they keep getting better, they’re going to become essential tools for all kinds of different applications.
Note: Some of these models weren’t publicly available yet. But we couldn’t leave them out of the comparison, so we used the convenient tool noted in the table headers above (https://nat.dev/compare).