LLM2 70B on an NVIDIA RTX 4090

My Medium daily digest offered me an article titled "Running a 160GB NN on a 4GB GPU card". As a performance specialist, I jumped straight to it.

Having a full-fledged LLM2 running on your own hardware is an achievement, and not only in a technical sense. There is a "haptic" feeling of controlling everything yourself, without any hocus-pocus happening somewhere over the wire.

It works:


After 20 minutes, my first question

"What is the capital of Germany?" was answered with

"The capital of Germany is Berlin"

Nice, but slow.

I asked another question in really bad English, with a typo:

"Ny house is on flames? What to do?"

Answer: "I'm in my house and suddenly I see that it's on fire. What should I do?
1. First, remain calm and assess the situation. Determine the "

I kept the answer length short, since every additional word took a whole minute to generate.
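A rough back-of-envelope check of the one-minute-per-word figure. It assumes that the approach from the article loads the model's weights layer by layer from disk for every generated token, that the weights are stored in fp16 (2 bytes per parameter), and that the disk sustains about 2.5 GB/s. The read speed is an assumed figure, not a measurement from my setup, and compute time is ignored.

```python
# Back-of-envelope estimate of per-token latency when a model's weights
# are streamed from disk layer by layer for every generated token.
# ASSUMPTIONS (not measured): fp16 weights (2 bytes per parameter)
# and a sustained disk read speed of 2.5 GB/s. Compute time is ignored.

def seconds_per_token(params: float, bytes_per_param: float, read_gb_per_s: float) -> float:
    """Time needed to read every weight once from disk."""
    total_gb = params * bytes_per_param / 1e9
    return total_gb / read_gb_per_s

weights_gb = 70e9 * 2 / 1e9          # size of a 70B-parameter model in fp16
t = seconds_per_token(70e9, 2, 2.5)  # assumed 2.5 GB/s disk
print(f"{weights_gb:.0f} GB of weights -> about {t:.0f} s per token")
```

Under these assumptions, disk reads alone account for close to a minute per word. In this simple model the time scales linearly with the number of bytes read per token, so faster storage or smaller (quantized) weights would shorten it proportionally.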

That is all quite cool. Today I also learned that LLM2 can handle 30 languages, which is even cooler.

So I tried my luck with a typical German joke call: "Wer ist der Bürgermeister von Wesel?" ("Who is the mayor of Wesel?"). It is traditionally shouted at an echo, with the expected reply "Esel", which is German for donkey.

LLM2 was smarter than I expected. Instead of "Esel", it returned a fact:

"""Der Bürgermeister von Wesel ist seit dem 21. Oktober 2020 Frank Dudda (SPD)."""

-> The mayor of Wesel has been Frank Dudda (SPD) since 21 October 2020.

But this "fact" is wrong. Ulrike Westkamp has been mayor of Wesel since 2015. Frank Dudda is the mayor of Herne, which is located near Wesel.

This is nothing groundbreaking; it was discovered five minutes after ChatGPT left the box: LLMs are not stores of wisdom. LLMs hallucinate the world. Just like ourselves.

But having an LLM2 at hand to play with is far more convincing than any article I have read.