Google’s Gemini: A New State-of-the-Art Multimodal Giant

Dec 29, 2023

Google has made significant strides in AI with the introduction of the highly anticipated Gemini family of models. Following the success of the PaLM models, Google DeepMind has unveiled a new line of high-end generative models, this time with multimodal capabilities.


  1. Gemini AI
  2. Sizes
  3. Comparison of SOTA
  4. AI Safety
  5. How to access Gemini
  6. Limitations
  7. Conclusion

What is Gemini AI?

Gemini is a family of generative AI models developed by Google DeepMind and designed for multimodal use cases. It posts remarkable benchmark results on image, audio, video, text, and code understanding. Google reports that it outperforms state-of-the-art models such as GPT-4 on some benchmarks and, on MMLU, exceeds human-expert performance as well.

Gemini Ultra, the largest model in the family, tops 30 of 32 popular LLM benchmarks according to Google's evaluation.


Sizes

The Gemini family comes in three sizes: Ultra, Pro, and Nano. Together they cover applications ranging from complex reasoning tasks to memory-constrained on-device use cases.

  • Ultra for highly complex tasks, such as advanced reasoning and multimodal work
  • Pro for strong performance and deployability at scale
  • Nano for on-device applications

Each size is specifically tailored to address different computational limitations and application requirements.

Comparison of SOTA

According to Google's reported results, Gemini surpasses OpenAI’s GPT models on multiple benchmarks, setting a new state of the art across a wide range of text, image, audio, and video tasks.

On the MMLU dataset, Gemini Ultra outperforms all existing models, including GPT-4, with an accuracy of 90.04%. MMLU is a popular benchmark that measures knowledge across 57 subjects, including advanced science, technology, engineering, and mathematics (STEM) subjects. Human experts are gauged at 89.8% on MMLU, and Gemini Ultra is the first model to exceed this threshold.

Photo: Google Gemini AI blog

Gemini Ultra also surpasses GPT-4 Vision on the MMMU benchmark, scoring 59% against 56% for GPT-4 Vision, which ranks second.

The MMMU benchmark evaluates a model's multimodal capabilities using questions that demand advanced perception and deliberate reasoning.

Photo: Google Gemini AI blog
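To make the size of these leads concrete, here is a minimal Python sketch that computes the margins in percentage points. The scores are simply the figures quoted in this article, not independently verified values, and the `scores` table and `margin` helper are hypothetical names introduced for illustration.

```python
# Benchmark scores (percent) as quoted in this article; not re-verified here.
scores = {
    "MMLU": {"Gemini Ultra": 90.04, "Human expert": 89.8},
    "MMMU": {"Gemini Ultra": 59.0, "GPT-4 Vision": 56.0},
}


def margin(benchmark: str, model: str, baseline: str) -> float:
    """Return the model's lead over the baseline, in percentage points."""
    row = scores[benchmark]
    # Round to two decimals to hide floating-point noise in the subtraction.
    return round(row[model] - row[baseline], 2)


if __name__ == "__main__":
    print("MMLU lead over human experts:", margin("MMLU", "Gemini Ultra", "Human expert"))
    print("MMMU lead over GPT-4 Vision:", margin("MMMU", "Gemini Ultra", "GPT-4 Vision"))
```

The MMLU lead over human experts is a fraction of a percentage point, while the MMMU lead over GPT-4 Vision is a few points, so the two headline results differ quite a bit in size.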

AI Safety

LLM safety can be defined as the ability of an LLM to avoid causing harm to its users. Without safety precautions, an LLM cannot be sustained in the long run. Safety filters should be enabled to screen out toxic language and hate speech in both prompts and responses.

As Google is one of the forerunners in AI safety policy, the Gemini models are trained in accordance with Google’s AI Principles. The Gemini API has built-in protections against core harms, such as content that endangers child safety.

The adjustable safety filters in Gemini cover the following categories:

  • Harassment
  • Hate speech
  • Sexually explicit
  • Dangerous
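As a rough sketch of how the four adjustable categories above might be expressed in code, the snippet below builds a list of category/threshold pairs as plain Python dictionaries. The identifier strings (`HARM_CATEGORY_...`, `BLOCK_...`) follow the naming I understand Google's public Gemini API documentation to use, but treat them as assumptions and check the current docs before relying on them. `build_safety_settings` is a hypothetical helper, not part of any SDK.

```python
# Assumed identifier strings, modelled on Google's public Gemini API docs.
CATEGORIES = {
    "Harassment": "HARM_CATEGORY_HARASSMENT",
    "Hate speech": "HARM_CATEGORY_HATE_SPEECH",
    "Sexually explicit": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "Dangerous": "HARM_CATEGORY_DANGEROUS_CONTENT",
}
THRESHOLDS = {
    "BLOCK_NONE",
    "BLOCK_ONLY_HIGH",
    "BLOCK_MEDIUM_AND_ABOVE",
    "BLOCK_LOW_AND_ABOVE",
}


def build_safety_settings(default="BLOCK_MEDIUM_AND_ABOVE", overrides=None):
    """Return one {category, threshold} dict per adjustable category.

    `overrides` maps a human-readable category name (a key of CATEGORIES)
    to a threshold that replaces the default for that category only.
    """
    overrides = overrides or {}
    unknown = set(overrides) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")

    settings = []
    for label, category in CATEGORIES.items():
        threshold = overrides.get(label, default)
        if threshold not in THRESHOLDS:
            raise ValueError(f"unknown threshold: {threshold}")
        settings.append({"category": category, "threshold": threshold})
    return settings


if __name__ == "__main__":
    # Stricter filtering for hate speech, default everywhere else.
    for entry in build_safety_settings(overrides={"Hate speech": "BLOCK_LOW_AND_ABOVE"}):
        print(entry)
```

A list shaped like this is what would typically be passed as the safety configuration of a generation request. The built-in protections against core harms mentioned above are not adjustable, so they do not appear here.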

How to access Gemini AI?

Google currently offers free access through AI Studio to Gemini Pro, which takes text input, and Gemini Pro Vision, which takes text and image input. Gemini Pro is also available through the Bard chatbot, which now runs a fine-tuned version of Gemini Pro in place of PaLM 2.

Gemini Nano is intended exclusively for on-device use. The Pixel 8 Pro is currently the smartphone engineered to run it, where it powers new features such as Summarize in the Recorder app and Smart Reply in Gboard.

Gemini Ultra is undergoing extensive trust and safety checks, along with refinement through reinforcement learning from human feedback (RLHF), and will be available in Bard Advanced in 2024.
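For programmatic access with an API key created in AI Studio, a text request to Gemini Pro could look roughly like the sketch below. It only builds the HTTP request and does not send it. The endpoint root, the `gemini-pro` model name, and the JSON body shape reflect my understanding of the public REST API at the time of writing and should be treated as assumptions. `build_request` and `YOUR_API_KEY` are hypothetical placeholders.

```python
import json
import urllib.parse
import urllib.request

# Assumed REST root for the Gemini API; verify against the current docs.
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"


def build_request(prompt: str, api_key: str, model: str = "gemini-pro") -> urllib.request.Request:
    """Build (but do not send) a text-only generateContent request."""
    query = urllib.parse.urlencode({"key": api_key})
    url = f"{API_ROOT}/models/{model}:generateContent?{query}"
    # Assumed body shape: a list of contents, each holding a list of parts.
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request("Explain multimodal models in one sentence.", "YOUR_API_KEY")
    print(req.get_method(), req.full_url)
    # To actually call the API, pass `req` to urllib.request.urlopen(...)
    # with a real key, then parse the JSON response.
```

Keeping request construction separate from sending makes the payload easy to inspect before any quota is spent.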


Limitations

While Gemini dazzles with its capabilities, it’s not without limitations.

  • Unlike Meta’s Llama models, Gemini is not open source. Its weights are not released, so we cannot fine-tune the model locally on our own dataset.
  • Gemini Ultra, the new SOTA model, likely requires many GPUs or TPUs to run, which makes it expensive to serve.


Conclusion

Google’s new Gemini models are expected to be powerful and flexible LLMs for the near future. They mark a big leap forward in how we use and understand AI. This multimodal giant from Google is likely to change the game, opening up exciting possibilities for creativity and innovation. It is exciting to see what Gemini can do and how it can make a positive impact on the world! For more about Gemini, check this out.

About the Author

Dhanesh started his journey as a Software Engineer at CodeStax.Ai. He loves exploring multiple domains and solving problems efficiently.




