Lightning AI Studios: Never set up a local environment again →

← Back to blog

Microsoft’s Kosmos-1 Claims to Solve IQ Tests and OpenAI Introduces ChatGPT and Whisper APIs

Developers can now integrate ChatGPT and Whisper models into their apps and products through OpenAI’s API. Microsoft researchers introduced Kosmos-1, a MLLM that claims to solve IQ tests. One group of researchers are decreasing LLM hallucinations, while another are focused around LLM threat mitigation. YouTube will soon allow creators to take advantage of generative AI and more. Let’s dive in! 

Research Highlights: 

  • Kosmos-1, a Multimodal Large Language Model (MLLM) developed by Microsoft researchers, claims to understand general modalities, learn in context (i.e., few-shot), and obey commands (i.e., zero-shot). The authors used web-scale multimodal corpora to train Kosmos-1 from scratch. These corpora included text and image pairs, image-caption pairs, and text data. Without any gradient updates or finetuning, they tested different options on a range of tasks, including zero-shot, few-shot, and multimodal chain-of-thought prompting. The researchers come to the conclusion that cross-modal transfer, or the transfer of knowledge from multimodal to language and from language to multimodal, can be advantageous for MLLMs.
  • Because the precise internal functionality of LLM is still unknown and implicit, they are open to targeted adversarial prompting. Saarland University researchers demonstrated that adding retrieval and API calling capabilities to LLMs (referred to as ApplicationIntegrated LLMs) may result in a completely new set of attack vectors. Their work calls for an urgent evaluation of current mitigation techniques and an investigation of whether new techniques are needed to defend LLMs against these threats.
  • An LLM-Augmenter system was proposed by Microsoft and Columbia University researchers, which adds a set of plug-and-play modules to a black-box LLM. According to their system, LLMs will produce answers based on consolidated external knowledge, such as that found in task-specific databases. ChatGPT’s hallucinations, according to LLM-AUGMENTER, can be significantly reduced while maintaining fluency and information in its responses. Their models and source code are accessible to the general public.

ML Engineering Highlights

  • The public can now use the ChatGPT and Whisper APIs thanks to OpenAI. Since December, OpenAI has reduced the cost of ChatGPT by 90% through a series of system-wide optimizations, and they are now sharing those savings with API users. Their open-source Whisper large-v2 model is now available to developers via the API- claiming significantly quicker and more affordable results. Users of the ChatGPT API can anticipate ongoing model improvements and the choice of dedicated capacity for greater control over the models.

Tutorial of the Week

The YOLO (You Only Look Once) model series, first introduced in 2015, gained popularity because, unlike earlier architectures, it could perform object detection as a single network by predicting bounding boxes and class probabilities in a single forward pass. The latest in this model series is YOLOv8, and in our latest blog post we show you how to deploy it on the cloud with Lightning.

Community Spotlight

Want your work featured? Contact us on Discord or email us at [email protected]

  • GitHub user turian identified a bug with cloud checkpoints where periodically having CSVLogger write to cloud storage wasn’t working. You can check out the relevant PR fixing this issue here. Nice work, leoleoasd!
  • This PR fixes an issue where some parameters would apparently receive no gradients. GitHub user martenlienen apparently spent a few hours solving this issue, so 😵 💫 thank you for your hard work!
  • GitHub user connesy identified an issue where some behavior related to CUDA devices wasn’t aligning with our documentation. Thanks to GitHub user yhl48 for introducing a fix to this issue!

Lightning AI Highlights

  • We’re on Discord! As the Lightning community has expanded throughout the years, we’ve been looking for the best place to gather and shape the future of AI together. Join us on Discord! We’ll be hosting community events, code-together sessions, and spending time with all of you.
  • Want to get started with contributing to Lightning, but not sure where to start with your first contribution? Have a look at our good first issue label on GitHub, which are GitHub Issues curated by our team that are suitable for making a lightweight but important contribution to our open-source libraries. Looking for inspiration? Have a read through some past PRs as well!

Don’t Miss the Submission Deadline

  • CoLLAs 2023: 2nd conference on lifelong learning agents. (Montreal, Canada). August 23. Submission Deadline: Tue Mar 07 2023 03:59:59 GMT-0800
  • ICCV 2023: International Conference on Computer Vision. Oct 2 – 6, 2023. (Paris, France). 1. Submission deadline: March 8, 2023 23:59 GMT
  • MICCAI 2023: The 26th International Conference on Medical Image Computing and Computer Assisted Intervention. Oct 8 – 12, 2023. (Vancouver, Canada). Submission deadline: Thu Mar 09 2023 23:59:59 GMT-0800
  • KR 2023: 20th International Conference on Principles of Knowledge Representation and Reasoning. Sep 2 – 8, 2023. (Rhodes, Greece). Submission deadline:Wed Mar 15 2023 04:59:59 GMT-0700
  • AutoML-Conf 2023: The international conference on automated machine learning. Sep 12-15, 2023. (Berlin, Germany). Submission deadline: Fri Mar 24 2023 04:59:59 GMT-0700
  • BMVC 2023: 34th annual conference on machine vision, image processing, and pattern recognition. Nov 20 – 24, 2023. (Aberdeen, United Kingdom). Submission deadline: Fri May 12 2023 16:59:59 GMT-0700
  • NeurIPS 2023: 37th conference on Neural Information Processing Systems. Dec 10 – 16, 2023. (New Orleans, Louisiana). Submission deadline: Wed May 17 2023 13:00:00 GMT-0700