The World’s First Robot Lawyer Is Here, Microsoft Hopes ChatGPT Will Make Bing Smarter and Scaling Model Serving in Prod Just Got Easier

Can we compress large language models for better performance? Can we rethink LLM post-processing? These are just a few of the questions researchers are asking in the new year. An AI bot might be representing people in traffic court as early as next month, Microsoft hopes ChatGPT can boost Bing’s popularity, and we’re inviting you to join us for a hands-on webinar on scaling model serving in production. Let’s dive in!

Research Highlights

Muse, a text-to-image generation and editing model built on Masked Generative Transformers, was released by Google’s research team. The model is claimed to be more efficient than pixel-space diffusion models such as Imagen and DALL-E 2, thanks to its use of discrete tokens and its need for fewer sampling iterations. Other performance claims include a new SOTA on CC3M, zero-shot mask-free editing, and zero-shot inpainting and outpainting.

Researchers from Austria introduced a new pruning method called SparseGPT. Their research claims that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one shot, without any retraining, at minimal loss of accuracy. Because the biggest LLMs still need expensive hardware to run, the authors hope that with this method you might only need 4 GPUs instead of 8 to run a 180B-parameter model moving forward.
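SparseGPT itself solves a layer-wise reconstruction problem using approximate second-order information, which is beyond a quick snippet. As a much simpler illustration of the general idea of one-shot pruning, the sketch below uses plain magnitude pruning (not the paper’s method) to zero out the smallest 50% of a weight matrix without any retraining:

```python
import numpy as np

def prune_to_sparsity(weights: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Zero out the smallest-magnitude entries until `sparsity` fraction are zero."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))           # stand-in for one layer's weight matrix
w_pruned = prune_to_sparsity(w, 0.5)
print(np.mean(w_pruned == 0))         # 0.5
```

In practice, magnitude pruning at this sparsity degrades large models noticeably; SparseGPT’s contribution is choosing which weights to drop (and adjusting the rest) so that accuracy is largely preserved.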

Researchers proposed a novel approach to LLM post-processing called rethinking with retrieval (RR). According to its authors, RR is a simple method that does not need extra training or fine-tuning and is not constrained by the input length of LLMs. GPT-3 was used to assess the effectiveness of RR for common sense reasoning, temporal reasoning, and tabular reasoning. The researchers claim RR can increase the accuracy of explanations and enhance overall LLM performance.

ML Engineering Highlights

Microsoft intends to incorporate ChatGPT’s machine learning technology into Bing search. By doing this, Microsoft hopes to increase Bing’s dependability and deliver search results and responses that are more human-like. ChatGPT’s accuracy will be key to the timing of any rollout.

Axelera AI, a company that specializes in high-performance machine learning, has announced a gumstick-shaped M.2 accelerator based on its Metis artificial intelligence processing unit (AIPU), which it claims can add 214 trillion operations per second (TOPS) to Next Generation Form Factor (NGFF)-compatible devices. The company also announced a more powerful PCI Express card that provides 856 TOPS of compute.

DoNotPay, the company behind “the world’s first robot lawyer,” is rolling out an AI bot that will represent a human defendant in court for the first time ever. In February, an artificial intelligence (AI) running on a smartphone will listen to every word said in the courtroom before giving the defendant instructions via an earpiece.

Upcoming Webinar

Serving large models in production with high concurrency and throughput is essential for businesses to respond quickly to users and be available to handle a large number of requests.

Join Lightning’s Neil Bhatt and Sherin Thomas to learn how we took advantage of dynamic batching and autoscaling to serve Stable Diffusion in production and scaled it to handle over 1K concurrent users.

Register now!

Community Highlights

This week, we’re highlighting some recently merged PRs from members of the Lightning community specifically aimed at fixing docs! #16180, #16191, and #16234 all make contributions to Lightning’s documentation. Keep up the great work!

This PR, #16173, adds support and a test for custom artifact names in the WandbLogger. Shoutout to Manan Goel on the wandb team — working together like this is what we love most about our vibrant, open-source community. 🙂

A recent Harvard course (CS197: AI Research Experiences) has compiled its materials into a Course Book. In it, you’ll learn cutting-edge development tools, including PyTorch and Lightning. Check it out!

Don’t Miss the Submission Deadline

  • IJCAI 2023: the 32nd International Joint Conference on Artificial Intelligence. Aug 19-25, 2023 (Cape Town, South Africa). Abstract due: January 11, 2023. Full paper submission deadline: January 18, 2023
  • ACL 2023: The 61st Annual Meeting of the Association for Computational Linguistics. July 9-14, 2023 (Toronto, Canada). Full paper submission deadline: January 20, 2023
  • ICML 2023: Fortieth International Conference on Machine Learning. July 23-29, 2023 (Honolulu, Hawaii). Full paper submission deadline: January 26, 2023, 08:00 PM UTC
  • IROS 2023: International Conference on Intelligent Robots and Systems. Oct 1-5, 2023 (Detroit, Michigan). Full paper submission deadline: March 1, 2023
  • ICCV 2023: International Conference on Computer Vision. Oct 2-6, 2023 (Paris, France). Full paper submission deadline: March 8, 2023, 23:59 GMT