The battle of the chatbots continues, with Bard’s reception peaking and falling in a single week while Bing’s new interface causes a flurry of excitement. Researchers continue to push the limits of video diffusion models, and we’re showing how you can accelerate serving for models that run sequential inference steps. Let’s dive in!
Research Highlights
- Researchers from Runway ML presented a structure- and content-guided video diffusion model that edits videos based on textual or visual descriptions of the desired output. By training the model jointly on images and videos, the authors say they expose explicit control over temporal consistency through a novel guidance method. They report a wide range of successes in their experiments, including fine-grained control over output characteristics, customization from a small number of reference images, and a strong user preference for their model’s output.
- Google researchers presented Dreamix, a diffusion-based method that they claim can perform text-based motion and appearance editing of general videos. At inference time, the method combines low-resolution spatio-temporal information from the original video with new, high-resolution information synthesized to match the guiding text prompt. As part of this effort, the researchers also introduced a new framework for image animation.
- To mitigate hallucinated reasoning chains in language models, researchers from Amazon proposed Multimodal-CoT, a chain-of-thought reasoning approach that incorporates vision features in a decoupled training framework. The framework separates rationale generation and answer inference into two stages, so that the generated rationales are more useful for inferring the final answer. They report that their model, with under 1 billion parameters, outperforms GPT-3.5 by 16 percentage points on the ScienceQA benchmark and even surpasses human performance.
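The two-stage idea behind Multimodal-CoT can be sketched as a small inference pipeline. This is a minimal sketch, not the authors’ code: it assumes two separately trained generators that each take a text input plus a vector of vision features, and `TwoStageCoT`, the prompt template, and the stub models are all hypothetical stand-ins for trained networks.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical interface: (text input, vision feature vector) -> generated text.
Generator = Callable[[str, Sequence[float]], str]


@dataclass
class TwoStageCoT:
    rationale_model: Generator  # stage 1: trained only to generate rationales
    answer_model: Generator     # stage 2: trained separately to infer answers

    def __call__(self, question: str, context: str,
                 vision: Sequence[float]) -> Tuple[str, str]:
        base = f"Question: {question}\nContext: {context}"
        # Stage 1: generate a rationale from the text input plus vision features.
        rationale = self.rationale_model(base, vision)
        # Stage 2: append the generated rationale to the original input, then infer the answer.
        answer = self.answer_model(f"{base}\nRationale: {rationale}", vision)
        return rationale, answer


# Hypothetical stub models standing in for trained networks, for illustration only.
def stub_rationale(text: str, vision: Sequence[float]) -> str:
    return "Opposite magnetic poles attract."


def stub_answer(text: str, vision: Sequence[float]) -> str:
    return "attract" if "Opposite magnetic poles attract" in text else "repel"


pipeline = TwoStageCoT(stub_rationale, stub_answer)
rationale, answer = pipeline("Will these magnets attract or repel?",
                             "The north pole faces the south pole.", [0.12, 0.87])
print(rationale, "->", answer)
```

Keeping the stages as separate models is what makes the training decoupled: the answer model only ever sees a rationale as extra input text, so each stage can be trained on its own target.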
ML Engineering Highlights
- A factual error made by Bard, the conversational bot that Google announced this week as a rival to ChatGPT, wiped roughly $100 billion off Alphabet’s market value. In a promotional GIF shared on Twitter, Bard is asked, “What new discoveries from the James Webb Space Telescope can I tell my 9 year old [sic] about?” The chatbot responds with a few bullet points, including the claim that the telescope took the very first pictures of “exoplanets,” or planets outside of Earth’s solar system. This is incorrect: according to NASA, the first image of an exoplanet was captured in 2004 by the European Southern Observatory’s Very Large Telescope.
- Microsoft unveiled new AI-powered conversational search features for Bing. Microsoft won’t say which OpenAI model powers Bing, but it is rumored to be based on GPT-4, a language model that hasn’t yet been made public. Microsoft is also building AI features into its Edge browser; the tools can help write emails and social media posts and summarize websites.
- Microsoft and American Express announced a partnership to apply AI to auditing corporate expense reports. Amex says the initial solution will use machine learning and AI to automate expense reporting and approvals. The new system is expected to include an AI-powered decision engine that understands each company’s own travel and expense (T&E) policy and how it applies to submitted expenses. Based on this knowledge and other factors, such as the employee’s past purchases and payments, it will classify each transaction and assign it a risk score.
Open Source Highlights
- Microsoft Research made LMOps, a set of tools for improving the text prompts that generative AI models take as input, available to the public. The toolkit includes Promptist, a program that optimizes user text input for text-to-image generation, and Structured Prompting, a method for including more examples in a few-shot prompt for text generation.
- Open source developers should be exempted from the European Union’s (EU) upcoming artificial intelligence (AI) regulations, according to GitHub CEO Thomas Dohmke. “Open source is forming the foundation of AI in Europe,” Dohmke said onstage at the EU Open Source Policy Summit in Brussels. “The U.S. and China don’t have to win it all.”
Tutorial of the Week
Imagine if, when ordering coffee, you had to wait for at least four other people to come in and place their orders before the barista even started on yours. Wild, right? Well, that’s sometimes how batching for diffusion models works. In this tutorial, we show you how to overcome this batching problem and improve inference speed by up to 18%.
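The coffee-shop problem can be made concrete with a toy simulation. This sketch is not the tutorial’s code, and its numbers are unrelated to the 18% figure: it assumes each denoising step costs one “tick” for the whole batch regardless of batch size, and compares static batching (wait for a full batch, then run every step) with step-level batching, where waiting requests join the running batch between denoising steps. `Request`, `simulate`, and the tick cost model are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class Request:
    rid: int
    arrival: int                   # tick at which the request arrives
    steps_left: int                # denoising steps still to run
    done_at: Optional[int] = None  # tick at which its last step finished


def simulate(arrivals: List[int], num_steps: int, max_batch: int, static: bool) -> List[int]:
    """Return each request's latency in ticks, in request-id order.

    Toy cost model (an assumption): one tick runs one denoising step for every
    request in the active batch, regardless of batch size (up to max_batch).
    """
    assert num_steps >= 1 and max_batch >= 1
    reqs = [Request(i, t, num_steps) for i, t in enumerate(arrivals)]
    pending: Deque[Request] = deque(sorted(reqs, key=lambda r: r.arrival))
    queue: Deque[Request] = deque()
    active: List[Request] = []
    tick = 0
    while pending or queue or active:
        # Requests that have arrived by now join the waiting queue.
        while pending and pending[0].arrival <= tick:
            queue.append(pending.popleft())
        if static:
            # Static batching: start a new batch only after the current one has
            # finished, and only once max_batch requests wait (or no more will arrive).
            if not active and (len(queue) >= max_batch or (queue and not pending)):
                while queue and len(active) < max_batch:
                    active.append(queue.popleft())
        else:
            # Step-level batching: top up the batch before every denoising step.
            while queue and len(active) < max_batch:
                active.append(queue.popleft())
        tick += 1
        for r in active:
            r.steps_left -= 1
            if r.steps_left == 0:
                r.done_at = tick
        active = [r for r in active if r.steps_left > 0]
    return [r.done_at - r.arrival for r in reqs]


arrivals = [0, 1, 2, 3]  # hypothetical: one request arrives per tick
print("static:    ", simulate(arrivals, num_steps=4, max_batch=4, static=True))
print("step-level:", simulate(arrivals, num_steps=4, max_batch=4, static=False))
```

With these inputs, static batching yields latencies of [7, 6, 5, 4] ticks because early requests wait for the batch to fill, while step-level batching yields [4, 4, 4, 4]: every request starts denoising as soon as it arrives and shares each step with whoever else is in flight.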
Community Spotlight
Want your work featured? Contact us on Discord or email us at [email protected]
- Thank you to the eagle-eyed members of our community who’ve landed PRs in Lightning by making fixes to our docs! You can check their PRs out here, here, and here. Documentation really does take a village.
- This community PR, merged this week, prevents crashes caused by the prediction dataloader being wrapped twice with MpDeviceLoader. Shoutout to GitHub user Liyang90!
- As always, we love to see community-led contributions to Lightning! If you’re looking for some inspiration (and maybe even wanting to land your first PR), give these open PRs a gander: here, here, and here. Not sure where to get started? Our community tag on GitHub is a great place to browse through other community-led PRs, and we’re always hanging out on Discord. Come say hi!
Lightning AI Highlights
- Have you tried using Lightning in your business or research project and run into trouble, or wish that there was a tool out there that didn’t exist? We want to know! Shoot us an email at [email protected] so we can start building together. 🙂
- Did you know that creating a Lightning account gets you US$30 in free credits right away? 👀 Lightning Credits can be used to run ML workflows and pay for cloud compute. Get started here!
Don’t Miss the Submission Deadline
- ICIP 2023: International Conference on Image Processing. (Kuala Lumpur, Malaysia). Submission deadline: Feb 15, 2023, 23:59:59 GMT-0800
- UAI 2023: International conference on research related to knowledge representation, learning, and reasoning in the presence of uncertainty. (Pittsburgh, USA). Submission deadline: Feb 18, 2023, 03:59:59 GMT-0800
- IROS 2023: International Conference on Intelligent Robots and Systems. Oct 1–5, 2023 (Detroit, Michigan). Submission deadline: Mar 1, 2023
- InterSpeech 2023: International conference on the science and technology of spoken language processing. (Dublin, Ireland). Submission deadline: Mar 2, 2023, 03:59:59 GMT-0800
- CoLLAs 2023: 2nd Conference on Lifelong Learning Agents. (Montreal, Canada). Submission deadline: Mar 7, 2023, 03:59:59 GMT-0800
- ICCV 2023: International Conference on Computer Vision. Oct 2–6, 2023 (Paris, France). Submission deadline: Mar 8, 2023, 23:59 GMT