This week, the announcement of several text-to-image, voice and video generation models is delighting researchers and creatives alike. ICLR makes its paper submissions for 2023 public, Microsoft warns of North Korean hackers, the White House unveils the first “AI Bill of Rights” and a bipedal robot sets a Guinness World Record. Let’s dive in!
ICLR 2023 Submission Highlights
The submissions for the 11th International Conference on Learning Representations (ICLR 2023) have been released, with final decisions due on January 20, 2023. Here are a few of the submissions piquing our interest so far:
DREAMFUSION: TEXT-TO-3D USING 2D DIFFUSION: Text-to-3D synthesis utilizing a pretrained 2D text-to-image diffusion model without the need for large-scale datasets of labeled 3D data.
QUANTUM REINFORCEMENT LEARNING: A proposal to combine quantum theory and reinforcement learning (RL) into an algorithm that its authors claim could greatly aid the fields of finance, industrial simulation, mechanical control, quantum communication, and quantum circuit optimization.
PHENAKI: VARIABLE LENGTH VIDEO GENERATION FROM OPEN DOMAIN TEXTUAL DESCRIPTIONS: A model for generating videos that can be multiple minutes long from text prompts that can change over time.
Research Highlights
📹 Meta AI introduced Make-A-Video, an AI system that generates videos from text. Unlike existing models, Make-A-Video claims to negate the need to learn visual and multimodal representations from scratch and doesn’t require paired text-video data. The videos it generates also inherit the expansiveness (diversity in aesthetic, fantastical depictions, etc.) of modern image generation models. Make-A-Video builds on Meta AI’s recent advancements in generative technology research and has the potential to open new doors for creators and artists alike.
🎤 Researchers from The Hebrew University of Jerusalem introduced AudioGen for textually guided audio generation. AudioGen produces audio samples based on descriptive text captions and claims to surmount typical text-to-audio difficulties such as distinguishing between sounds, rough recording conditions and insufficient text annotations. The model outperforms the evaluated baselines across both objective and subjective metrics and will be available to the public soon.
🖼️ Researchers in China introduced ERNIE-ViL 2.0, a multi-view contrastive learning framework for image-text pre-training. Pre-trained with 29M publicly accessible datasets, ERNIE-ViL 2.0 claims to surpass current Vision-Language Pre-trained (VLP) models, which are only able to construct a single view. It does so by constructing multiple views within each modality to learn the intra-modal correlation and enhance the single-modal representation. Their pre-trained models are now available for public use.
ML Engineering Highlights
📜 The White House unveiled a blueprint for an “AI Bill of Rights” to outline how the US government, tech companies and citizens should work together to hold AI accountable. The blueprint seeks to provide practical guidance to government agencies and serves as a call to action for tech companies to build stronger privacy protections for their AI products.
🦿 Cassie, a bipedal robot developed by ML engineers at Oregon State University, set a new Guinness World Record by running 100 meters in just 24.73 seconds. Operating with no cameras or external sensors, Cassie was trained for the equivalent of a full year in a simulation environment, compressed to a week through a computing technique known as parallelization. “This may be the first bipedal robot to learn to run, but it won’t be the last,” says OSU robotics professor Jonathan Hurst.
👓 Google is using generative adversarial networks (GANs) to allow Google Lens users to more easily translate text that appears in the real world. Currently, any text that’s converted into a different language uses colored blocks to mask bits of the background image. This new release removes these blocks and swaps the text outright to make the translated text look as though it were part of the original image. The new model seems to point to Google’s plans to further invest in the creation of new AR glasses that would allow people to translate street signs and storefronts in the blink of an eye.
Open Source Highlights
🧰 Meta AI released AITemplate (AIT), a unified inference system with separate acceleration back ends for both AMD and NVIDIA GPU hardware. This new set of open source tools has the potential to make it simpler for developers to switch between various underlying chips.
⚠️ Microsoft announced that hackers backed by the North Korean government are weaponizing well-known open source software in an ongoing campaign that has already compromised “numerous” organizations in the media, defense and aerospace, and IT services industries. By posing as job recruiters, ZINC, Microsoft’s name for the threat actor group, is infecting employee work environments by coercing individuals into installing applications with “objectives focused on espionage, data theft, financial gain, and network destruction”.
Tutorial of the Week
💪 Want to enable GPU-accelerated training on Apple Silicon in PyTorch? This tutorial’s got you covered if you want to train models faster with Apple’s M1 or M2 chips.
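For a taste of what that looks like, here is a minimal sketch, not the tutorial's own code. It assumes PyTorch 1.12 or later, which exposes Apple's Metal Performance Shaders backend as the `"mps"` device. The helper names `choose_device` and `train_demo` are our own, made up for illustration, and the toy model and random data are placeholders.

```python
def choose_device(mps_available: bool, cuda_available: bool = False) -> str:
    """Pick a device string, preferring Apple's MPS backend when present.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


def train_demo(steps: int = 5) -> float:
    """Train a toy linear model on whichever device is available."""
    # Imported here so choose_device stays usable without PyTorch installed.
    import torch

    device = torch.device(
        choose_device(
            torch.backends.mps.is_available(),  # True on supported Apple Silicon setups
            torch.cuda.is_available(),
        )
    )

    # Model and data must live on the same device.
    model = torch.nn.Linear(8, 1).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    x = torch.randn(32, 8, device=device)
    y = torch.randn(32, 1, device=device)

    loss = None
    for _ in range(steps):
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()

    print(f"trained on {device}")
    return loss.item()
```

Calling `train_demo()` on an M1 or M2 machine with a recent PyTorch build should report the `mps` device; on other hardware the same code falls back to CUDA or the CPU.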
Community Spotlight
Want your work featured? Contact us on Slack or email us at [email protected]
💡 This repo features super-resolution algorithms implemented with Lightning, and supports several models, including DDBPN and SRResNet. Super-resolution models are used to intelligently enhance low-resolution images.
🌃 Oğuzcan Turan’s Light Side of the Night is a low-light image enhancement library that consists of state-of-the-art deep learning methods. We can’t decide which part of this repo we love more: the ample Star Wars references, or the way this work enhances images captured in low-light environments.
🙏 We just closed the last PR resolving typing in the PyTorch portion of Lightning! A big thank you to everyone who contributed to this issue. As always, keep an eye out for issues like these, which are a great way to make your first contribution to open-source projects.
Don’t Miss the Submission Deadline
AISTATS 2023: The 26th International Conference on Artificial Intelligence and Statistics. Spring 2023. (Location TBD). Paper Submission Deadline: Fri Oct 14 2022 04:59:59 GMT-0700
AAMAS 2023: The 22nd International Conference on Autonomous Agents and Multiagent Systems. May 29 – June 2, 2023. (London, UK). Paper Submission Deadline: Sat Oct 29 2022 04:59:59 GMT-0700
CVPR 2023: The IEEE/CVF Conference on Computer Vision and Pattern Recognition. Jun 18-22, 2023. (Vancouver, Canada). Paper Submission Deadline: Fri Nov 11 2022 23:59:59 GMT-0800
Upcoming Conferences
- ICIP 2022: International Conference on Image Processing. Oct 16-19, 2022 (Bordeaux, France)
- IROS 2022: International Conference on Intelligent Robots and Systems. Oct 23-27, 2022 (Kyoto, Japan)
- NeurIPS 2022: Thirty-sixth Conference on Neural Information Processing Systems. Nov 28 – Dec 9, 2022 (New Orleans, Louisiana)
- PyTorch Conference: Brings together leading academics, researchers and developers from the Machine Learning community to learn more about software releases on PyTorch. Dec 2, 2022 (New Orleans, Louisiana)
Want to learn more from Lightning AI? “Subscribe” to make sure you don’t miss the latest flashes of inspiration, news, tutorials, educational courses, and other AI-driven resources from around the industry. Thanks for reading!