Researchers are improving audio quality with real-time target sound extraction and music mixing style transfer. Anyone can now build metaverse applications with the new beta release of NVIDIA Omniverse, and Colossal-AI’s new open source solution claims to accelerate AIGC while cutting pretraining and fine-tuning costs by up to 7x! Let’s dive in.
Research Highlights
🕺 In a paper published with IEEE, researchers proposed AlphaPose, a system they claim can perform accurate whole-body pose estimation and tracking jointly while running in real time. Accurate whole-body multi-person pose estimation and tracking remains a challenge in computer vision. Whole-body pose estimation, which considers the face, body, hands, and feet, is preferable to traditional body-only pose estimation because it better captures people’s subtle movements for complex behavior analysis. The researchers claim AlphaPose shows a significant improvement over current state-of-the-art methods in both the speed and accuracy of whole-body pose estimation.
🌊 Microsoft researchers recently unveiled what they call the first neural network model to successfully achieve real-time, streaming target sound extraction. Their encoder-decoder architecture, called Waveformer, uses a stack of dilated causal convolution layers as the encoder and a transformer decoder layer as the decoder. The researchers claim this hybrid architecture combines the accuracy of transformer-based models with the computational efficiency of dilated causal convolutions, which can cover large receptive fields cheaply. Waveformer uses the TorchMetrics library from Lightning for model evaluation.
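The paper has the exact architecture details; as a rough, illustrative sketch of the general pattern described above (not the authors’ model), a stack of dilated causal 1-D convolutions feeding a standard transformer decoder layer could look like this in PyTorch, with all layer sizes being placeholder assumptions:

```python
import torch
import torch.nn as nn

class DilatedCausalEncoder(nn.Module):
    """Stack of dilated causal 1-D convolutions (illustrative; not the paper's exact layers)."""
    def __init__(self, channels=256, kernel_size=3, num_layers=6):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            dilation = 2 ** i  # receptive field grows exponentially with depth
            self.layers.append(
                nn.Conv1d(channels, channels, kernel_size,
                          dilation=dilation,
                          padding=(kernel_size - 1) * dilation)  # pad, then trim for causality
            )

    def forward(self, x):  # x: (batch, channels, time)
        for conv in self.layers:
            y = conv(x)
            y = y[..., : x.shape[-1]]  # trim the right side so no future samples leak in
            x = torch.relu(y) + x      # residual connection
        return x

class SketchWaveformer(nn.Module):
    """Hybrid conv encoder + transformer decoder, loosely following the description above."""
    def __init__(self, channels=256, nhead=8):
        super().__init__()
        self.encoder = DilatedCausalEncoder(channels)
        self.decoder = nn.TransformerDecoderLayer(d_model=channels, nhead=nhead, batch_first=True)

    def forward(self, mixture, target_query):
        # mixture: (batch, channels, time); target_query: (batch, query_len, channels)
        memory = self.encoder(mixture).transpose(1, 2)  # -> (batch, time, channels)
        return self.decoder(target_query, memory)       # decode conditioned on the encoded mixture
```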
🎶 Sony researchers proposed an end-to-end system for transferring the mixing style of a reference song to an input multitrack. They assert that they achieve this with an encoder specially trained to extract only the audio-effect-related information from a reference music recording. The researchers claim that all of their models were trained in a self-supervised manner on an already-processed wet multitrack dataset using an effective data preprocessing method, thereby avoiding the data shortage associated with obtaining raw dry data.
ML Engineering Highlights
🖼️ Jay Hack, former CEO of Mira AI, released a text-to-Figma generator that has received ample interest since its announcement this week. Text-to-Figma leverages GPT-3 and DALL·E to manipulate the Figma canvas using only natural language commands. The generator was trained on examples of Figma components with accompanying descriptions. The makers designed a DSL (domain-specific language) that had to be expressive, compact, easy for GPT-3 to produce, and able to be compiled back and forth into Figma components.
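The project’s actual DSL and compiler are not spelled out in the announcement, but a purely hypothetical toy sketch can illustrate the idea of a compact, LLM-friendly language that compiles back and forth into canvas-component structures. The grammar and the `compile_dsl` / `decompile` helpers below are inventions for illustration only:

```python
import re

# Hypothetical toy DSL: each line is `<component> "<label>" at <x>,<y>` — not the real grammar.
DSL_LINE = re.compile(r'(\w+)\s+"([^"]*)"\s+at\s+(\d+),(\d+)')

def compile_dsl(source: str) -> list[dict]:
    """Compile toy DSL lines into Figma-node-like dictionaries (illustrative only)."""
    nodes = []
    for line in source.strip().splitlines():
        match = DSL_LINE.match(line.strip())
        if not match:
            raise ValueError(f"Unparseable DSL line: {line!r}")
        kind, label, x, y = match.groups()
        nodes.append({"type": kind.upper(), "name": label, "x": int(x), "y": int(y)})
    return nodes

def decompile(nodes: list[dict]) -> str:
    """Reverse direction: turn nodes back into DSL, mirroring the 'back and forth' compilation."""
    return "\n".join(f'{n["type"].lower()} "{n["name"]}" at {n["x"]},{n["y"]}' for n in nodes)

print(compile_dsl('button "Sign in" at 24,48\ntext "Welcome back" at 24,16'))
```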
🎨 For developers, creators, and beginners looking to build metaverse applications, the new beta release of NVIDIA Omniverse is now available and includes significant updates to core reference applications and tools. This beta focuses on making it as simple as possible to ingest large, complex scenes from third-party applications, and on pushing real-time rendering, path tracing, and physics simulation performance. It is powered by support for the new NVIDIA Ada Generation GPUs and advancements in NVIDIA simulation technology.
Open Source Highlights
🎨 A fully open-source Stable Diffusion pretraining and fine-tuning solution from Colossal-AI claims to speed up both processes while cutting the cost of pretraining and fine-tuning by 6.5x and 7x, respectively. The fine-tuning workflow also runs on a PC with an RTX 2070 or 3050, making AIGC models like Stable Diffusion accessible to those without expensive, specialized hardware. Check out the Colossal-AI GitHub repo to learn more.
❄️ The general release of Snowpark for Python, a solution that integrates Anaconda’s data and machine learning packages within Snowflake’s Data Cloud, was announced by Snowflake and Anaconda. This new native integration, which has been available in public preview since June, is for the Python community of data scientists, engineers, developers, and analysts who want to create data pipelines and machine learning workflows right inside Snowflake.
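As a minimal sketch of what the integration looks like from the Python side, a Snowpark session lets DataFrame operations execute inside Snowflake; the connection parameters and the `EVENTS` table below are placeholders, not part of the announcement:

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col

# Placeholder credentials — replace with your own account details.
connection_parameters = {
    "account": "<your_account>",
    "user": "<your_user>",
    "password": "<your_password>",
    "warehouse": "<your_warehouse>",
    "database": "<your_database>",
    "schema": "<your_schema>",
}

session = Session.builder.configs(connection_parameters).create()

# "EVENTS" is a hypothetical table; the filter and aggregation run inside Snowflake,
# and only the result comes back to the client.
df = (
    session.table("EVENTS")
    .filter(col("EVENT_TYPE") == "purchase")
    .group_by("USER_ID")
    .count()
)
print(df.collect())
```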
Tutorial of the Week
Data can be a crucial bottleneck for ML projects – whether it’s collecting, labeling, or preparing that data. You can solve this problem by generating synthetic data that’s ready to use in your next project. Jump into the NVIDIA Omniverse Replicator Lightning App to get started generating synthetic data today!
Community Spotlight
Want your work featured? Contact us on Slack or email us at [email protected]
⚡ xG, or Expected Goals, is a familiar statistic for anyone who follows football. This set of tutorials walks through training a deep learning model to predict xG, covering data preparation, model training, and evaluation, and uses Lightning for the deep learning portion.
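As a minimal sketch of the kind of model-training step the tutorials cover (the feature set and layer sizes here are assumptions, not the tutorials’ exact code), a small Lightning module for shot-level goal probability could look like this:

```python
import torch
import torch.nn as nn
import pytorch_lightning as pl

class XGModel(pl.LightningModule):
    """Toy xG classifier: predicts the probability that a shot becomes a goal from a few shot features."""
    def __init__(self, num_features=4):  # e.g. distance, angle, body part, pressure (assumed features)
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
        )
        self.loss_fn = nn.BCEWithLogitsLoss()

    def forward(self, x):
        return self.net(x).squeeze(-1)  # raw logits; apply sigmoid to get xG probabilities

    def training_step(self, batch, batch_idx):
        features, goal = batch
        loss = self.loss_fn(self(features), goal.float())
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# Usage: trainer = pl.Trainer(max_epochs=10); trainer.fit(XGModel(), train_dataloader)
```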
⚡ This repo by Jiawang Bian is a Lightning implementation of SC-Depth (V1-3) for self-supervised learning of monocular depth from video. Predicting depth from video footage is difficult for a number of reasons, including camera rotation between frames and poor accuracy in dynamic scenes. This project seeks to make that process more efficient.
⚡ Bert-squeeze is a repository that provides code to reduce the size of Transformer-based models or decrease their latency at inference time. Getting decent inference speed with large models can pose a significant challenge – this project seeks to “reduce the latency of models integrating transformers as subcomponents.” (It’s worth checking out this repo even just for the incredible imagery!)
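The repo documents its own set of techniques; as one generic example of trimming Transformer inference latency that is not specific to Bert-squeeze, PyTorch dynamic quantization can convert a Hugging Face model’s linear layers to int8:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Any fine-tuned checkpoint works; this one is just a commonly used example.
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()

# Dynamic quantization converts nn.Linear weights to int8 and quantizes activations on the fly,
# which typically reduces model size and CPU latency at a small accuracy cost.
quantized = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)

inputs = tokenizer("A surprisingly fast model.", return_tensors="pt")
with torch.no_grad():
    logits = quantized(**inputs).logits
print(logits.softmax(dim=-1))
```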
Lightning AI Highlights
⚡ Notice anything new on the Lightning website? Perhaps our spiffy new blog? We’re excited to share this revamped way to access our most-used resources that include tutorials, informative articles, community discussions, and announcements about the Lightning framework. Check it out!
⚡ Attending NeurIPS this year? Interested in making the transition from academia to industry? Stop by our social, hosted by two of Lightning’s very own team members with experience making this transition. Learn more here.
⚡ New to the world of neural networks? Check out and download this Jupyter Notebook based on StatQuest with Josh Starmer’s video about coding neural networks with PyTorch and Lightning. TRIPLE BAM!!!
Don’t Miss the Submission Deadline
- CVPR 2023: The IEEE/CVF Conference on Computer Vision and Pattern Recognition. Jun 18-22, 2023 (Vancouver, Canada). Paper Submission Deadline: Fri, Nov 11, 2022, 11:59 PM PST
Upcoming Conferences
- IROS 2022: International Conference on Intelligent Robots and Systems. Oct 23-27, 2022 (Kyoto, Japan)
- NeurIPS 2022: Thirty-sixth Conference on Neural Information Processing Systems. Nov 28 – Dec 9, 2022 (New Orleans, Louisiana)
- PyTorch Conference: Brings together leading academics, researchers, and developers from the Machine Learning community to learn more about the latest PyTorch software releases. Dec 2, 2022 (New Orleans, Louisiana)