Highlights/Upcoming events

Mustango: Toward Controllable Text-to-Music Generation.

Excited to announce Mustango, a powerful multimodal model for generating music from textual prompts. Mustango leverages a Latent Diffusion Model conditioned on textual prompts (encoded using Flan-T5) and various musical features. Try the demo! What makes it different from the rest?
-- greater controllability in the music generation.
-- trained on a large dataset generated using ChatGPT and musical manipulations.
-- superior performance over its predecessors, according to expert evaluators.
-- open source!

Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model

We are happy to announce Video2Music, a novel AI-powered multimodal music generation framework. It uses video features as conditioning input to a Transformer architecture that generates matching music. Our system aims to give video creators a seamless and efficient way to generate tailor-made background music.

Live demo on Replicate.
View on GitHub.

Time-series momentum portfolios with deep multi-task learning

Congratulations to Joel Ong on publishing our paper on using multi-task deep learning for portfolio construction in Expert Systems with Applications. The paper presents a new way to leverage time series momentum in a deep learning setting. Read a Twitter thread explaining the basics here.

DiffRoll - Music Transcription with Diffusion

Great work by Cheuk Kin Wai on his latest paper, 'DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability'.

Cheuk, K. W., Sawata, R., Uesaka, T., Murata, N., Takahashi, N., Takahashi, S., ... & Mitsufuji, Y. (2022). DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability. arXiv preprint arXiv:2210.05148.

Demo & Source code available here.

New paper on the EmoMV datasets published in Information Fusion

Congratulations to Thao on leading the publication of the EmoMV datasets for music-video matching based on emotion!

Pham Q-H., Herremans D., Roig G. 2022. EmoMV: Affective Music-Video Correspondence Learning Datasets for Classification and Retrieval. Information Fusion. DOI: 10.1016/j.inffus.2022.10.002

Keynote at AIMC

I was honoured to give a keynote talk at the 3rd Conference on AI Music Creativity (AIMC) on controllable music generation with emotion. Watch the full keynote here:

New paper in Sensors on Single Image Video Prediction with Auto-Regressive GANs

Congrats to my former research assistant Jiahui Huang on his latest paper in Sensors, 'Single Image Video Prediction with Auto-Regressive GANs'. Now we can generate videos of faces with desired emotions!

Full paper available here.

Huang, Jiahui, Yew Ken Chia, Samson Yu, Kevin Yee, Dennis Küster, Eva G. Krumhuber, Dorien Herremans, and Gemma Roig. "Single Image Video Prediction with Auto-Regressive GANs." Sensors 22, no. 9 (2022): 3533.

Seminar on music and AI at KTH

It was an honour to take part today in a seminar at the KTH Royal Institute of Technology in Stockholm, held as part of the dialogues series.

dialogues1: probing the future of creative technology
Subject: “Interaction with generative music frameworks”

Guests: Dorien Herremans and Kıvanç Tatar (Video link to be posted)

Dorien Herremans: Controllable deep music generation with emotion

AI and you - podcast

Excited to be featured on the latest 'AI and You - What is AI? How will it affect your life, your work, and your world?' podcast by Peter Scott from Human Cusp.

We're focusing on AI in music: What's the state of the art in AI music composition, how can human composers use it to their advantage, and what is the AI Song Contest? How do musical AIs surprise their creators and how are they like your grandmother trying to explain death metal?

ReconVAT presented in ACM Multimedia

Congrats to Kin Wai Cheuk on his paper, published at the ACM Multimedia conference (A*), on 'ReconVAT: A Semi-Supervised Automatic Music Transcription Framework for Low-Resource Real-World Data'. If you are interested in training low-data music transcription models with semi-supervised learning, check out the full paper here, or access the preprint.

Watch Raven's talk here:

Paper published on our game for climate change (PEAR) in Sustainability

Over the last few years, we developed Project PEAR at SUTD Game Lab. Project PEAR is a geolocation-based augmented reality game that aims to educate players about climate change and to influence their behaviour. We just published a study in Sustainability on the effectiveness of this game.

aiSTROM -- A roadmap for developing a successful AI strategy published in IEEE Access

Leading countless AI projects has left me very aware of the challenges we may encounter during the development process. I therefore created a roadmap for AI managers and consultants to follow when creating an AI strategy, so they can better navigate the road to a successful one. The aiSTROM roadmap was just published in IEEE Access. Read the full article here.


Research assistant / postdoc jobs in Music/Audio and AI

Our team at Singapore University of Technology and Design (SUTD) is looking for an RA or postdoc in music and AI. You will be joining our AMAAI Lab in music/audio/vision AI, supervised by Prof. Dorien Herremans. At our lab, we aim to advance the state of the art in AI for music and audio. More information on the music/audio team here. We have multiple ongoing research lines that need your expertise, in both symbolic music (MIDI) and audio.