Highlights/Upcoming events
Visit from ZJU
Posted by dorien on Saturday, 2 November 2024
The AMAAI Lab was honoured to receive Prof. Kejun Zhang and Jiaxing Yu from Zhejiang University last week.
It was fascinating to hear about their latest research in multimodal AI, affective computing for music, new large-scale datasets, and cool image-based music generation interfaces. I am looking forward to future collaborations!
AMAAI: Abhinaba Roy, Ph.D., Renhang Liu, 路通宇, Geeta Puri, Sithumi Kavindya, Jan Melechovsky, Charlotta-Marlena Geist
Leveraging LLM Embeddings for Cross Dataset Label Alignment and Zero Shot Music Emotion Prediction
Posted by dorien on Wednesday, 30 October 2024
Do you listen to music when you are down? Emotion and music are intrinsically connected. Yet we still struggle to model this. Why?
One of the reasons is that we only have a handful of small datasets, each using a different set of emotion labels. The AMAAI Lab set out to overcome this by developing a zero-shot alignment method that merges different datasets using LLM embeddings.
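As a minimal sketch of the core idea, cross-dataset label alignment via embedding similarity, here is what mapping one dataset's emotion vocabulary onto another's can look like. The embedding model and label sets below are illustrative assumptions, not necessarily those used in the paper:

```python
# Illustrative sketch: align emotion labels across datasets by comparing
# their text embeddings (not the paper's exact pipeline).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

# Hypothetical label vocabularies from two different emotion datasets
labels_a = ["joyful", "melancholic", "tense", "serene"]
labels_b = ["happy", "sad", "angry", "calm"]

emb_a = model.encode(labels_a, convert_to_tensor=True, normalize_embeddings=True)
emb_b = model.encode(labels_b, convert_to_tensor=True, normalize_embeddings=True)

# Cosine similarity between every label pair; each label in A is aligned to its
# nearest neighbour in B, which enables zero-shot transfer across datasets.
sim = util.cos_sim(emb_a, emb_b)
for i, label in enumerate(labels_a):
    j = int(sim[i].argmax())
    print(f"{label} -> {labels_b[j]} (cos={float(sim[i, j]):.2f})")
```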
Are We There Yet? A Brief Survey of Music Emotion Prediction Datasets, Models and Outstanding Challenges
Posted by dorien on Monday, 28 October 2024
How is it that music and emotion are so intrinsically connected, yet music emotion prediction models still deliver sub-par performance? We discuss the current challenges in the field and provide a comprehensive list of music-emotion datasets as well as recent models.
We also maintain these lists in a community GitHub repository so they can stay up to date: https://github.com/AMAAI-Lab/awesome-MER. If we missed any model or dataset, just open a pull request to add yours!
20 years since my first generative music model!
Posted by dorien on Thursday, 10 October 2024
I can't believe it's been 20 years already (!!!) since I wrote my first work on generative music models, as my master's thesis in commercial engineering at the University of Antwerp, supervised by Kenneth Sörensen.
For those interested, it used a tabu search algorithm to optimize melodies in abc notation format given a ruleset. It was coded in Pascal, with the MIDI expert function written in pure hex. A minimal Python sketch of the idea is included below.
Read the thesis here (in Dutch, with full source code in the appendix).
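For the curious, here is a small present-day sketch of the tabu search idea. The rules and melody encoding are simplified stand-ins, not the original Pascal implementation:

```python
# Toy tabu search over a melody encoded as a list of MIDI pitches.
import random

C_MAJOR = {0, 2, 4, 5, 7, 9, 11}

def cost(melody):
    # Simple ruleset: penalize large melodic leaps and notes outside C major.
    leaps = sum(abs(a - b) for a, b in zip(melody, melody[1:]))
    off_scale = sum(1 for n in melody if n % 12 not in C_MAJOR)
    return leaps + 10 * off_scale

def neighbours(melody):
    # All single-note moves of +/- 1 or 2 semitones.
    for i in range(len(melody)):
        for d in (-2, -1, 1, 2):
            new = list(melody)
            new[i] += d
            yield (i, d), new

def tabu_search(melody, iterations=500, tabu_len=20):
    best, current = list(melody), list(melody)
    tabu = []  # recently reversed moves that are temporarily forbidden
    for _ in range(iterations):
        move, candidate = min(
            ((m, n) for m, n in neighbours(current) if m not in tabu),
            key=lambda pair: cost(pair[1]),
        )
        current = candidate
        tabu.append((move[0], -move[1]))  # forbid undoing this move for a while
        tabu = tabu[-tabu_len:]
        if cost(current) < cost(best):
            best = list(current)
    return best

random.seed(0)
start = [60 + random.randint(-6, 6) for _ in range(16)]  # pitches around middle C
print(cost(start), cost(tabu_search(start)))
```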
Congratulations to Dr. Joel Ong for successfully defending his thesis!
Posted by dorien on Wednesday, 2 October 2024
I am excited to announce that Dr. Joel Ong has successfully defended his thesis on 'Modern Portfolio Construction with Advanced Deep Learning Models' as part of the AIFi Lab at SUTD. It has been a pleasure to supervise Joel and guide him through his exploration of various multi-task architectures and mixture-of-experts models, all aimed at leveraging deep learning for effective portfolio construction.
Mustango project - congrats on winning the SAIL WAIC Award Top 30
Posted by dorien on Wednesday, 24 July 2024
We are pleased to share that Associate Professor Dorien Herremans, Assistant Professor Soujanya Poria, and their groups from ISTD have won the Super AI Leader (SAIL) Top 30 Award issued by the World Artificial Intelligence Conference (WAIC), for their joint project 'Mustango: controllable text-to-music'.
PhD positions in the Music, Audio, and AI Lab
Posted by dorien on Thursday, 18 July 2024
The Music, Audio, and AI Lab (AMAAI) at SUTD invites applications for a PhD position in the exciting and rapidly evolving field of music and audio artificial intelligence.
The AMAAI Lab, led by Prof. Dorien Herremans, is engaged in cutting-edge research at the intersection of music, audio, and artificial intelligence. We are based in tropical Singapore, where English is the first language; it is a hub for AI research and startups, and a great base from which to explore South East Asia. Our PhD students and postdoc researchers work on projects that explore areas such as:
New dataset: MidiCaps - A Large-scale Dataset of Caption-annotated MIDI Files
Posted by dorien on Thursday, 18 July 2024
I am thrilled to share that MidiCaps - A Large-scale Dataset of Caption-annotated MIDI Files has been accepted at the ISMIR conference. MidiCaps is a large-scale dataset of 168,385 MIDI music files with descriptive text captions and a set of extracted musical features. The captions were produced through a captioning pipeline that combines MIR feature extraction with the Claude 3 LLM, which writes a caption from the extracted features via an in-context learning task. The framework used to extract the captions is available open source on GitHub.
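A rough sketch of what such a feature-extraction-plus-LLM captioning step can look like is below; it uses pretty_midi for illustration, and the features, prompt, and file name are assumptions rather than the actual MidiCaps pipeline:

```python
# Illustrative only: extract a few MIR features from a MIDI file and build an
# in-context prompt for an LLM captioner. Not the exact MidiCaps pipeline.
import pretty_midi

def extract_features(path):
    midi = pretty_midi.PrettyMIDI(path)
    return {
        "tempo_bpm": round(midi.estimate_tempo()),
        "duration_s": round(midi.get_end_time(), 1),
        "instruments": sorted({
            pretty_midi.program_to_instrument_name(inst.program)
            for inst in midi.instruments if not inst.is_drum
        }),
    }

def build_prompt(features):
    # One in-context example steers the LLM towards the desired caption style.
    example = (
        "Features: {'tempo_bpm': 120, 'duration_s': 95.0, "
        "'instruments': ['Acoustic Grand Piano']}\n"
        "Caption: A moderate-tempo solo piano piece with a calm, flowing character.\n\n"
    )
    return example + f"Features: {features}\nCaption:"

features = extract_features("song.mid")   # hypothetical input file
prompt = build_prompt(features)
# The prompt would then be sent to an LLM (e.g. Claude 3) to obtain the caption.
print(prompt)
```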
New survey on how NLP is used in Music Information Retrieval
Posted by dorien on Friday, 1 March 2024
I am excited to announce the latest paper by Viet-Toan Le, who was a visiting student at the AMAAI Lab, on 'Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval: a Survey'. Viet-Toan did an amazing job of collating and nicely presenting over 225 papers in Music Information Retrieval that are inspired by NLP, as well as presenting the latest challenges and steps forward for the field!
Postdoc and RA position in text-to-music project
Posted by dorien on Friday, 9 February 2024
I am excited to announce that we have two positions open at the AMAAI Lab in Singapore: a postdoc and a research assistant position in generative AI for music. Both will build on our recent work on the Mustango model (https://github.com/AMAAI-Lab/mustango) and Video2Music (https://github.com/AMAAI-Lab/Video2Music).
Mustango: Toward Controllable Text-to-Music Generation.
Posted by dorien on Wednesday, 15 November 2023
Excited to announce Mustango, a powerful multimodal model for generating music from textual prompts. Mustango leverages a latent diffusion model conditioned on textual prompts (encoded using Flan-T5) and various musical features. Try the demo! What makes it different from the rest? (A quick sketch of the text-conditioning step follows the list below.)
-- greater controllability in music generation.
-- trained on a large dataset generated using ChatGPT and musical manipulations.
-- superior performance over its predecessors, according to expert listeners.
-- open source!
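As a rough illustration of the text-conditioning step (not Mustango's actual code), a prompt can be encoded with Flan-T5 before being passed to a latent diffusion denoiser; the diffusion side is omitted here and the checkpoint name and prompt are illustrative:

```python
# Sketch of encoding a text prompt with Flan-T5 as conditioning for a
# latent diffusion model.
import torch
from transformers import AutoTokenizer, T5EncoderModel

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
encoder = T5EncoderModel.from_pretrained("google/flan-t5-large")

prompt = ("A calm acoustic guitar piece in A minor at 80 bpm, "
          "with a simple i-iv-V chord progression.")
tokens = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    text_embeddings = encoder(**tokens).last_hidden_state  # (1, seq_len, hidden)

# In a Mustango-style model these embeddings, together with extracted musical
# features such as chords and tempo, condition the denoising network.
print(text_embeddings.shape)
```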
Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model
Posted by dorien on Tuesday, 7 November 2023
We are happy to announce Video2Music, a novel AI-powered multimodal music generation framework. It uniquely uses video features as conditioning input to generate matching music with a Transformer architecture, aiming to give video creators a seamless and efficient way to generate tailor-made background music.
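As a simplified illustration of the idea (not the authors' exact architecture), per-frame video features can serve as encoder memory for a Transformer that decodes music tokens; all dimensions, vocabularies, and feature sources below are placeholders:

```python
# Minimal sketch: condition a Transformer music decoder on per-frame video features.
import torch
import torch.nn as nn

d_model, vocab_size, n_frames, n_music_tokens = 256, 512, 64, 128

video_features = torch.randn(1, n_frames, d_model)        # e.g. from a pretrained vision encoder
music_tokens = torch.randint(0, vocab_size, (1, n_music_tokens))  # chord/note token ids

embed = nn.Embedding(vocab_size, d_model)
transformer = nn.Transformer(d_model=d_model, batch_first=True)
to_logits = nn.Linear(d_model, vocab_size)

# Causal mask so each music token only attends to earlier tokens.
causal_mask = transformer.generate_square_subsequent_mask(n_music_tokens)
hidden = transformer(src=video_features,
                     tgt=embed(music_tokens),
                     tgt_mask=causal_mask)
logits = to_logits(hidden)   # next-token distribution over music tokens
print(logits.shape)          # (1, n_music_tokens, vocab_size)
```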
Upcoming talks
Posted by dorien on Thursday, 12 October 2023
We are happy to announce two talks on Tuesday 17 October at 2pm at SUTD I3 Lab 1.605.
Title: Exploring NLP Methods in Symbolic MIR: Representations and Models
Twitter-based Bitcoin extreme movement predictions with PreBit
Posted by dorien on Friday, 23 June 2023
Our paper 'PreBit - A multimodal model with Twitter FinBERT embeddings for extreme price movement prediction of Bitcoin' has just been published in Expert Systems with Applications.
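A small sketch of the FinBERT embedding step the title refers to is shown below. The checkpoint name and tweets are assumptions (the paper fine-tunes its own variant and fuses these embeddings with price features in a multimodal classifier):

```python
# Illustrative: embed tweets with a FinBERT checkpoint; the resulting vectors
# would be combined with market features for extreme-move prediction.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")  # assumed checkpoint
model = AutoModel.from_pretrained("ProsusAI/finbert")

tweets = [
    "Bitcoin breaks key resistance, institutions piling in.",
    "Exchange outage sparks panic selling across crypto markets.",
]
batch = tokenizer(tweets, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

# Use the [CLS] token embedding as a fixed-size representation of each tweet.
cls_embeddings = outputs.last_hidden_state[:, 0, :]
print(cls_embeddings.shape)  # (2, 768)
```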
Time-series momentum portfolios with deep multi-task learning
Posted by dorien on Monday, 12 June 2023
Congratulations to Joel Ong on publishing our paper on using multi-task deep learning for portfolio construction in Expert Systems with Applications. The paper presents a new way to leverage time-series momentum in a deep learning setting. Read a Twitter thread explaining the basics here.
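For context, here is a minimal pandas sketch of the classic volatility-scaled time-series momentum rule that this line of work builds on; it is the traditional baseline, not the paper's deep multi-task model, and all parameters and prices are illustrative:

```python
# Classic time-series momentum baseline (illustrative only).
import numpy as np
import pandas as pd

def tsmom_positions(prices: pd.Series, lookback=252, vol_window=60, target_vol=0.15):
    returns = prices.pct_change()
    # Signal: sign of the trailing ~12-month return.
    signal = np.sign(prices / prices.shift(lookback) - 1)
    # Scale positions so the strategy targets a constant annualized volatility.
    realized_vol = returns.rolling(vol_window).std() * np.sqrt(252)
    return (signal * target_vol / realized_vol).shift(1)  # yesterday's signal trades today

# Usage with hypothetical daily prices:
prices = pd.Series(np.cumprod(1 + np.random.default_rng(0).normal(0.0004, 0.01, 1500)))
positions = tsmom_positions(prices)
strategy_returns = positions * prices.pct_change()
print(strategy_returns.dropna().mean() * 252)  # rough annualized return of the baseline
```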
DiffRoll - Music Transcription with Diffusion
Posted by dorien on Monday, 31 October 2022
Great work by Cheuk Kin Wai on his latest paper, 'DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability'.
Cheuk, K. W., Sawata, R., Uesaka, T., Murata, N., Takahashi, N., Takahashi, S., ... & Mitsufuji, Y. (2022). DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability. arXiv preprint arXiv:2210.05148.
New paper on the EmoMV datasets published in Information Fusion
Posted by dorien on Wednesday, 19 October 2022
Congratulations to Thao on leading the publication of the EmoMV datasets for music-video matching based on emotion!
Pham Q-H., Herremans D., Roig G. 2022. EmoMV: Affective Music-Video Correspondence Learning Datasets for Classification and Retrieval. Information Fusion. DOI: 10.1016/j.inffus.2022.10.002
Paper highlights:
Keynote at AIMC
Posted by dorien on Wednesday, 21 September 2022
New paper in Sensors on Single Image Video Prediction with Auto-Regressive GANs
Posted by dorien on Tuesday, 9 August 2022
Congrats to my former research assistant Jiahui Huang on his latest paper in Sensors, 'Single Image Video Prediction with Auto-Regressive GANs'. Now we can generate videos of faces with desired emotions!
Huang, Jiahui, Yew Ken Chia, Samson Yu, Kevin Yee, Dennis Küster, Eva G. Krumhuber, Dorien Herremans, and Gemma Roig. "Single Image Video Prediction with Auto-Regressive GANs." Sensors 22, no. 9 (2022): 3533.