Highlights/Upcoming events

Can AI really compose band-quality music - with structure, harmony, and creative control?

That’s the question we set out to explore in our latest work, BandCondiNet, now accepted in Expert Systems with Applications!

Conditional music generation promises more user control, but current systems often struggle with three things:
- low-fidelity input conditions,
- weak structural coherence, and
- poor harmony across instruments.

SonicMaster - all-in-one mastering model

Ever struggled with cleaning up home-recorded music? Issues like weird echoes, distortion, and uneven sound, or the mastering needed to bring your music up to production-level quality, can be a huge pain to fix, usually requiring several different tools and lots of tweaking.

We just released SonicMaster, a model that aims to simplify this process by handling all of those common problems in one place. The coolest part? You can control it with simple text instructions ("Make the audio smoother and less distorted.") or let it automatically restore your audio.
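
To make the text-control idea concrete, here is a tiny, purely illustrative sketch of what such an interface could look like. The class and method names are placeholders made up for this post, not SonicMaster's actual API, and the "restoration" is just peak normalization so the example runs; check the official release for real usage.

```python
# Illustrative sketch only: shows the *shape* of a text-conditioned restoration
# interface. This is NOT SonicMaster's actual API; the stub simply peak-normalizes
# so the example runs end to end.
import numpy as np

class TextConditionedRestorer:
    """Placeholder for a model mapping (audio, text instruction) -> cleaned audio."""

    def restore(self, audio: np.ndarray, sample_rate: int, prompt: str) -> np.ndarray:
        # A real model would denoise, de-reverb, and master here, guided by the prompt.
        peak = float(np.max(np.abs(audio))) or 1.0
        return audio / peak

# Usage: clean up a rough home recording with a plain-language instruction.
sr = 44100
audio = 0.1 * np.random.randn(sr * 2).astype(np.float32)   # stand-in for a home-recorded take
model = TextConditionedRestorer()
cleaned = model.restore(audio, sr, "Make the audio smoother and less distorted.")
print(cleaned.shape, float(np.max(np.abs(cleaned))))
```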

Congratulations Dr. Jan on graduating!

Congratulations to Dr. Jan Melechovsky on obtaining his PhD! I've had the pleasure of guiding Jan through his PhD journey at the AMAAI Lab at the Singapore University of Technology and Design. Over the last five years, Jan has explored a number of fascinating (yet connected ;) ) topics, ranging from dysarthric speech analysis to text-to-music.

Some highlights:


AMAAI is organizing EAIM workshop at AAAI

Together with AIM at QMUL, the AMAAI Lab is organizing the First Workshop on Emerging AI Technologies for Music (EAIM 2026), part of the AAAI 2026 conference in Singapore.

The workshop will bring together researchers, industry leaders, and practitioners working at the intersection of AI and music. We’ll explore how advances in generative models, multimodal learning, personalization, explainability, and human–AI collaboration are shaping the future of music creation, analysis, and interaction.

Royalties in the age of AI: paying artists for AI-generated songs

As we celebrate World IP Day tomorrow, it's a fitting time to reflect on how generative AI is transforming music creation, producing impressively polished tracks in seconds, while also raising vital questions about fairly compensating artists whose work is used to train these models. I explore this topic in my article in the World Intellectual Property Organization's WIPO Magazine special issue on Music and IP, 'Royalties in the age of AI: paying artists for AI-generated songs'.

PhD Fellowship Opportunities at SUTD

I am thrilled to announce two PhD fellowship opportunities at the Singapore University of Technology and Design (SUTD) for talented Thai students, Singaporean students, and Singapore Permanent Residents (PR). These fully-funded positions are available in the fields of AI for Finance and AI for Music, hosted by the AIFi Lab (AI for Finance) and AMAAI Lab (Audio, Music, and AI Lab), respectively. We are seeking exceptional candidates with a passion for AI and strong academic backgrounds to join our vibrant research community.


What should I work on next?

"What should I work on next?", is the question we are trying to answer in our latest paper.

The arrival of LLMs and foundation models has significantly changed the field of Music Information Retrieval (MIR).

Many researchers in the field have had to pivot or adapt to the changing environment and the powerful tools we now have available. The question many of us are asking is: which topics remain unexplored and still need solving?


Text2midi at AAAI

I'm thrilled to introduce text2midi, an end-to-end trained AI model designed to bridge the gap between textual descriptions and MIDI file generation! Our paper has been accepted to the Proceedings of the AAAI Conference on Artificial Intelligence and will be presented in Philadelphia next month.

Presenting MidiCaps and MIRFLEX at ISMIR in San Francisco

Exciting news from the AMAAI Lab at this year’s ISMIR conference in San Francisco! We were thrilled to showcase some of our research:

MidiCaps
Presented by Jan Melechovsky and Abhinaba Roy, MidiCaps is the first large-scale open MIDI dataset with text captions. This resource will enable us to develop the very first text-to-MIDI models (stay tuned: our lab's model is coming soon!).

Visit from ZJU

The AMAAI Lab was honoured to receive Prof. Kejun Zhang and Jiaxing Yu from Zhejiang University last week.

It was fascinating to hear about their latest research on multimodal AI, affective computing for music, new large-scale datasets, and cool image-based music generation interfaces. I am looking forward to future collaborations!

AMAAI: Abhinaba Roy, Ph.D., Renhang Liu, 路通宇, Geeta Puri, Sithumi Kavindya, Jan Melechovsky, Charlotta-Marlena Geist

Leveraging LLM Embeddings for Cross Dataset Label Alignment and Zero Shot Music Emotion Prediction

Do you listen to music when you are down? Emotion and music are intrinsically connected. Yet we still struggle to model this. Why?

One of the reasons is that we only have a handful of small datasets, each using a different set of emotion labels. The AMAAI Lab set out to overcome this by developing a zero-shot alignment method that can merge different datasets using LLM embeddings.
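
To give a flavour of the general idea (this is not the exact method from the paper), here is a minimal sketch that embeds emotion label names from two datasets and matches them by cosine similarity. The label sets are made up, and the sentence-embedding model stands in for the LLM embeddings used in the paper.

```python
# Rough sketch of cross-dataset label alignment via text embeddings (illustrative only;
# the label sets are made up and the model is a stand-in for the paper's LLM embeddings).
import numpy as np
from sentence_transformers import SentenceTransformer

dataset_a_labels = ["happy", "sad", "angry", "relaxed"]          # e.g. labels in dataset A
dataset_b_labels = ["joyful", "melancholic", "tense", "calm"]    # e.g. labels in dataset B

model = SentenceTransformer("all-MiniLM-L6-v2")
emb_a = model.encode(dataset_a_labels, normalize_embeddings=True)
emb_b = model.encode(dataset_b_labels, normalize_embeddings=True)

similarity = emb_b @ emb_a.T   # cosine similarity, since embeddings are unit-normalized
for label, row in zip(dataset_b_labels, similarity):
    print(f"{label:12s} -> {dataset_a_labels[int(np.argmax(row))]}")
```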

Are We There Yet? A Brief Survey of Music Emotion Prediction Datasets, Models and Outstanding Challenges

How is it that music and emotion are so intrinsically connected, yet music emotion prediction models still deliver sub-par performance? We discuss the current challenges in the field and provide a comprehensive list of music-emotion datasets as well as recent models.

We also maintain these lists in a community GitHub repository so they stay up to date: https://github.com/AMAAI-Lab/awesome-MER. If we missed any model or dataset, just open a pull request to add yours!

20 years since my first generative music model!

Can't believe it's been 20 years already (!!!) since I wrote my first work on generative music models as my master's thesis as a commercial engineer at the University of Antwerp, supervised by Kenneth Sörensen.

For those interested: it used a tabu search algorithm to optimize ABC-notation melodies against a given rule set. It was coded in Pascal, with the MIDI export function written in pure hex.
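
For the curious, here is a toy re-sketch of the idea in Python; the cost rules and parameters are made up for illustration and far simpler than the thesis version.

```python
# Toy tabu search over a melody (a list of MIDI pitches), for illustration only.
# The cost rules and parameters below are invented and much simpler than the thesis
# version, which optimized ABC-notation melodies against a music-theory rule set.
import random

def cost(melody):
    """Penalize large melodic leaps and notes outside C major (illustrative rules)."""
    c_major = {0, 2, 4, 5, 7, 9, 11}
    leaps = sum(max(0, abs(b - a) - 4) for a, b in zip(melody, melody[1:]))
    off_scale = sum(1 for p in melody if p % 12 not in c_major)
    return leaps + 3 * off_scale

def tabu_search(melody, iters=200, tabu_len=10):
    current, best, tabu = list(melody), list(melody), []
    for _ in range(iters):
        # Neighbourhood: shift one note by a small interval, skipping recent (tabu) moves.
        moves = [(i, current[i] + d) for i in range(len(current))
                 for d in (-2, -1, 1, 2) if (i, current[i] + d) not in tabu]
        i, pitch = min(moves, key=lambda m: cost(current[:m[0]] + [m[1]] + current[m[0] + 1:]))
        current[i] = pitch
        tabu = (tabu + [(i, pitch)])[-tabu_len:]   # short-term memory of recent moves
        if cost(current) < cost(best):
            best = list(current)
    return best

random.seed(0)
start = [random.randint(55, 79) for _ in range(16)]
print("cost before:", cost(start), "| cost after:", cost(tabu_search(start)))
```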

Read the thesis here (in Dutch, with full source code in the appendix).

Congratulations to Dr. Joel Ong for successfully defending his thesis!

I am excited to announce that Dr. Joel Ong has successfully defended his thesis on Modern Portfolio Construction with Advanced Deep Learning Models as part of the AIFi Lab at SUTD. It has been a pleasure to supervise Joel and guide him through his exploration of various multitask architectures and mixture-of-experts models, all aimed at leveraging deep learning for effective portfolio construction.

You can read the full thesis here, or the abstract below.


Mustango project - congrats on winning the SAIL WAIC Award Top 30

We are pleased to share that Associate Professor Dorien Herremans, Assistant Professor Soujanya Poria, and their groups from ISTD have won the Super AI Leader (SAIL) Top 30 Award issued by the World Artificial Intelligence Conference (WAIC) for their joint project 'Mustango: controllable text-to-music'.

PhD positions in the Music, Audio, and AI Lab

The Music, Audio, and AI Lab (AMAAI) at SUTD invites applications for a PhD position in the exciting and rapidly evolving field of music and audio artificial intelligence.

The AMAAI Lab, led by Prof. Dorien Herremans, is engaged in cutting-edge research at the intersection of music, audio, and artificial intelligence. We are based in tropical Singapore, where English is the first language, a hub for AI research and startups, and a great base from which to explore Southeast Asia. Our PhD students and postdoc researchers work on projects that explore areas such as:


New dataset: MidiCaps - A Large-scale Dataset of Caption-annotated MIDI Files

I am thrilled to share that MidiCaps - A Large-scale Dataset of Caption-annotated MIDI Files has been accepted at the ISMIR conference. MidiCaps is a large-scale dataset of 168,385 MIDI files with descriptive text captions and a set of extracted musical features. The captions were produced by a captioning pipeline that first extracts MIR features and then prompts the Claude 3 LLM with those features in an in-context learning setup. The captioning framework is available open source on GitHub.
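
To give a feel for this kind of pipeline (the open-source MidiCaps code is the authoritative reference), here is a minimal sketch: extract a few features from a MIDI file with pretty_midi, then assemble an in-context prompt that would be sent to the LLM. The feature set and prompt wording are illustrative assumptions, not the actual MidiCaps prompts.

```python
# Minimal sketch of a feature-to-caption pipeline (illustrative only; not the released
# MidiCaps code). It extracts a few features with pretty_midi and builds an in-context
# prompt that would then be sent to an LLM such as Claude 3.
import pretty_midi

def extract_features(path: str) -> dict:
    """Pull a handful of simple musical features from a MIDI file."""
    midi = pretty_midi.PrettyMIDI(path)
    return {
        "tempo_bpm": round(midi.estimate_tempo()),
        "duration_s": round(midi.get_end_time()),
        "instruments": sorted({pretty_midi.program_to_instrument_name(inst.program)
                               for inst in midi.instruments if not inst.is_drum}),
    }

def build_prompt(features: dict) -> str:
    """Assemble an in-context prompt: one worked example, then the new feature set."""
    example = ("Features: {'tempo_bpm': 120, 'instruments': ['Acoustic Grand Piano']}\n"
               "Caption: A moderate-tempo solo piano piece with a calm feel.\n\n")
    return example + f"Features: {features}\nCaption:"

features = extract_features("song.mid")   # any MIDI file on disk
prompt = build_prompt(features)           # this string would be sent to the captioning LLM
print(prompt)
```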

New survey on how NLP is used in Music Information Retrieval

I am excited to announce the latest paper by Viet-Toan Le, who was a visiting student at the AMAAI Lab: 'Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval: a Survey'. Viet-Toan did an amazing job collating and presenting over 225 NLP-inspired papers in Music Information Retrieval, as well as laying out the field's current challenges and next steps!
