This research started as an exploration of prediction and recommendation engines and why they suck. It quickly evolved once I needed to build my own data sets. The domain I felt would benefit most from a better recommendation engine was music, so I switched from YouTube to SoundCloud. Through the SoundCloud API I could access track titles and artists, plus things I didn't expect to be interesting but that turned out to be very useful: track art, waveforms, and reposting chains.
The system is less an AI (although it does use some neural networks to transform data) than a set of logic checks against my training set, plus a blacklist. It automatically reposts tracks from my stream based on a series of checks that evaluate: title (recently reposted?), description and tags (against the blacklist), artist and publishers (weighted by my likes), reposting chain, track art (similarity score), track waveform (intensity and similarity), and an audio sample (Fourier series, speech-to-text, filtered against the blacklist).
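To make a couple of these checks concrete, here is a minimal sketch of a blacklist filter and a waveform intensity/similarity score. Everything in it is an assumption for illustration: the blacklist terms, the function names, the use of cosine similarity, and the idea that a waveform arrives as a plain list of amplitude samples. It is not the code the system actually runs.

```python
import math
import re

# Hypothetical blacklist terms; the real list is not given in this write-up.
BLACKLIST = {"podcast", "interview", "advert"}


def hits_blacklist(text, blacklist=BLACKLIST):
    """Return True if any whole word in the text is on the blacklist."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    return not words.isdisjoint(blacklist)


def intensity(waveform):
    """Mean absolute amplitude of a waveform (0.0 for an empty one)."""
    if not waveform:
        return 0.0
    return sum(abs(x) for x in waveform) / len(waveform)


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length amplitude sequences.

    Cosine similarity is an assumed stand-in for whatever similarity
    score the system really uses.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

In a setup like this, `hits_blacklist` would run over the description, the tags, and the speech-to-text transcript, while `cosine_similarity` would compare a new track's waveform against waveforms of tracks I already liked.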
Most processes run in parallel and are non-blocking thanks to a graph database that holds the transient details. The final results feed an AI with two outputs: whether to repost (yes/no) and which playlist to add the track to. If the answer is "no" but the component values are high, the track ends up in a review playlist for me to look at later.
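The flow described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the real pipeline: the thread pool stands in for the graph-database-backed parallelism, the final model is left out (its yes/no and playlist outputs are simply passed in), and "component values are high" is read as the mean score reaching a hypothetical threshold.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cutoff for sending a rejected track to the review playlist.
REVIEW_THRESHOLD = 0.7


def run_checks(track, checks):
    """Run every check on the track concurrently; return {name: score}."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, track) for name, fn in checks.items()}
        return {name: future.result() for name, future in futures.items()}


def route(repost, playlist, scores, threshold=REVIEW_THRESHOLD):
    """Turn the final yes/no output and the component scores into an action.

    repost and playlist are assumed to come from the final model.
    """
    if repost:
        return ("repost", playlist)
    # Assumption: "high" means the mean component score meets the threshold.
    if scores and sum(scores.values()) / len(scores) >= threshold:
        return ("review", "review")
    return ("skip", None)
```

For example, `route(False, None, {"art": 0.9, "waveform": 0.8})` sends the track to the review playlist, because the model said "no" while the individual checks scored high.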
See the results on SoundCloud.
NOTE: I do not run the AI all the time as it generates a lot of data and the SoundCloud API is throttling me, so the results show up in spurts.
References: IJEDR1404092, arXiv 1703.09109, VU Fernandez.