Artificial Digital Aesthetics (ADA)
Brains on Art collective
AI application and interactive artwork
ADA is an AI application and an artwork that uses machine learning techniques to produce aesthetic evaluations of pictures taken by its users. The users, in turn, are prompted to evaluate ADA’s claims and preferences, thereby enabling the AI to develop new aesthetic preferences and an understanding of new aesthetic categories. The application follows a client-server model, with an Android app as the user interface and a separate server backend running the AI itself.
The ADA Android app lets users take pictures for ADA to process and respond to ADA’s evaluations. Users can install the app on their own devices. Interaction with the app consists of three main stages: 1) the user takes a picture; 2) ADA responds to the picture with a textual aesthetic critique, which contains keywords the user can interact with; 3) by selecting one of the keywords, the user can tell ADA whether they agree with it, suggest new qualities and indicate which parts of the picture support these new evaluations, and give their own aesthetic judgements of the picture. In addition to the textual responses, ADA can optionally express its views aloud using the device’s text-to-speech capabilities.
The ADA backend, running on a remote server, receives the images sent by the client apps, processes them to produce aesthetic evaluations, and keeps track of user interaction on a client-by-client basis. The image processing is based on a variety of machine vision and machine learning techniques that extract image descriptors, which ADA then ascribes aesthetic value to. The descriptor extraction mechanisms are shared across clients and updated offline once enough training data has been collected. The aesthetic valuations, in contrast, are maintained per client and evolve online as each user interacts with, and slowly personalizes, their own ADA.
Category classification is attempted with a neural network adapted from a Kaggle project and trained on a dataset of approximately 32k artworks provided by the Finnish National Gallery. The artworks in the training dataset were annotated with a category, among other pieces of metadata. Given an image, the network outputs a prediction score for each category. The category with the highest score is taken as ADA’s opinion and returned to the client, while the five highest-scoring categories are stored in the metadata database so they can be shown as alternatives if the user disagrees with ADA’s evaluation.
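The selection step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `rank_categories`, the category names other than "painting" and "photography", and the example scores are all hypothetical.

```python
def rank_categories(scores, k=5):
    """Return the best category and the k highest-scoring categories.

    `scores` is a hypothetical {category_name: score} mapping standing in
    for the per-category output of the classification network.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    # Sort category names by descending score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[0], ranked[:k]


# Made-up network output for a single image (illustrative values only).
example_scores = {
    "painting": 0.61,
    "photography": 0.22,
    "drawing": 0.09,
    "sculpture": 0.04,
    "print": 0.03,
    "textile": 0.01,
}

top_category, alternatives = rank_categories(example_scores)
```

Here `top_category` would be returned to the client, and `alternatives` would be the list stored for later use if the user disagrees.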
Adjective / dominant color extraction
The dominant colors of the image are extracted by quantizing the colors of the pixels to a small number of colors using an adaptive palette and mapping the resulting hue values (in HSV) to color names.
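The hue-to-name mapping might look roughly like the sketch below. It assumes the quantization step has already produced a small palette with pixel counts; the hue boundaries in `HUE_NAMES`, the function names, and the choice to ignore near-grey colors are assumptions made for illustration, since the text does not give ADA's actual table.

```python
import colorsys

# Hypothetical hue boundaries in degrees: (exclusive upper bound, name).
HUE_NAMES = [
    (15, "red"), (45, "orange"), (70, "yellow"), (170, "green"),
    (200, "cyan"), (260, "blue"), (290, "purple"), (345, "pink"),
    (360, "red"),
]


def hue_to_name(rgb):
    """Map an (r, g, b) tuple with 0-255 channels to a color name via its HSV hue."""
    r, g, b = (channel / 255.0 for channel in rgb)
    hue, _saturation, _value = colorsys.rgb_to_hsv(r, g, b)
    degrees = hue * 360.0
    for upper_bound, name in HUE_NAMES:
        if degrees < upper_bound:
            return name
    return "red"  # Fallback; hue is normally below 360 degrees.


def dominant_color_names(palette_counts, top_n=2):
    """Name the most frequent palette colors.

    `palette_counts` is a list of (pixel_count, (r, g, b)) entries, assumed
    to come from an earlier adaptive-palette quantization of the image.
    """
    ranked = sorted(palette_counts, key=lambda entry: entry[0], reverse=True)
    names = []
    for _count, rgb in ranked:
        name = hue_to_name(rgb)
        if name not in names:  # Skip palette entries that map to the same name.
            names.append(name)
        if len(names) == top_n:
            break
    return names
```

For example, a palette dominated by (255, 255, 0) with a smaller share of (0, 0, 255) would yield "yellow" followed by "blue" under these assumed boundaries.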
Initially, adjectives (non-color words) for the images were selected randomly, on the reasoning that users would be prone to correct ‘nonsensical’ adjectives, thus providing ADA with valuable data to learn from. In the second stage, we fine-tuned a pretrained deep neural network (Inception-ResNet-v2) to recognize features related to these adjectives, using user-tagged, openly licensed images from online image hosting services. The last phase was to use the images provided by the users, together with their adjective corrections, to train a network that provides the final adjective classification.
The valence, or preference, for an image sent to ADA is computed from the results of the analyses described above. At a high level, the idea is that whereas the analysis of an image’s contents is done independently of the user providing it, each user (or, more specifically, each client device) has its own preferences or “taste”. The preferences of an “individual ADA” (client device) are implemented by assigning each category and adjective a value in the range [-1, 1]. The total valence of an image is the mean of the valences of its top adjectives and its category. In other words, an individual ADA could have a high preference for, e.g., the “pink” adjective and the “photography” category, and a low preference for the “yellow” adjective and the “painting” category. An image deemed to be a yellow painting would then be appalling to that ADA, whereas a pink photograph would be highly preferable.
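A minimal sketch of this valence computation is given below. The function name `image_valence`, the example preference values, and the neutral default of 0.0 for terms a client has no stored preference for are assumptions; the text only specifies that values lie in [-1, 1] and that the result is their mean.

```python
from statistics import mean


def image_valence(preferences, category, adjectives, default=0.0):
    """Return the mean preference over an image's category and top adjectives.

    `preferences` is one client's {term: value in [-1, 1]} mapping.
    Terms without a stored value fall back to `default` (an assumption).
    """
    terms = [category, *adjectives]
    return mean([preferences.get(term, default) for term in terms])


# Hypothetical preferences of one "individual ADA".
example_preferences = {
    "pink": 0.9,
    "photography": 0.7,
    "yellow": -0.8,
    "painting": -0.6,
}

yellow_painting = image_valence(example_preferences, "painting", ["yellow"])
pink_photograph = image_valence(example_preferences, "photography", ["pink"])
```

With these made-up values the yellow painting gets a valence of about -0.7 and the pink photograph about 0.8, matching the example in the text.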
Szegedy, Christian, et al. “Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning.” Thirty-First AAAI Conference on Artificial Intelligence. 2017.