Work

🪐 The Geniverse

The Geniverse is an interactive tool for prompt engineering. It lets you guide several generative models to produce images and videos from text prompts. Sign up for the waitlist to get access to our Alpha release.

✏️ ProsePainter

ProsePainter Interface

ProsePainter combines direct digital painting with real-time, machine-learning-based image optimization. Simply state what you want and draw the region where you want it.
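
Under the hood, the idea can be sketched as a masked, CLIP-guided optimization: only the pixels under your brushstroke are updated so the image matches your text. The snippet below is a minimal illustration of that mechanism, not ProsePainter's actual code; `paint_region`, the mask convention and the hyperparameters are all assumptions.

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git
from torchvision.transforms import Normalize

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
normalize = Normalize((0.48145466, 0.4578275, 0.40821073),
                      (0.26862954, 0.26130258, 0.27577711))  # CLIP input stats

def paint_region(image, mask, prompt, steps=100, lr=0.05):
    """Optimize only the masked pixels toward the prompt.
    image: (1, 3, 224, 224) tensor in [0, 1]; mask: (1, 1, 224, 224) in {0, 1}."""
    text_feat = model.encode_text(clip.tokenize([prompt]).to(device)).detach()
    paint = image.clone().requires_grad_(True)  # editable paint layer
    opt = torch.optim.Adam([paint], lr=lr)
    for _ in range(steps):
        # Keep unmasked pixels fixed; blend the optimized layer into the mask.
        canvas = image * (1 - mask) + paint.clamp(0, 1) * mask
        loss = -torch.cosine_similarity(
            model.encode_image(normalize(canvas)), text_feat).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return (image * (1 - mask) + paint.clamp(0, 1) * mask).detach()
```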

🗿 Sculpting with Words

In this work we present an interactive framework for text-guided 3D mesh generation. Our tool integrates the power of large multimodal pre-trained models (i.e., CLIP) and differentiable rendering into a traditional 3D sculpting environment. Sculpting proceeds as a user-guided optimization of the values of a 3D Signed Distance Field, driven by gradient descent on a CLIP-based loss.

Interface of Sculpting with Words
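
The optimization loop itself is compact. In the hedged sketch below, a toy differentiable "renderer" (a sigmoid read-out of the field's central slice) stands in for the real renderer, which images the SDF from arbitrary cameras; the prompt, grid size and hyperparameters are illustrative, and CLIP's input normalization is omitted for brevity.

```python
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

def toy_render(sdf):
    """Toy stand-in for differentiable rendering: read the central slice of
    the field (negative values = inside the surface) out as a grayscale image."""
    mid = sdf.shape[-1] // 2
    img = torch.sigmoid(-10.0 * sdf[:, :, mid])               # (1, 1, 64, 64)
    return F.interpolate(img.repeat(1, 3, 1, 1), size=224, mode="bilinear")

sdf = (0.1 * torch.randn(1, 1, 64, 64, 64, device=device)).requires_grad_(True)
opt = torch.optim.Adam([sdf], lr=1e-2)                 # sculpt by gradient descent
text = clip.tokenize(["a stone gargoyle"]).to(device)  # illustrative prompt
text_feat = model.encode_text(text).detach()

for step in range(200):
    loss = -torch.cosine_similarity(
        model.encode_image(toy_render(sdf)), text_feat).mean()  # CLIP-based loss
    opt.zero_grad(); loss.backward(); opt.step()
```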

🖌️ DALLECLIP: Generate Anything from Prompts

“Graffiti Art”

You can find how I use DALL-E and CLIP to generate anything from a prompt in the following link. I use the recently released DALL-E decoder to generate the images and CLIP to evaluate their similarity to the prompt. This similarity score is then used to train the codebook features that enter the decoder, so that they end up generating an image that maximizes its resemblance to the input sentence.
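
Sketched out, the method optimizes a soft distribution over the dVAE codebook so the decoded image scores well under CLIP. The snippet below uses OpenAI's released decoder; the prompt echoes the caption above, while the learning rate and step count are illustrative rather than my exact settings (CLIP input normalization omitted for brevity).

```python
import torch
import torch.nn.functional as F
import clip
from dall_e import load_model, unmap_pixels  # pip install DALL-E

device = "cuda"
decoder = load_model("https://cdn.openai.com/dall-e/decoder.pkl", device)
perceptor, _ = clip.load("ViT-B/32", device=device)

# One logit per codebook entry (8192) per position of the 32x32 token grid.
logits = torch.randn(1, 8192, 32, 32, device=device, requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.1)
text_feat = perceptor.encode_text(clip.tokenize(["graffiti art"]).to(device)).detach()

for _ in range(300):
    z = F.softmax(logits, dim=1)                    # soft codebook selection
    x = decoder(z).float()
    img = unmap_pixels(torch.sigmoid(x[:, :3]))     # (1, 3, 256, 256) in [0, 1]
    img_feat = perceptor.encode_image(F.interpolate(img, size=224, mode="bilinear"))
    loss = -torch.cosine_similarity(img_feat, text_feat).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```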

🎭 StyleCLIP: Generate Faces from Prompts

The intersection between language models and computer vision is exciting. I used CLIP and StyleGAN to create images from prompts. The architecture is extremely simple and the results are quite impressive.

“An image with the face of Elon Musk with blonde hair”
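
The architecture really is that simple: the same CLIP-guided loop, with StyleGAN's latent as the variable being optimized. In the sketch below, `load_stylegan_generator` is a hypothetical stand-in for loading a pretrained generator checkpoint, and the output range and hyperparameters are assumptions (CLIP input normalization omitted for brevity).

```python
import torch
import torch.nn.functional as F
import clip

device = "cuda"
perceptor, _ = clip.load("ViT-B/32", device=device)
G = load_stylegan_generator()  # hypothetical: returns a pretrained StyleGAN generator

w = torch.randn(1, 512, device=device, requires_grad=True)  # latent being optimized
opt = torch.optim.Adam([w], lr=0.02)
prompt = "an image with the face of Elon Musk with blonde hair"
text_feat = perceptor.encode_text(clip.tokenize([prompt]).to(device)).detach()

for _ in range(200):
    img = (G(w) + 1) / 2                       # assume generator output in [-1, 1]
    img = F.interpolate(img, size=224, mode="bilinear")
    loss = -torch.cosine_similarity(perceptor.encode_image(img), text_feat).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```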

🎬 Screenplay Generator using Transformers (WIP)

I trained a GPT-2 model with 1.5 billion parameters on a large dataset of horror movie scripts. The model was trained on a single V100 GPU using techniques such as Mixed Precision, Gradient Accumulation and Gradient Checkpointing. The resulting model can successfully write its own screenplays, including everything it saw in the training data, such as dialogue, scene descriptions, camera positions and character expressions.
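
A condensed sketch of that training setup with Hugging Face transformers is shown below; `loader` is a placeholder for the tokenized script dataset, and the hyperparameters are illustrative.

```python
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2-xl").cuda()  # the 1.5B-parameter GPT-2
model.gradient_checkpointing_enable()  # recompute activations: trade compute for memory
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
scaler = torch.cuda.amp.GradScaler()
ACCUM = 8  # gradient accumulation: effective batch = 8 micro-batches

for step, batch in enumerate(loader):  # `loader` yields tokenized script chunks (assumed)
    input_ids = batch["input_ids"].cuda()
    with torch.cuda.amp.autocast():    # mixed-precision forward pass
        loss = model(input_ids, labels=input_ids).loss / ACCUM
    scaler.scale(loss).backward()      # accumulate scaled gradients
    if (step + 1) % ACCUM == 0:
        scaler.step(optimizer)         # unscale and apply the update
        scaler.update()
        optimizer.zero_grad()
```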

🧐 GANimation Extended

The goal of this work was to implement and study in depth Generative Adversarial Networks (GANs), a deep-learning architecture capable of learning probability distributions of data and generating hyper-realistic samples. This repo contains my implementation of GANimation, a conditional GAN capable of producing continuous facial-expression modifications on face images in a fully unsupervised way, conditioned on scalar vectors obtained using the Facial Action Coding System. In addition, this work proposes a new technique that addresses one of the main drawbacks of the original model via a Virtual Cycle Consistency loss.
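
As a point of reference, a plain cycle-consistency term runs the generator forward to the target expression and back, then penalizes the round-trip error; the Virtual Cycle Consistency loss proposed in this work builds on that idea (details in the repo). `G` here is the conditional generator, with an assumed `G(images, target_AUs)` interface.

```python
import torch

def cycle_consistency_loss(G, x, au_src, au_tgt):
    """x: batch of face images; au_src / au_tgt: Action Unit activation vectors
    (the FACS-based conditioning). G is the conditional generator (assumed)."""
    fake = G(x, au_tgt)        # re-render the face with the target expression
    recovered = G(fake, au_src)  # map it back to the original expression
    return torch.mean(torch.abs(recovered - x))  # L1 penalty on the round trip
```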

🔎 StyleGAN Embedding Finder

I implemented a perceptual-loss module capable of finding the representation of any input image inside the latent space of StyleGAN, a model trained to generate hyper-realistic faces.
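
A minimal sketch of such a module: optimize a latent until the generator's output matches the target image under a VGG-feature (perceptual) loss. `G` is an assumed handle to a pretrained StyleGAN generator producing images in [0, 1]; the layer choice and hyperparameters are illustrative.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

device = "cuda"
vgg_feats = vgg16(pretrained=True).features[:16].to(device).eval()  # up to relu3_3
for p in vgg_feats.parameters():
    p.requires_grad_(False)

def find_embedding(G, target, steps=500, lr=0.01):
    """target: (1, 3, 256, 256) image in [0, 1], same size as G's output."""
    w = torch.randn(1, 512, device=device, requires_grad=True)  # latent guess
    opt = torch.optim.Adam([w], lr=lr)
    target_feat = vgg_feats(target)
    for _ in range(steps):
        loss = F.mse_loss(vgg_feats(G(w)), target_feat)  # perceptual (feature-space) loss
        opt.zero_grad(); loss.backward(); opt.step()
    return w.detach()
```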

When run on a photo of myself, the optimization progressively converged on the representation of my own face.

The model was not limited to faces: it could also, for example, find the embedding of a rose.

🆘 BridgeAid

Winning project of the MIT COVID-19 Challenge. BridgeAid is a platform that connects individuals who are eager to help with the NGOs at the forefront of serving the most vulnerable people in their local communities.

ℹ️ COVID-19 INFOBOT

During quarantine, I implemented a Telegram chatbot that scraped data from Worldometers and provided real-time information about the COVID-19 situation around the world.
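
A minimal sketch of such a bot, assuming the python-telegram-bot v13-style API; the Worldometers CSS selector reflects the page's markup at the time and may have changed since.

```python
import requests
from bs4 import BeautifulSoup
from telegram.ext import Updater, CommandHandler  # python-telegram-bot v13-style API

def worldometers_totals():
    """Scrape the three global counters from the Worldometers landing page."""
    html = requests.get("https://www.worldometers.info/coronavirus/").text
    soup = BeautifulSoup(html, "html.parser")
    return [div.get_text(strip=True)
            for div in soup.select("div.maincounter-number")]  # cases, deaths, recovered

def stats(update, context):
    cases, deaths, recovered = worldometers_totals()
    update.message.reply_text(
        f"🌍 COVID-19 worldwide\nCases: {cases}\nDeaths: {deaths}\nRecovered: {recovered}")

updater = Updater("YOUR_BOT_TOKEN")  # token from @BotFather
updater.dispatcher.add_handler(CommandHandler("stats", stats))
updater.start_polling()
updater.idle()
```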