Deep Learning Program Simplifies Your Drawings | Two Minute Papers #107

Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér. First, let’s talk about raster and vector
graphics. What do these terms mean exactly? A raster image is a grid made up of pixels,
and for each of these pixels, we specify a color. That’s all there is in an image: it is nothing
but a collection of pixels. All photographs on your phone, and generally
most images you encounter are raster images. It is easy to see that the quality of such
images greatly depends on the resolution of this grid: the more grid points we have, the finer the grid,
and the more detail we can see. However, in return, if we disregard compression
techniques, the file size grows proportionally to the number of pixels, and if we zoom in
too close, we shall witness these classic staircase effects that we like to call aliasing. However, if we are designing a website, or
a logo for a company, which should look sharp on all possible devices and zoom levels, vector
graphics is a useful alternative. Vector images are inherently different from
raster images, as the base elements of the image are not pixels, but vectors and control
points. The difference is like storing the shape of
a circle point by point on a lot of pixels, which would be a raster image, versus simply saying
that I want a circle at these coordinates with a given radius. And as you can see in this example, the point
of this is to have razor-sharp images at higher zoom levels as well. Unless we go too crazy with fine details,
file sizes are also often remarkably small for vector images, because we’re not storing
the colors of millions of pixels. We are only storing shapes. If we want to sound a bit more journalistic,
we can kind of say that vector images have infinite resolution. We can zoom in as much as we wish, and we
won’t lose any detail during this process. Vectorization is the process where we try
to convert a raster image to a vector image. Some also like to call this process image
tracing. The immediate question arises: why are we
not using vector graphics everywhere? Well, one: the smoother the color transitions
and the more detail we have in our images, the quicker the advantage of vectorization
evaporates. And two: this procedure is
not trivial, and we are often at the mercy of the vectorization algorithm in terms of
output quality. It is often unclear in advance whether it
will work well on a given input. So now we know everything we need to know
to be able to understand and appreciate this amazing piece of work. The input is a rough sketch, that is a raster
image, and the output is a simplified, cleaned-up and vectorized version of it. We’re not only doing vectorization, but simplification
as well. This is a game changer, because this way,
we can lean on the additional knowledge that these input raster images are hand-drawn
sketches. They therefore contain a lot of extra fluff that would be undesirable to
retain in the vectorized output, hence the name: sketch simplification. In each of these cases, it is absolute insanity
how well it works. Just look at these results! The next question is obviously, how does this
wizardry happen? It happens by using a classic deep learning
technique, a convolutional neural network, of course, that was trained on a large number
of input and output pairs. However, this is no ordinary convolutional
neural network! This particular variant differs from the standard
well-known architecture in that it is augmented with a series of upsampling convolution steps. Intuitively, the algorithm learns a sparse
and concise representation of these input sketches. This means that it focuses on the
most defining features and throws away all the unneeded fluff. The upsampling convolution steps then enable
it not only to understand, but also to synthesize new, simplified, high-resolution images
that we can easily vectorize using standard algorithms. It is fully automatic and requires no user
intervention. In case you are scratching your head about
these convolutions, we have discussed this peculiar term plenty of times before, and
I have linked the appropriate episodes in the video description box. I think you’ll find them a lot of fun: in
one of them, I pulled out a guitar and added reverberation to it using convolution. It is clear that there is a ton of untapped
potential in using different convolution variants in deep neural networks. Earlier, we saw a DeepMind paper that
used dilated convolutions, a novel convolution variant, for state-of-the-art speech synthesis,
and this piece of work is no exception either. There is also a cool online demo of this technique
that anyone can try. Make sure to post your results in the comments
section! We’d love to have a look at your findings. Also, have a look at this Two Minute Papers
fan art: a nice little logo one of our kind Fellow
Scholars sent in. It’s really great to see that you’ve taken
the time to help out the series; that’s very kind of you. Thank you! Thanks for watching, and for your generous
support, and I’ll see you next time!
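The raster-versus-vector contrast the episode opens with, a circle stored point by point on pixels versus "a circle at these coordinates with a given radius", can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any real file format or tracing tool works: `Circle`, `rasterize`, `raw_raster_bytes`, and `trace_circle` are hypothetical names, coordinates are assumed normalized to [0, 1] on a square canvas, and the size estimate ignores compression, as the video does. `trace_circle` stands in for vectorization only in the simplest possible case, a single filled disk.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Circle:
    """Vector form: three numbers, whatever the output resolution."""
    cx: float  # center x, normalized to [0, 1] (assumption)
    cy: float  # center y, normalized to [0, 1] (assumption)
    r: float   # radius, in the same normalized units


def rasterize(circle, size):
    """Raster form: sample the circle onto a size x size pixel grid (1 = ink).

    Each pixel is tested only at its center, which is what produces the
    staircase (aliasing) along the edge at low resolutions.
    """
    grid = []
    for j in range(size):
        y = (j + 0.5) / size
        row = []
        for i in range(size):
            x = (i + 0.5) / size
            inside = (x - circle.cx) ** 2 + (y - circle.cy) ** 2 <= circle.r ** 2
            row.append(1 if inside else 0)
        grid.append(row)
    return grid


def raw_raster_bytes(width, height, channels=3, bits_per_channel=8):
    """Uncompressed raster size, which grows linearly with the pixel count."""
    return width * height * channels * bits_per_channel // 8


def trace_circle(grid):
    """Toy vectorization: recover a Circle from a raster of one filled disk.

    Center = centroid of the ink pixels; radius from the ink area (pi * r^2).
    Real tracers instead fit curves to arbitrary outlines.
    """
    size = len(grid)
    ink = [((i + 0.5) / size, (j + 0.5) / size)
           for j, row in enumerate(grid) for i, p in enumerate(row) if p]
    cx = sum(x for x, _ in ink) / len(ink)
    cy = sum(y for _, y in ink) / len(ink)
    area = len(ink) / size ** 2
    return Circle(cx, cy, math.sqrt(area / math.pi))


shape = Circle(0.5, 0.5, 0.4)
for row in rasterize(shape, 12):  # coarse grid: visible staircase edge
    print("".join("#" if p else "." for p in row))
print(raw_raster_bytes(1920, 1080), raw_raster_bytes(3840, 2160))
print(trace_circle(rasterize(shape, 256)))
```

Rendering the same three-number `Circle` at 12×12 shows the staircase edge, while rendering it again at 256×256 needs no new vector data; only the raster grows. A 3840×2160 frame holds four times the pixels of a 1920×1080 one, so its uncompressed size quadruples.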

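The video only says that the network adds "a series of upsampling convolution steps" to a standard convolutional architecture; it does not give layer details. As a rough, single-channel sketch of what such a step computes, the following Python contrasts a strided convolution, which shrinks an image, with a transposed (fractionally strided) convolution, which enlarges it. The function names, the 2×2 kernels, and the fixed averaging weights are assumptions chosen for illustration; in a trained network the kernels are learned from the input and output pairs, there are many channels, and nonlinearities sit between layers.

```python
def conv2d(x, kernel, stride=1):
    """Strided 'valid' convolution, single channel (downsamples when stride > 1).

    As in most deep learning libraries, this is technically a cross-correlation:
    the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(x) - kh) // stride + 1
    out_w = (len(x[0]) - kw) // stride + 1
    return [[sum(x[i * stride + a][j * stride + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def upsample_conv2d(x, kernel, stride=2):
    """Transposed ('fractionally strided') convolution, single channel.

    Each input value scatters a scaled copy of the kernel into an output
    grid roughly `stride` times larger, so resolution goes up, not down.
    Overlapping contributions (kernel larger than stride) are summed.
    """
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(x), len(x[0])
    out = [[0.0] * ((w - 1) * stride + kw) for _ in range((h - 1) * stride + kh)]
    for i in range(h):
        for j in range(w):
            for a in range(kh):
                for b in range(kw):
                    out[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]
    return out


# Toy encoder-decoder (hypothetical): squeeze a 4x4 "sketch" to 2x2, then upsample it back.
sketch = [[1, 1, 0, 0],
          [1, 1, 0, 0],
          [0, 0, 1, 1],
          [0, 0, 1, 1]]
average = [[0.25, 0.25], [0.25, 0.25]]  # fixed here; learned in a real network
ones = [[1, 1], [1, 1]]
latent = conv2d(sketch, average, stride=2)
restored = upsample_conv2d(latent, ones, stride=2)
print(latent)  # [[1.0, 0.0], [0.0, 1.0]]
print(restored)
```

With a 2×2 kernel and stride 2, every input value fills its own 2×2 output block, so this toy round trip is exact. With larger kernels the scattered copies overlap and are summed, which gives a learned upsampling layer room to produce smooth transitions between neighboring values rather than blocky copies.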
80 Replies to “Deep Learning Program Simplifies Your Drawings | Two Minute Papers #107”

  1. One idea that I had floating around in my head for a while would be to convert an "evolution simulator" using neural networks into an MMOG. Gameplay would be similar to, and you would play as a small single-celled organism trying to survive and get bigger. You would try to kill and eat other players and you would split at regular intervals. When you die, you would respawn as one of these split cells, but slightly mutated. The creatures would naturally evolve and become better over time. There are obviously a lot of problems with this idea (your success being mainly dependent on luck, for example) but I think it would be a very interesting experiment to say the least 😀

  2. Really neat!
    Though I assume pure line art is a MUCH simpler case than full color, even just filled grey-scale images. Curious how well a similar technique might work for that.

  3. Question: how does an NN output vector data, given that vector data doesn't have a fixed number of data points from drawing to drawing, and NNs have a fixed number of outputs? E.g., with vector graphics a drawing of a square has 4 points of data whereas a drawing of a triangle would have 3. With raster, assuming the images are the same dimensions, it would take the same amount of data for any shape: one per pixel.

    Convnets take a fixed number of inputs and give a fixed number of outputs, but can be made to work on arbitrarily sized images by dividing the image into overlapping segments of the required size. I can't imagine a similar technique that would work with data without structure, like vector data.

  4. What amazes me is that at 4:22, in the top two rows, your algorithm nicely kept the eyebrows thick but nothing else. Compared to Adobe's, where everything is simply as thick as it was in the original sketch, yours is amazing.

  5. Nice video :) I am waiting for a deep learning network that can draw <3, here are some of mine

  6. As an undergraduate who has a true passion for physics and ANNs:

    Which programming language should I start learning in depth to get to next-level "stuff"?
    Is Python or C better for a beginner like me,
    considering I've only done block-oriented programming (Scratch) and wanna move on?
    Suggestions are much appreciated.

  7. You didn't mention how the labels were generated. That leads to the question what this technique is actually used for. I'm guessing to speed up anime drawings or increase quality of the drawings.

  8. One thing I thought would be cool is to use NNs to add "reverb" to a drawing. So, sketch + reverb = maybe a more blurry sketch. Or adding distortion to the sketch causes ripply, wavy pen strokes. Basically, add audio DSP effects to the visual medium via convnets.

  9. I just tested your model with a simple picture I drew using paint. The result is amazing, I can't believe how well it works. Great job!

  10. Interesting, but even on the weakest setting it still seemed to lose too many of the important line strokes. Tested it on some of my sketches. Perhaps it works best at a certain input resolution?

  11. It's as if 10,000 comic book inkers cried out all at once, and suddenly applied for unemployment.

    RIP Banky.

  12. Hey Karoly, I can't get enough of your site and hope you manage to reach dozens of papers a week. You mentioned you did some of the simulations yourself; what's your setup?

  13. 4:12 The lines on the forehead of the leftmost mask output. Not perfect, sadly. Still, everything else is pretty damn good.

  14. I tested 3 pics. My short summary: strong at weak sketches. Surprisingly, big black filled spaces were converted to shapes. For me, that's not too useful. What would be useful would be to automatically close shapes that are nearly closed, to make it easy to fill them with colors.

  15. Imagine using deep learning for a compression algorithm. You take a big PNG picture and convert it into a really flawed format. That means you lose a lot of data and information. Then you send it via WhatsApp, and finally the network reproduces the picture in the lossless format while filling in the lost information. :O

  16. The "simplification" makes the pieces way worse, since they have some very subtle detail that makes sense to us but looks like a mess to the AI, so it basically erases it.

  17. You can't really call vector images infinite "resolution", they're still limited by the number of bits used to store the coordinates. Past a certain level of zoom you can still have smooth curves and lines but you're no longer able to have discrete features.

    Some applications use 64 bits, which is enough to zoom from solar-system level to seeing features a cm across. If it's 16 bits, however, like a lot of web browsers or DoInk animation, that's 65,536 coordinates on each axis. Still insane by bitmap standards, but you can still run into its limitations very quickly if you're not careful.

  18. I know it is old, but I need to ask this:
    Let's think of the rasterized image or the sketch as a 3D model in STL or STEP format. Then the simplified sketch in .svg would be the CAD model of the previous STEP file.
    Could AI learn to create a clean CAD model from a 3D clay model using the same training techniques?

    Great channel!

  19. All they are trying to do is kill the jobs of all those foreign low-wage douga artists who clean up the frames of animation. But will this also have features to create the color trace lines (red and blue lines for shadow and light) in separate layers? Is Celsys going to implement this in TraceMan or the krista?

  20. One thing that saddens me so much is that the CG industry still hasn’t switched to NURBS. It’s similar to vector graphics. Current polygonal modeling starts to show ugly polygonal angles after zooming in, whereas NURBS are always smooth curves. Anything can be modeled with NURBS, and even potatoes would be able to render high-res organic models. The only thing keeping us back is the software tools. No one is developing good tools for NURBS-based CG art. All we have are some buggy, unintuitive CAD programs that can’t be used for actual media productions. We need software that can blend into the current workflow with zero compromises, with advanced features like advanced texture mapping, simulations, rigging, dynamics, etc.

  21. Why not just use a PCA and create a mean image?

    I know the different line strokes aren't technically different data points, but if we just try to do a dimensionality reduction, it should produce the same effect, right?

  22. When I saw that the paper was using anime waifus as examples I thought something was amiss. Then I saw that this was published by University of Tokyo.

  23. Hmm. Maybe we could make games which are not graphically (and computationally) demanding at all and use neural networks to make it look better. A mix of convolutional network and autoencoder. E.g. make a Minecraft type game engine and use neural networks to make it look better. Even adjust the hit boxes in fps games etc.

  24. Great work overall but Fyi @2:34 it's ah.go.rthym ah.prec.e.ate ((app =>algorithm vs rythm time on!in TIME . One capitalization perspective predictive ((*. =>.<==*.* vs&| D:

    Aka ))face to face (( topiary garden bees to Rose's roses to Disney prince)(ss mutant/¿.? superhero? 0 syndrome (incredibles antagonist into Syndrome

    Basically Nature has (all)ready solved this equation so many times (how do any MAMMAL × INSECTS × PLANTS via so many OBVIOUS (LIGHT LOOKING ON LIGHT!methods (see venn bell special gear required)

    @3:01 the simplification algorithm is used at day 12fps to 32fps by jumping spiders cockroaches to alligators in full life cycle from target (prey mate predator obstacle detect avoid action in action × ANY sense is going on to what is next move bite …sw((all)ow)

    In highest human tiers it's a continual thinking feeling voxel ecology "what's most interesting relevant to be/do what's :(see Disney (multi plane animation) MEETS PIXAR miyazaki ontology

  25. I have a feeling AI technology is really going to either simplify our work or make it obsolete. Except for AI programmers. Fuck.

  26. The results look like shit to me. Sure, the sources aren't great to start with, but the algo is turning things into blobs that weren't blobs.
