Back from IJCNN19 (International Joint Conference on Neural Networks) in Budapest, I want to collect some of the insightful bits of knowledge I gathered from the big names in the neural network community. Like Dante Alighieri in his Convivio, I have been catching some of the crumbs that fall from the table of the most influential neural network researchers, and I want to use them to feed a discussion. Read on...
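"E io adunque, che non seggio a la beata mensa, ma, fuggito de la pastura del vulgo, a' piedi di coloro che seggiono ricolgo di quello che da loro cade" (Dante Alighieri, Convivio, Trattato Primo). In English: "And I, therefore, who do not sit at the blessed table, but, having fled the pasture of the common herd, gather at the feet of those who sit there some of what falls from them."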
Image: Luca Signorelli. Own work by Georges Jansoone (JoJan), taken on 30 April 2008, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=834493
Deep Learning is a troublemaker, in a sense. While it paved the way for a plethora of engineering applications, speeding up the work of researchers and engineers alike in their respective domains, it irritated quite a lot of researchers who were following different routes, fighting hard against mathematical problems without obtaining the results that Deep Learning now yields in a flash. Deep Learning as we know it can also fuel bad research habits and, in general, it is very hard to extract insightful knowledge from deep neural networks.
One of the most intriguing moments of the IJCNN conference was a panel titled "Deep Learning: Hype or Hallelujah?", chaired by Vladimir Cherkassky. The panel was composed of several experts who concisely presented their points of view. It concluded with a lively Q&A session.
The Panel: claims and contents
The panelists had different takes on the topic. Vladimir Cherkassky raised some excellent questions for debate. I'll recap them as I understood them, hoping not to misinterpret his view:
"The data deluge makes science obsolete", i.e. we don't need good models anymore, since statistics will know how to handle the data. This breaks with, e.g., Karl Popper's view that science starts from problems, not from empirical data.
We are, thus, shifting from "first-principles knowledge" (hypothesis --> experiment --> theory) to "data-driven knowledge" (program + data = knowledge). The question is: is this really knowledge? Can we make sense of it?
No real theoretical advance has been made since the 1990s. Cherkassky argues that Deep Learning is more of a marketing/umbrella term for old techniques combined with powerful new hardware and large datasets.
Deep Learning claims to be biologically inspired, but this is not true.
I have reflected on these points myself for a long time. I strongly agree with the first two points. Specifically, knowledge should be regarded as the ability to generalize autonomously. If a neural network learns to predict the fall of an object under gravity, I myself will not become any more able to predict it. Newton's second law of motion is probably a far better model for predicting falling objects: it generalizes to any pair of values m and a, it is more accurate, it is easier to compute, it doesn't require a computer, and it is symbolic, so it can be plugged into other equations to generate new models. The knowledge distilled by the neural network, conversely, is unusable for us. Making sense of it would be like transplanting the brain of a teacher into the brain of his disciple to transfer knowledge. Our current way of teaching is different: we use symbols, associate them with shared semantics, and manipulate them through language.
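To make the "plug it into other equations" argument concrete, here is the textbook free-fall derivation. This is my own illustration, not something shown at the panel, and it assumes constant gravitational acceleration g, no air drag, and a vertical axis y pointing upward (so the weight is F = -mg). Combining Newton's second law with the expression for weight gives a closed-form trajectory valid for any mass, starting height, and initial velocity:

```latex
\begin{aligned}
F &= m a, \qquad F = -m g
  &&\Rightarrow\quad a = -g \quad \text{(independent of } m\text{)} \\
v(t) &= v_0 - g t
  &&\text{(integrating } a \text{ once)} \\
y(t) &= y_0 + v_0 t - \tfrac{1}{2} g t^2
  &&\text{(integrating } v \text{ once more)} \\
y_0 = h,\; v_0 = 0,\; y(t^\ast) = 0
  &&&\Rightarrow\quad t^\ast = \sqrt{2h/g}
\end{aligned}
```

The last line answers a question (how long does an object dropped from height h take to hit the ground?) that the two original laws never mention explicitly, which is exactly the kind of reuse a trained network's weights do not offer.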
Point 3 is blurrier. Another panelist (whose name escapes me) argued against it. He stated that Deep Learning opens interesting theoretical questions worth investigating and that there have been many new developments since the 1990s. One such aspect is generalization: the speaker pointed out that a recent ICML paper showed how the usual U-shaped validation error curve can, in reality, decrease again after an "interpolation threshold", which is unexpected in the current theoretical framework. I think the speaker referred to this paper, but I'm guessing.
Point 4 was explained as follows:
Biological (e.g. human) beings learn efficiently from little data (Plato's problem), while deep networks require very large datasets
Nobody really knows how the brain works, and the Deep Learning pioneers were not neuroscientists but computer scientists
Deep networks do not understand natural language
Indeed, deep neural networks do not emulate any known biological system. Take the most common example: convolutional neural networks (CNNs) are claimed to be inspired by the visual cortex, but in practice the connection between a biological visual cortex and the CNNs we use in computer science today is loose. Furthermore, we process all kinds of data with CNNs, not only visual information.
To be fair, there are many research works that are truly biologically inspired (see, e.g., spiking neural networks). However, these are not part of the Deep Learning bubble, and they still lag behind in classification and regression performance. (This does not mean they are not worth investigating. Quite the opposite.)