Take the data, look at the distributions, find useful features, train the algorithms, validate, run, profit. Well, yes, in theory maybe. In this session, I will talk about experiences, WTFs and lessons learned from a rather sophisticated project to find the truth in a vast amount of data.
Pavlo tames the data bear. Well, he tries to. But sometimes you eat the bear, and sometimes the bear eats you. To get close enough to the bear, Pavlo has tried a wide variety of approaches, languages, platforms and technologies. Right now, he believes that math, asynchronous events and messages, data flows, functional and reactive programming, as well as sheer speed and careful distribution, let him get close enough to the bear. But Pavlo is still far from really eating the bear; he is probably just nibbling at a paw. At least the bear itself is now confused and no longer attacks or strikes back too hard. But still hard enough to leave some scars. From time to time, Pavlo speaks at conferences about these scars, and occasionally he writes a book when a scar story doesn't fit into a single conference talk. Pavlo is a data technologist and rubber duck at codecentric AG.
GitHub: pavlobaron
Twitter: @pavlobaron