Monday, September 3, 2018

ODSC - Data Science, AI, ML - Hype, or Reality?

I got a chance to attend ODSC India, held in Bangalore on 31st Aug / 1st Sept. For those who don't know, ODSC is the largest Applied Data Science and AI conference, and it was held in India for the first time this year.

I was very excited to attend this for a couple of reasons:

  • I was attending a conference purely as an attendee (i.e. not as a speaker) after a long time. So this was going to be a pure learning expedition for me.
  • Data Science / AI / ML have become huge buzzwords in the industry now. I had some opinions about them - but those were based on limited knowledge / understanding. I was hungry to learn the specifics behind these buzzwords.


Since I was going to travel to Bangalore for ODSC anyway, I also decided to participate in the pre-conference workshop - Advanced Data Analysis, Dashboards And Visualization. I thought it would be interesting to learn about the What, Why and How of the techniques of Data Analysis, Dashboards and Visualization - which would help me as I rebuild / extend TTA (Test Trend Analyzer). Though the workshop was good, it focused completely on Tableau as a tool and unfortunately did not meet my objectives / expectations. That said, I came across another tool at the conference - KNIME - which seems interesting, and I am going to try it out.

The conference was good though. I attended a lot of sessions and had a lot of hallway conversations with many interesting people. As is typical of any conference, I liked some sessions better than others: some were amazing, some were mediocre.

Here is my unstructured assessment of what I now think about what I heard and discussed:
  • Advanced mathematics learnt in college has an application in data science. So if kids ask why they should study Statistics - here is an answer!
  • Creating data models without Business Context will not work. If it does, you have been lucky :)
  • There are some interesting case studies and success stories of AI & ML. But these are the same success stories that have been around for quite some time. All the other "noise" around AI & ML seems to be hype so far.
  • There is a lot of value in understanding historical data better. Based on that understanding, there can be opportunities to forecast the future. There is a huge risk in this forecasting, though, if the degree of uncertainty is not included as part of the forecast. However, it is very easily ignored.
  • Understanding of Neural Networks, computing, and algorithms is essential to building intelligent solutions for complex problems.
  • It is not sufficient to get better / more accurate prediction results. Being able to explain how and why those results are better / same / worse is equally important. In many cases, this would be a regulatory requirement.
  • Data Science is the "art" & "science" of understanding data better. To do this, we need to first cleanse / prep the data, simplify it using various techniques, and learn techniques to visualize the data.
  • There is a "grammar of graphics" and a "grammar of interactive graphics" - which help in thinking about data visualization.
  • Deploying these AI / ML solutions to production is not a trivial task - mainly because of the heavy computing power and the huge volume of data processing required to make them production ready. This is a huge opportunity for the general Software Development / Testing / DevOps community to solve problems faced by data scientists and others in the data science / AI / ML domain.
  • With data privacy laws rightly becoming stricter, you need to be careful and use only legally obtained sample datasets for analysis / training the data models - else there are going to be huge penalties for the companies involved. (This is in reference to GDPR, as well as new laws coming up in the USA and India.)
  • Earlier, only PhD holders were considered qualified to work on Data Science. Nowadays, the trend is to give relevant training to interns, have them work on these problems, and then get the results validated / explained by the PhD specialists.
  • In a nutshell - Data Science, AI, ML use specialized types of tools and technologies to solve different problems. People / organizations have been doing these activities since before the buzzwords were coined or became popular.
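
To make the point above about forecasting without uncertainty a bit more concrete, here is a minimal Python sketch. It is my own illustration and not something shown at the conference: the function name, the straight-line trend, and the sample numbers (weekly counts of failed test runs) are all hypothetical. It simply reports a rough range alongside the single forecast number.

```python
from math import sqrt
from statistics import mean


def forecast_with_interval(history, steps_ahead=1, z=1.96):
    """Fit a straight-line trend and return (point, lower, upper).

    Assumption: the band is a rough +/- z * residual standard deviation.
    It ignores the uncertainty in the fitted slope itself, so the real
    uncertainty is at least this wide. Needs at least 3 data points.
    """
    n = len(history)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(history)

    # Ordinary least-squares slope and intercept
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, history))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Spread of the historical points around the fitted line
    # (n - 2 because two parameters were estimated)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, history)]
    sigma = sqrt(sum(r * r for r in residuals) / (n - 2))

    point = slope * (n - 1 + steps_ahead) + intercept
    return point, point - z * sigma, point + z * sigma


# Hypothetical data: failed test runs per week
history = [12, 15, 14, 18, 21, 19, 24, 26]
point, low, high = forecast_with_interval(history)
print(f"forecast: {point:.1f} (roughly {low:.1f} to {high:.1f})")
```

Reporting only the single forecast number hides how noisy the history was; the range makes that visible.
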
So, what is my core takeaway from this? 
  • As with any new buzzword, there is interesting work happening in Data Science, AI & ML - but the majority of those claiming to be in the field are just creating and riding the hype!

That said, I want to do the following:
  • Find opportunities to investigate and understand Data Science + AI + ML in more detail
  • Understand the skills and capabilities required from a software developer + QA role perspective to contribute more effectively in solving these newer problem statements
  • Learn Python / R
  • Experiment with various tools / libraries related to data visualization