Disorder Transform

This is mostly for fun

Note

  1. As the title says, this “book” is mostly for fun. In my free time I like to study new and old stuff, and sometimes I put it on “paper”, i.e., I write up the theory and examples on my computer. To stay organized I put some of that content here so I can find it easily when needed. Something here may be useful to someone else, but that is not the goal.
    So, if you are here, have fun too.

  2. Part of the content here is in Portuguese, but I plan to slowly change everything to English.

About me

Math, physics, life, universe and everything

Short bio: My name is Junior Antunes Koch. I am the Principal Data Scientist/Machine Learning Researcher at Elo7 (an Etsy company), with 3 years of experience developing AI solutions and training people. Before that I spent 5 years as a Professor of Theoretical Physics and Mathematics. In my 37 years I have always been interested in how mathematics can be used to model all kinds of phenomena. Today I spend my free time studying new topics and love to share this knowledge with everyone. I am passionate about creating automated systems that solve complex problems.

Contact:

👩🏻‍💻 Work experience

Principal Data Scientist/Machine Learning Researcher

Elo7, São Paulo, Brazil – (2019 until now)

I joined the company as a Senior Data Scientist and after a year I became the leader of the data science team. When the company decided to create a second data science team focused on our search system, I became the Principal, and I now work with both teams. Together we research and develop:

  • Learning to Rank models applied to Search Engines (LambdaMART, SOLR and LightGBM).

  • A self-regulating relevance system for our search, built with Reinforcement Learning techniques, which improved the purchase rate by at least 13%.

  • Autoencoder with classification and regression for generation of missing features (PyTorch).

  • Explainability with CNN and LSTM for text data using Integrated Gradients (PyTorch and Captum).

  • Explainability using Shapley Values in the context of search engines and user intent (SHAP).

  • Research using Shapley Values and Generative Adversarial Networks to generate explanations of product purchases (PyTorch, SHAP and GANs).

  • Feature Extraction with Topic Modeling to identify the semantic gap between product supply and user demand at Elo7 (LDA, NMF and SVD).

  • Construction of a tool for data annotation (Flask, AWS S3, Docker and Kubernetes).

  • Construction of a complex product classification pipeline using techniques such as semantic graphs, BM25 similarity and Bayesian Optimization (LightGBM, Snorkel, Spark, AWS S3, Docker and Kubernetes).

  • Recommender Systems using collaborative filtering, content-based or hybrid approaches.

To help the machine learning community in Brazil I created a free workshop (sponsored by Elo7) called GAN School, where I taught the theory and implementation of the main techniques involving Generative Adversarial Networks (traditional GAN, Conditional GAN, InfoGAN and CycleGAN). It ran twice and was the first event dedicated to this technique in Brazil.


Full-time Professor

Centro Educacional Católica de Santa Catarina, Jaraguá do Sul/SC, Brazil – (2015 - 2017)

As a physicist with a Ph.D. in Materials Science and Engineering from Universidade Federal de Santa Catarina, I was hired to teach Theoretical Physics and Calculus at the undergraduate level in engineering. During this period I developed a free program to teach Quantum Mechanics and Special Relativity at the graduate level.


Extras

In 2018 I worked as a postdoctoral researcher in Quantum Gravity at Universidade do Estado de Santa Catarina. Since 2018 I have been a volunteer writer and podcaster at Deviante, the biggest Brazilian science communication portal.


💻 Technology

Programming languages

  • Python: my primary programming language, used every day mainly for data manipulation/modeling and for building machine learning models. The Python libraries I use most are: scikit-learn, pytorch, pyro, tensorflow, numpy, scipy, matplotlib, plotly, pandas, dask, pyspark, scikit-tda and shap.

  • Scala: mainly used to create data pipelines (ETLs) with Spark and AWS S3.

  • JavaScript: basic knowledge, enough to implement simple functions when building applications with Flask.

Information Retrieval/Search Engine

We use Apache SOLR as our main search engine for retrieval, and we developed our own implementation of learning to rank (outside SOLR) to increase the relevance of the items shown to users.


🗣 Languages

English 🇺🇸: fluent
Portuguese 🇧🇷: native speaker
French 🇫🇷: beginner



📚 Education

  • PhD in Materials Science and Engineering (2009 - 2014)

  • Master’s Degree in Materials Science and Engineering (2008, 12 months)

  • Bachelor’s Degree in Physics (2003 - 2007)