Harshwardhan Raghunath Patil

Graduate Student, Indiana University, Bloomington

  • President and Director of Public Relations, Data Science Club at IU
  • Google Certified Advanced Data Analytics Professional

About Me

I am a Data Scientist at heart and a software developer by practice. Throughout my career, I have garnered 3+ years of experience in software development and 1 year in Machine Learning.

Currently, I am working as a Research Assistant at the NLP Lab, where my responsibilities include developing an application (using Flask and PostgreSQL) to create a state-of-the-art database. Additionally, I am collaborating with and learning from researchers at the Developmental Cognitive Neuroscience Lab to devise statistical analysis methodologies for multilevel and multimodal analysis (in Python and R). I am also involved in creating Machine Learning pipelines that use cutting-edge transformer models to develop classification models (using PyTorch).

In my free time, I enjoy playing chess, solving Rubik's Cube puzzles, and practicing origami. I am also cultivating a habit of reading self-development books. Currently, I am reading "Think and Grow Rich" by Napoleon Hill.

I firmly believe that the fusion of Data Analysis, Machine Learning, and Software Development, coupled with my people skills, would create a positive and impactful environment at my workplace.

My skills

Languages: Python
Databases: SQL, NoSQL, MongoDB
Frameworks: Flask, Django, PyTorch, TensorFlow, spaCy, NLTK
Web Frameworks: Node.js, React
ML Algorithms: Regression, Classification, Clustering, Neural Networks, Decision Trees, Random Forests
Statistical Analysis: Descriptive Statistics, Hypothesis Testing, A/B Testing, Probability Theory
Data Visualization: Matplotlib, Seaborn, Tableau, PowerBI
Platforms: Linux, AWS
Miscellaneous: Git, Docker, Kubernetes, CI/CD, Hadoop, Spark

Experience

  • May 2023 - August 2023
    Research Assistant (Data Mining)
    O'Neill School of Public and Environmental Affairs · On-site

    • Working under Professor Alberto Ortega as part of the Faculty Assistance in Data Science (FADS) Program by the Luddy School of Informatics.

    • Extracted and processed 5 years' worth of data from the National Directory of Mental Health Treatment Facilities' PDF files into a valuable source of contact information by utilizing Python, Regular Expressions, and web scraping techniques.

  • May 2023 - September 2023
    Research Assistant (Data Analyst)
    Consumer Psychology Lab · Remote

    • Working under the guidance of Professor Minkyung Koo to review and summarize 40+ research papers, focusing on coupon use, discount, and deal proneness, to identify dependent variables, independent variables, mediators, moderators, and main observations.

  • May 2023 - December 2023
    Research Assistant (NLP Engineer)
    NLP Lab, Indiana University · On-site

    • Developing a Flask application for "Ellipsis and Elided Elements in Natural Language: The Hoosier Ellipsis Corpus" (https://nlp-lab.org/ellipsis/) to create a state-of-the-art database, under the guidance of Professor Damir Cavar.

  • May 2023 - September 2023
    Research Assistant (Data Analyst)
    IU Developmental Cognitive Neuroscience Lab · On-site

    • Contributing to devising statistical analysis methodologies for multilevel and multimodal analysis of physiological signals data and designing ML pipelines to develop a classification model under the guidance of Professor Bennett Bertenthal.

  • Nov 2020 - July 2022
    System Engineer
    Infosys Limited, Bangalore · Full-time · Remote

    • Worked on a data warehouse application, collecting, transforming, and cleaning raw data from different networking infrastructure systems.

    • Utilized Microsoft Azure cloud, Linux servers, and Oracle Database in a Networking Decision and Support Database team.

    • Played an instrumental role in creating, converting, and redesigning PL/SQL packages, Perl scripts, and Data Mappings for 13 projects.

    • Reduced job run time by 60% by modifying extract, transform, and load (ETL) job scripts.

    • Suggested and initiated the automation of 3 PVT and 2 monitoring activities as part of Operational Excellence, which contributed to a 40% boost in the team's ticket resolution rate.

    • Developed strong communication and collaboration skills through effective client and team interactions.

Education

  • August 2022 - Present
    Master of Science (M.S.)

    Computational Sciences (Data Science)

    Indiana University, Bloomington
    Achievements :

    • Luddy Outstanding Service Award


    Involvements :

    • Vice-President and Director of Public Relations, Data Science Club at Indiana University

    • Secretary, IEEE Indiana University Student Branch


    Coursework :

    Statistics, Algorithms, Exploratory Data Analysis, Machine Learning, Deep Learning, Computer Vision, Cloud Computing, Natural Language Processing.

  • August 2016 - March 2020
    Bachelor of Technology (B.Tech.)

    Computer Science and Technology

    Shivaji University, Kolhapur
    Involvements :

    • Co-Founder, Code-Space (Programming Club)

    • Head, Student's Training and Placement Committee (2019-2020)

    • Member, Student's Council of DOT (2017-2020)

    • Anchor, cultural show 'Symphony' (editions 2K18, 2K19, 2K20)

    • Member, 'Harit Sena Dal' (2017-2019)

    • Player, DOT-CST's Kabaddi and Kho-Kho Team


    Coursework :

    Computational Mathematics, Data Structures and Algorithms, Operating Systems, Data Communication, Networking Systems, System Programming, Computer Security.

My Projects

  • Exploring Machine Learning Techniques for Sales Forecasting at Walmart Stores
    Technology Domain : Machine Learning, Sales Forecasting
    Tech Stack : Data Analysis Tools, Random Forest, XGBoost, Ensemble Algorithms
    Description :

    • Leveraged data analysis and visualization tools to investigate trends, patterns, and correlations in the historical sales data, holiday events, and store information data.

    • Utilized random forest, XGBoost, and ensemble algorithms to predict weekly sales for 45 Walmart stores.

    Source Code

  • Data-driven Customer Segmentation for Personalized Marketing
    Technology Domain : Data Analysis, Customer Segmentation
    Tech Stack : K-means Clustering, R, ggplot2, plotly
    Description :

    • Conducted a customer segmentation project for a grocery store, using K-means clustering to group customers by purchase behavior, such as the frequency and amount of their purchases.

    • Performed exploratory data analysis and visualized the results of the analysis using R’s visualization packages, such as ggplot2 and plotly, to present insights.

    Source Code

  • Employee Attrition Analysis
    Technology Domain : Data Analysis, Predictive Analytics
    Tech Stack : Descriptive Analytics, Predictive Modeling
    Description :

    • Conducted descriptive and predictive analytics to gain insights into the key drivers associated with employee attrition.

    • Developed models to assess the likelihood of employee turnover and explored potential interventions to mitigate attrition.

    Source Code

  • Scraping IMDb Movie Data using BeautifulSoup and Selenium
    Technology Domain : Web Scraping, Data Analysis
    Tech Stack : Python, BeautifulSoup, Selenium, Pandas, Matplotlib, Seaborn
    Description :

    • Designed and implemented a web scraping solution using Python, BeautifulSoup, and Selenium to collect and analyze movie data from IMDb, and performed data cleaning, preprocessing, and visualization using Pandas, Matplotlib, and Seaborn.

  • Analyzing Twitter Sentiment on COVID-19 using Tweepy
    Technology Domain : Natural Language Processing, Sentiment Analysis
    Tech Stack : Tweepy, NLP Techniques, Matplotlib, Tableau
    Description :

    • Utilized the Tweepy Python library to collect tweets related to COVID-19, preprocessed them with natural language processing techniques such as tokenization and stemming, conducted sentiment analysis on the collected data, and visualized the results using Matplotlib and Tableau.

    Source Code

  • A Data-Driven Approach to Email Spam Detection: Building and Evaluating Machine Learning Models
    Technology Domain : Machine Learning, Natural Language Processing
    Tech Stack : NLP Techniques, Feature Extraction, Model Selection
    Description :

    • Developed a spam detection system for a 3000-email dataset with an accuracy rate of 97.4% by utilizing natural language processing (NLP) techniques such as feature extraction (bag-of-words, TF-IDF, and n-grams), text preprocessing (stemming and stop-word removal), and model selection (Naive Bayes, Logistic Regression, and Random Forests).

    Source Code

  • Enhancing Information Extraction with Named Entity Recognition: A Machine Learning Approach
    Technology Domain : Natural Language Processing, Machine Learning
    Tech Stack : Named Entity Recognition, NLP Techniques, BERT Models, LSTM
    Description :

    • Implemented Named Entity Recognition with NLP techniques, fine-tuned pre-trained BERT models, optimized an LSTM with dropout regularization and gradient clipping, and conducted error analysis and ablation studies for feature and model selection, achieving 87% accuracy on the CoNLL 2003 benchmark dataset.

    Source Code

  • Door With an Eye
    Technology Domain : Computer Vision, Android-Application Development
    Tech Stack : Python3, OpenCV, Computer Vision, Flutter, Firebase
    Description :

    • A door equipped with a face-unlock feature.

    Source Code

  • MedEase - A Medical Record-keeping System
    Technology Domain : Web Development, Database Management
    Tech Stack : React, Django, Python, MySQL
    Description :

    • Created a record-keeping system using React, Django, Python, and MySQL to store and access patient records easily.

    • Designed and implemented features for scheduling appointments and tracking patient health metrics, improving data accuracy and streamlining record-keeping processes.

    Source Code

  • PetShop - An E-commerce System
    Technology Domain : Web Development, Database Management
    Tech Stack : Django, Python
    Description :

    • Integrated the front-end interface with the back-end functionalities using Django templates. Also, implemented product search, filtering, sorting, and pagination for an enhanced user experience.

    • Designed and implemented a collection display feature to showcase the available animals in the pet shop.

    Source Code