I am a Data Scientist at heart and a software developer by practice.
Throughout my career, I have gained over three years of experience in software development and one year in Machine Learning.
Currently, I am working as a Research Assistant at NLP Labs, where my responsibilities include developing an application (using Flask and PostgreSQL)
to create a state-of-the-art database. Additionally, I am collaborating with and learning from researchers at the Development and Cognitive Sciences Lab
to devise statistical analysis methodologies for multilevel and multimodal analysis (in Python and R). I am
also involved in creating Machine Learning pipelines
using cutting-edge transformer models to develop classification models (using PyTorch).
In my free time, I enjoy playing chess, solving Rubik's Cube puzzles, and practicing origami.
I am also cultivating a habit of reading self-development books. Currently, I am reading "Think and Grow
Rich" by Napoleon Hill.
I firmly believe that the fusion of Data Analysis, Machine Learning, and Software Development, coupled
with my people skills,
would create a positive and impactful environment in my workplace.
Languages: | Python |
---|---|
Databases: | SQL, NoSQL, MongoDB |
Frameworks: | Flask, Django, PyTorch, TensorFlow, spaCy, NLTK |
Web Frameworks: | Node.js, React |
ML Algorithms: | Regression, Classification, Clustering, Neural Networks, Decision Trees, Random Forests |
Statistical Analysis: | Descriptive Statistics, Hypothesis Testing, A/B Testing, Probability Theory |
Data Visualization: | Matplotlib, Seaborn, Tableau, Power BI |
Platforms: | Linux, AWS |
Miscellaneous: | Git, Docker, Kubernetes, CI/CD, Hadoop, Spark |
• Working under Professor Alberto Ortega as part of the Faculty Assistance in Data Science (FADS) Program by the Luddy School of Informatics.
• Extracted and processed 5 years' worth of data from the National Directory of Mental Health Treatment Facilities' PDF files into a valuable source of contact information by utilizing Python, Regular Expressions, and web scraping techniques.
• Working under the guidance of Professor Minkyung Koo to review and summarize 40+ research papers, focusing on coupon use, discount, and deal proneness, to identify dependent variables, independent variables, mediators, moderators, and main observations.
• Developing a Flask application for the Ellipsis and Elided Elements in Natural Language: The Hoosier Ellipsis Corpus (https://nlp-lab.org/ellipsis/) to create a state-of-the-art database under the guidance of Professor Damir Caver.
• Contributing to devising statistical analysis methodologies for multilevel and multimodal analysis of physiological signals data and designing ML pipelines to develop a classification model under the guidance of Professor Bennett Bertenthal.
• Worked on a data warehouse application, collecting, transforming, and cleaning raw data from different networking infrastructure systems.
• Utilized Microsoft Azure cloud, Linux servers, and Oracle Database in a Networking Decision and Support Database team.
• Played an instrumental role in creating, converting, and redesigning PL/SQL packages, Perl scripts, and Data Mappings for 13 projects.
• Optimized job run time by 60% through modification of extract, load, and transfer job scripts.
• Suggested and initiated the automation of 3 PVT and 2 monitoring activities as part of the Operational Excellence initiative, which contributed to a 40% boost in the team's ticket resolution rate.
• Developed strong communication and collaboration skills through effective client and team interactions.
Computational Sciences (Data Science)
• Luddy Outstanding Service Award
• Vice-President and Director of Public Relations, Data Science Club at Indiana University
• Secretary, IEEE Indiana University Student Branch
Statistics, Algorithms, Exploratory Data Analysis, Machine Learning, Deep Learning, Computer Vision, Cloud Computing, Natural Language Processing.
Computer Science and Technology
• Co-Founder, Code-Space (Programming Club)
• Head, Student's Training and Placement Committee (2019-2020)
• Member, Student's Council of DOT (2017-2020)
• Anchor, cultural show 'Symphony' (editions 2K18, 2K19, 2K20)
• Member, 'Harit Sena Dal' (2017-2019)
• Player, DOT-CST's Kabaddi and Kho-Kho Team
Computational Mathematics, Data Structures and Algorithms, Operating Systems, Data Communication, Networking Systems, System Programming, Computer Security.
• Leveraged data analysis and visualization tools to investigate trends, patterns, and correlations in the historical sales data, holiday events, and store information data.
• Utilized random forest, XGBoost, and ensemble algorithms to predict weekly sales for 45 Walmart stores.
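The forecasting approach described above can be sketched as follows. This is an illustrative toy example with made-up store features and sales figures, not the original pipeline or data:

```python
# Sketch of random-forest weekly-sales forecasting on toy data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical features: [store id, week of year, is_holiday]
X = np.array([[1, 1, 0], [1, 2, 0], [1, 3, 1],
              [2, 1, 0], [2, 2, 0], [2, 3, 1]])
# Toy weekly sales; holiday weeks spike.
y = np.array([1000.0, 1100.0, 1500.0, 2000.0, 2100.0, 2600.0])

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
pred = model.predict([[1, 3, 1]])[0]  # forecast a holiday week for store 1
```

Because a random forest averages training targets within leaves, predictions stay inside the observed sales range; in practice XGBoost or an ensemble of both models could be swapped in behind the same fit/predict interface.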
• Conducted customer segmentation project for a grocery store using K-means clustering to segment customers based on their purchase behavior, such as the frequency and amount of their purchases.
• Performed exploratory data analysis and visualized the results of the analysis using R’s visualization packages, such as ggplot2 and plotly, to present insights.
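The K-means segmentation above reduces to clustering customers on behavioral features. A minimal sketch, assuming two toy features (purchase frequency and average amount) and invented customer values:

```python
# Sketch of K-means customer segmentation on toy purchase-behavior data.
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical features per customer: [purchase frequency, avg purchase amount]
X = np.array([[2, 10], [3, 12], [2, 11],          # infrequent, low spend
              [20, 150], [22, 160], [21, 155]])   # frequent, high spend

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
segments = km.labels_  # customers in the same segment share a label
```

Real features would be scaled first (e.g. with StandardScaler), since K-means is sensitive to feature magnitude.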
• Conducted descriptive and predictive analytics to gain insights into the key drivers associated with employee attrition.
• Developed models to assess the likelihood of employee turnover and explored potential interventions to mitigate attrition.
• Designed and implemented a web scraping solution using Python, BeautifulSoup, and Selenium to collect and analyze movie data from IMDb, and performed data cleaning, preprocessing, and visualization using Pandas, Matplotlib, and Seaborn.
• Utilized the Tweepy Python library to scrape tweets related to COVID-19, conducted sentiment analysis on the collected data using natural language processing techniques such as tokenization and stemming, and visualized the results using Matplotlib and Tableau.
• Developed a spam detection system for a 3000-email dataset with an accuracy rate of 97.4% by utilizing natural language processing (NLP) techniques such as feature extraction (bag-of-words, TF-IDF, and n-grams), text preprocessing (stemming and stop-word removal), and model selection (Naive Bayes, Logistic Regression, and Random Forests).
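The spam-detection pipeline above combines the named pieces: n-gram TF-IDF features feeding a Naive Bayes classifier. A minimal sketch on a four-email toy corpus (illustrative only; the original used a 3000-email dataset and compared several models):

```python
# Sketch of a TF-IDF (with n-grams) + Multinomial Naive Bayes spam classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "win a free prize now", "claim your free money",       # spam examples
    "meeting at noon tomorrow", "project report attached", # ham examples
]
labels = ["spam", "spam", "ham", "ham"]

# Unigrams + bigrams; TF-IDF downweights terms common to both classes.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(emails, labels)
verdict = model.predict(["free money prize"])[0]  # → "spam"
```

Stemming and stop-word removal would be applied as a preprocessing step before vectorization; swapping MultinomialNB for LogisticRegression or RandomForestClassifier in the pipeline covers the model-selection comparison.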
• Implemented Named Entity Recognition with NLP techniques, fine-tuned pre-trained BERT models, optimized an LSTM with dropout regularization and gradient clipping, and conducted error analysis and ablation studies for feature and model selection, achieving 87% accuracy on the CoNLL 2003 benchmark dataset.
• Implemented a door with a face-unlock feature.
• Created a record-keeping system using React, Django, Python, and MySQL to store and access patient records easily.
• Designed and implemented features for scheduling appointments and tracking patient health metrics, improving data accuracy and streamlining record-keeping processes.
• Integrated the front-end interface with the back-end functionality using Django templates, and implemented product search, filtering, sorting, and pagination for an enhanced user experience.
• Designed and implemented a collection display feature to showcase the available animals in the pet shop.