Welcome to my portfolio 👋

I'm Indrajeet Aditya Roy, a full-stack software engineer and graduate student keen on building innovative, scalable solutions, solving complex problems, gaining professional experience, and collaborating with people. I earned my BS in Software Engineering with a concentration in Data Science from Iowa State University in 2022.

I'm currently pursuing an MS in Computer Engineering, specializing in Artificial Intelligence and Machine Learning, at Northeastern University. My academic and professional journeys are driven by a passion for applying problem solving to real-world applications and for continuously improving my skillset.

2020

Carlin Fit - Software Engineer Intern

Developed Android companion app to interface with the UV light hardware device component of the Purple-Bug UV ride-share cleaning product, utilizing Java/Kotlin, Retrofit, Glide, Room, RxJava, and Coroutines.

2021

The RealReal - Software Engineer Intern

Developed RESTful Ruby on Rails and GraphQL Elixir API services enabling efficient and scalable CRUD operations; utilized AWS RDS as the data store, Redis caching for data access optimization, and a Kafka message broker for distributed inter-service communication, implementing the ‘Get Paid Now’ user program.

2022

The RealReal - Software Engineer

Developed internal employee and developer tooling in addition to user-facing product features across backend APIs, data pipelines, and frontend applications, utilizing Elixir, Python, Ruby on Rails, JavaScript, TypeScript, React, SQL, GraphQL, Datadog, Bugsnag, Terraform and AWS services (DynamoDB, RDS, S3, Lambda, Kinesis, EC2, MSK, CloudWatch).

2024

Amwell - Software Engineer Intern

Integrated Validic with the Amwell Conversa Health platform, enabling EHR data synchronization from health monitoring apps and devices with users' automated healthcare conversations, improving user engagement and care through data-driven personalization and analysis.

Research

Mitigating Deanonymization and DoS Attacks in Decentralized Networks View Paper

Research paper analyzing attack vectors and Tor network architecture vulnerabilities, focusing on deanonymization and Denial-of-Service (DoS) attacks in decentralized networks.

Enhancing Research Paper Summarization View Paper

Research paper examining how to improve the precision of LLM-based academic journal summarization by integrating language model fine-tuning methodologies such as Parameter-Efficient Fine-Tuning (PEFT) and Retrieval-Augmented Generation (RAG).

Vision-Language Integration in LLMs: A Survey of Architectures, Training Paradigms, and Applications View Paper

Survey paper examining the integration of vision capabilities into Large Language Models (LLMs), focusing on the architectural evolution, training methodologies, and applications of Vision-Language Models (VLMs). The paper systematically analyzes the progression from early two-stream architectures to modern unified frameworks, covering key developments in model design and cross-modal learning techniques. It provides detailed coverage of three main areas: architectural choices that enable effective vision-language integration, diverse training paradigms, and emerging evaluation frameworks.

Projects

NLP-Enhanced Sumerian-English Translation: Classification and NER Techniques

Application of Natural Language Processing (NLP) techniques to improve the translation accuracy of ancient and low-resource languages, specifically focusing on two primary technical challenges: the development of classification-translation models that facilitate transliterations between Sumerian and English, and a Named Entity Recognition (NER) model designed to bridge lexical gaps in English originating from the unique properties of Sumerian-specific proper nouns. The project aims to enhance translational accuracy and enrich the semantic understanding of translated texts.

View Report

Natural Language Sentiment Analysis: Custom Vectorization and Tokenization Methodologies Github Repository

Implementation of custom vectorization and tokenization techniques for natural language data, with a performance comparison to the baseline sklearn TfidfVectorizer in a sentiment analysis task using IMDb movie reviews. The primary objective is to create a custom TF-IDF vectorization technique that offers more control over how the text data is processed and represented, including tokenization, n-gram generation, and feature selection. This custom approach is benchmarked against the standard sklearn TF-IDF implementation to assess its effectiveness in capturing key features and improving classification performance.
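A minimal Python sketch of what a custom TF-IDF vectorizer with configurable tokenization and n-gram generation might look like. The function names (`tokenize`, `tfidf`) and the regex-based tokenizer are assumptions for illustration, not the repository's code. The IDF uses the smoothed form ln((1 + n) / (1 + df)) + 1 with L2 normalization, which matches the default formula documented for sklearn's TfidfVectorizer, so the two can be compared on equal footing.

```python
import math
import re
from collections import Counter

def tokenize(text, ngram_range=(1, 1)):
    """Lowercase, split on non-alphanumerics, and emit word n-grams."""
    words = [w for w in re.split(r"[^a-z0-9]+", text.lower()) if w]
    lo, hi = ngram_range
    return [
        " ".join(words[i:i + n])
        for n in range(lo, hi + 1)
        for i in range(len(words) - n + 1)
    ]

def tfidf(docs, ngram_range=(1, 1)):
    """Return (vectors, idf): one sparse dict per document plus the IDF table."""
    tokenized = [tokenize(d, ngram_range) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = {t: c * idf[t] for t, c in counts.items()}
        # L2-normalize so document length does not dominate.
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({t: v / norm for t, v in vec.items()})
    return vectors, idf
```

Keeping the tokenizer as a separate function is what gives the control mentioned above: stop-word removal or a different n-gram range can be swapped in without touching the weighting step.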

XML Data Warehouse for Data Forecasting and Analysis Github Repository

Implementation of a data warehouse for pharmaceutical sales which integrates sales representative details and transactional data. Utilizing ETL pipelines, the system efficiently extracts data from complex XML data structures and transforms it into a format suitable for MySQL DB integration.

The architecture of the data warehouse is designed to support OLTP (Online Transaction Processing) for high transaction rates and immediate query handling, while also being optimized for OLAP (Online Analytical Processing) to facilitate complex analytical queries. This dual optimization ensures that the data warehouse can handle both day-to-day operations and provide deep analytical insights.

Key project activities include designing a scalable database schema that accommodates growing data volumes and diverse datasets, and implementing data transformation processes that ensure data integrity and compatibility with enterprise analytics tools. The final component of the project focuses on data analysis and visualization, employing data analytics techniques to interpret the data, identify trends, and support decision-making processes in pharmaceutical sales strategies.
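The extract, transform, and load steps described for this project can be sketched in a few lines of Python. Everything specific here is hypothetical: the XML element names, the `sales_fact` table, and the sample data are invented for illustration, and the standard-library `sqlite3` module stands in for MySQL so the sketch runs without a server.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML layout; the real source schema is not shown in this page.
SAMPLE_XML = """
<sales>
  <txn id="1"><rep>R10</rep><product>Aspirin</product><qty>5</qty><amount>49.50</amount></txn>
  <txn id="2"><rep>R11</rep><product>Ibuprofen</product><qty>2</qty><amount>15.00</amount></txn>
  <txn id="3"><rep>R10</rep><product>Ibuprofen</product><qty>1</qty><amount>7.50</amount></txn>
</sales>
"""

def extract(xml_text):
    """Extract + transform: yield one typed tuple per <txn> element."""
    root = ET.fromstring(xml_text)
    for txn in root.iter("txn"):
        yield (
            int(txn.get("id")),
            txn.findtext("rep"),
            txn.findtext("product"),
            int(txn.findtext("qty")),
            float(txn.findtext("amount")),
        )

def load(rows, conn):
    """Load: insert the rows into a single fact table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact ("
        "txn_id INTEGER PRIMARY KEY, rep_id TEXT, product TEXT, "
        "qty INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()

def total_by_rep(conn):
    """A simple analytical query: total sales amount per representative."""
    return dict(
        conn.execute("SELECT rep_id, SUM(amount) FROM sales_fact GROUP BY rep_id")
    )
```

A typical run would open `sqlite3.connect(":memory:")`, call `load(extract(SAMPLE_XML), conn)`, and then query with `total_by_rep(conn)`.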

Unified File System Query Tool with Hierarchical DOM Analysis Github Repository

Development of a query tool to navigate Unix and Windows file systems, implementing a hierarchical document object model (DOM) for data store queries. It provides comprehensive file navigation and structured data querying capabilities across multiple operating systems, streamlining management of complex data structures. By interacting with file system APIs, the tool enables detailed search, file retrieval based on specific attributes, and query execution within hierarchical data structures. Its functionality supports a range of applications from data analysis to system management.
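As a rough illustration of the idea, the Python sketch below builds a DOM-like tree from a directory and runs attribute-based queries over it. The node layout (`name`, `path`, `is_dir`, `size`, `children`) and the function names are assumptions made for this example, not the tool's actual model. It relies only on `os`, so it behaves the same on Unix and Windows paths.

```python
import os

def build_tree(root):
    """Build a nested dict node for `root` and everything beneath it.

    Note: this simple version follows symlinks, so a link cycle would recurse
    forever; a production tool would need to guard against that.
    """
    node = {
        "name": os.path.basename(root) or root,
        "path": root,
        "is_dir": os.path.isdir(root),
        "size": 0,
        "children": [],
    }
    if node["is_dir"]:
        for entry in sorted(os.listdir(root)):
            node["children"].append(build_tree(os.path.join(root, entry)))
    else:
        node["size"] = os.path.getsize(root)
    return node

def query(node, predicate):
    """Depth-first traversal yielding every node for which predicate(node) is true."""
    if predicate(node):
        yield node
    for child in node["children"]:
        yield from query(child, predicate)
```

For example, `query(tree, lambda n: n["name"].endswith(".txt"))` returns every text file, and `lambda n: not n["is_dir"] and n["size"] >= 1024` selects files of at least 1 KiB.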

News Article Classifier with Custom K-Means Clustering Implementation Github Repository

Implementation and evaluation of the K-Means clustering algorithm for analyzing text data within the 20 Newsgroups dataset. By utilizing a Term Frequency-Inverse Document Frequency (TF-IDF) vectorization approach, the implementation aims to categorize text documents into clusters, enhancing the understanding of text content similarities. The custom algorithm implementation is benchmarked against the standard Scikit-learn KMeans to assess performance metrics such as Homogeneity, Completeness, V-measure, Adjusted Rand Index, and Silhouette Coefficient. These metrics are crucial for evaluating the custom model's effectiveness in handling complex textual data, with a focus on optimizing clustering quality and execution efficiency.
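The core loop of such a custom K-Means implementation (Lloyd's algorithm) can be sketched in plain Python as follows. The function signature, the optional `init` argument, and the random initialization are assumptions of this sketch; the repository may use a different initialization or a vectorized implementation. In the project's setting, `points` would be TF-IDF vectors rather than the 2D tuples used in the usage note.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, init=None, max_iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    if init is not None:
        centroids = list(init)
    else:
        centroids = random.Random(seed).sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids
```

Calling `kmeans(points, 2, init=[(0, 0), (10, 10)])` on two well-separated groups of 2D points assigns each group its own label; the resulting labels can then be scored with the external metrics listed above.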

A Practical Implementation of Energy and Performance-Aware Task Scheduling in Mobile Cloud Computing

A C++ implementation of the theoretical framework presented in the paper Energy and Performance-Aware Task Scheduling in a Mobile Cloud Computing Environment. The implementation constructs a Directed Acyclic Graph (DAG) of tasks and assigns local and cloud execution times. It then performs an initial phase of task assignment and prioritization to determine an initial schedule that minimizes total completion time, adhering to the paper’s outlined equations and heuristics for computing tasks’ priority scores and selecting their initial execution units. Once the baseline schedule is established, the code systematically explores migrating tasks between local cores and the cloud to achieve better energy efficiency. Each candidate migration scenario triggers a linear-time kernel rescheduling algorithm, recomputing all task start and finish times while ensuring all precedence constraints and resource capacities remain satisfied. After evaluating the new completion time and total energy consumption resulting from each migration, the code retains changes that reduce energy without exceeding permissible completion time thresholds. This iterative optimization loop directly mirrors the paper’s approach to jointly optimizing both performance (makespan) and energy consumption in a mobile cloud computing environment.

Github Repository
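The task prioritization phase can be illustrated with a short Python sketch of an upward-rank style priority, the kind commonly used in list schedulers. The weight rule below (cloud time if the cloud is faster than the best local core, otherwise the average local time) and all names are assumptions of this sketch; the repository's C++ code and the paper's exact equations should be treated as the reference.

```python
def task_priorities(successors, local_times, cloud_time):
    """Compute a priority score per task on a DAG.

    successors:  {task: [successor tasks]}
    local_times: {task: [execution time on each local core]}
    cloud_time:  {task: total time to offload, run remotely, and receive results}
    """
    def weight(task):
        # Assumed primary-assignment rule: offload only if the cloud beats
        # the fastest local core.
        if cloud_time[task] < min(local_times[task]):
            return cloud_time[task]
        return sum(local_times[task]) / len(local_times[task])

    memo = {}

    def rank(task):
        # priority(task) = weight(task) + max priority over its successors;
        # exit tasks (no successors) get just their own weight.
        if task not in memo:
            tail = max((rank(s) for s in successors.get(task, [])), default=0.0)
            memo[task] = weight(task) + tail
        return memo[task]

    return {task: rank(task) for task in local_times}

def schedule_order(priorities):
    """Tasks in descending priority, i.e. the order in which they are placed."""
    return sorted(priorities, key=priorities.get, reverse=True)
```

Because a task's priority always exceeds that of its successors, scheduling in descending priority respects the DAG's precedence constraints.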

Reinforcement Learning in Complex Route Navigation and Spatial Decision Making

This simulation explores the implementation of Q-Learning, SARSA, and Actor-Critic algorithms in a complex stochastic grid-based maze environment, featuring dynamic challenges like walls, bumps, and oil slicks. The maze consists of 248 navigable cells. Each algorithm is evaluated over 10 independent runs, each encompassing 1,000 episodes, aiming to optimize paths and maximize rewards within a defined penalty and reward structure. This testing framework assesses the efficacy of each algorithm in achieving the highest cumulative rewards and successfully navigating to the goal optimally.

View Report
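For reference, the tabular update rules that distinguish Q-Learning from SARSA can be sketched in Python as below. The state and action encodings and the default values of `alpha` and `gamma` are placeholders chosen for illustration; the report's actual hyperparameters and reward structure are not reproduced here.

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """Off-policy update: bootstrap from the best action in the next state."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.95):
    """On-policy update: bootstrap from the action actually taken next."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])

# Q-table keyed by (state, action); unseen entries default to 0.0.
Q = defaultdict(float)
```

The only difference is the bootstrap term: Q-Learning uses the maximum over next actions, while SARSA uses the value of the action the behavior policy selected, which makes it more conservative near penalized cells in a stochastic maze.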

Reinforcement Learning in Cancer Research: Optimizing the p53-Mdm2 Feedback Loop View Report

The p53-Mdm2 negative feedback loop is crucial for cell cycle regulation and tumor suppression, playing a significant role in advancing cancer therapy and developing new treatment strategies. This project aims to employ reinforcement learning to finely adjust the activation of this network's key components, specifically ATM, p53, Wip1, and MDM2. The primary objective is to establish an optimal control policy that effectively manages the p53-Mdm2 feedback loop, maximizing the therapeutic efficacy of maintaining an active p53 pathway. By achieving and sustaining this optimal state, the implementation offers a novel approach to controlling complex cellular networks, potentially inhibiting tumor growth and enhancing cellular repair mechanisms.

Neural Network Optimization for Minimized Error Probabilistic Classification View Report Github Repository

Development and optimization of Multilayer Perceptrons (MLPs) for classifying a 3-dimensional real-valued random vector into one of four classes with uniform priors. The main objective is to effectively approximate class label posteriors and to implement a Maximum A Posteriori (MAP) classification rule using the trained MLPs, achieving the lowest possible probability of error.

The MLPs are trained using maximum likelihood parameter estimation, minimizing the average cross-entropy loss and enabling precise approximation of the class label posteriors. Post-training, the MLPs are utilized to implement a MAP classification rule aimed at minimizing the expected loss, using a binary loss function for correct and incorrect classification.

The optimization process includes one hidden layer of perceptrons, with the optimal number of neurons determined via K-fold cross-validation and GridSearchCV. These methods are used to fine-tune the number of perceptrons and regularization, ensuring the model is appropriately scaled to the dataset size and complexity.
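The decision stage can be sketched independently of any training framework. Under a 0-1 loss, minimizing expected loss reduces to choosing the class with the largest posterior, so the MAP rule is an argmax over the network's softmax outputs. The Python helpers below are illustrative names, and the logits in the usage note are made up rather than taken from the report.

```python
import math

def softmax(logits):
    """Convert raw network outputs into approximate class posteriors."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def map_decision(posteriors):
    """MAP rule under 0-1 loss: pick the class with the highest posterior."""
    return max(range(len(posteriors)), key=posteriors.__getitem__)

def error_rate(logit_rows, labels):
    """Empirical probability of error of the MAP classifier on labeled samples."""
    wrong = sum(
        map_decision(softmax(row)) != y for row, y in zip(logit_rows, labels)
    )
    return wrong / len(labels)
```

For instance, `map_decision([0.1, 0.6, 0.2, 0.1])` selects class index 1, and `error_rate` applied to held-out samples gives the estimate of the probability of error that the cross-validation is meant to minimize.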

API Triggered ETL Pipeline for CSV Processing Github Repository

Implementation of an ETL pipeline to efficiently process and transform CSV files into actionable data points and features. Triggered by API calls, the system extracts data from the CSV files, applies the necessary transformations, and uploads the processed data to a relational database. This pipeline facilitates dynamic data management, supports data analysis, and leaves room for extended API integration.
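A minimal Python sketch of the pipeline body that an API handler could call. The column names (`name`, `amount`), the `records` table, and the use of `sqlite3` as the relational database are all assumptions for illustration; the web framework that receives the API call is deliberately left out.

```python
import csv
import io
import sqlite3

def transform(row):
    """Normalize one raw CSV row; return None to reject malformed rows."""
    try:
        return (row["name"].strip().title(), float(row["amount"]))
    except (KeyError, ValueError, TypeError, AttributeError):
        return None

def run_etl(csv_text, conn):
    """Extract rows from CSV text, transform them, and load them into the DB.

    Returns the number of rows loaded so the API layer can report it.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS records (name TEXT, amount REAL)")
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [t for t in map(transform, reader) if t is not None]
    conn.executemany("INSERT INTO records VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)
```

An endpoint would read the uploaded file's text, call `run_etl(text, conn)`, and return the count in its response; rejected rows could be logged instead of silently dropped.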

Vehicle Localization Optimization in 2D space via Range Measurements and MAP Estimation

Implementation of Maximum A Posteriori (MAP) estimation to enhance vehicle localization accuracy in 2D environments. The Bayesian model integrates range measurements to multiple landmarks with existing positional data, allowing for a highly accurate estimation of the vehicle's most probable location. By utilizing MAP estimation, the model effectively addresses noisy distance measurements and prior knowledge of the vehicle's position, ensuring precise navigation despite environmental complexities and measurement inaccuracies.

Github Repository
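To make the estimator concrete, here is a small Python sketch of the MAP objective and a brute-force grid search for its minimizer. It assumes independent Gaussian range noise with standard deviation `sigma_r` and a Gaussian prior on position with mean `prior_mean` and standard deviation `sigma_p`; these modeling choices, the parameter names, and the grid bounds are assumptions of the sketch rather than details taken from the repository.

```python
import math

def neg_log_posterior(pos, landmarks, ranges, sigma_r, prior_mean, sigma_p):
    """Negative log posterior (up to a constant) of a 2D position."""
    x, y = pos
    # Prior term: penalize distance from the prior mean.
    cost = ((x - prior_mean[0]) ** 2 + (y - prior_mean[1]) ** 2) / (2 * sigma_p ** 2)
    # Likelihood terms: penalize mismatch between measured and predicted range.
    for (lx, ly), r in zip(landmarks, ranges):
        predicted = math.hypot(x - lx, y - ly)
        cost += (r - predicted) ** 2 / (2 * sigma_r ** 2)
    return cost

def map_estimate(landmarks, ranges, sigma_r, prior_mean, sigma_p, span=2.0, step=0.05):
    """Grid search over [-span, span]^2 for the position minimizing the objective."""
    n = int(round(2 * span / step))
    candidates = (
        (-span + i * step, -span + j * step)
        for i in range(n + 1)
        for j in range(n + 1)
    )
    return min(
        candidates,
        key=lambda p: neg_log_posterior(p, landmarks, ranges, sigma_r, prior_mean, sigma_p),
    )
```

A grid search is used here only because it is easy to verify; a gradient-based optimizer would be the usual choice once the objective is known to be well behaved. With few landmarks the objective can have several local minima, which is where the prior term helps.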

Snake DEMO (TRY IT OUT!)

Implementation of a game engine for the classic Snake game developed using VueJS and complemented by vanilla HTML and CSS.


Vue.js streamlines state management and facilitates real-time rendering of the game environment, ensuring dynamic updates are handled efficiently without reloading the browser. It is used to manage the game state dynamically, including the snake's movement, collision detection, and score tracking. HTML structures the core layout of the game, setting up a grid-based playing field where the snake moves. CSS styles the game board and snake elements, implementing color schemes, borders, and animations.
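The per-tick state update behind movement, growth, and collision detection is independent of the rendering layer, so it can be sketched in a few lines. The version below is written in Python purely for illustration (the demo itself uses Vue.js), and the `step` function and its return shape are assumptions rather than the game's actual code.

```python
def step(snake, direction, food, width, height):
    """Advance the game by one tick.

    snake:     list of (x, y) cells, head first
    direction: (dx, dy) unit step
    Returns (new_snake, ate_food, alive).
    """
    dx, dy = direction
    head = (snake[0][0] + dx, snake[0][1] + dy)
    ate = head == food
    # The tail cell is vacated this tick unless the snake is growing.
    body = snake if ate else snake[:-1]
    hit_wall = not (0 <= head[0] < width and 0 <= head[1] < height)
    alive = not hit_wall and head not in body
    return [head] + body, ate, alive
```

In a reactive framework, the returned snake list would be assigned to component state on every timer tick, and the grid would re-render from it; the score increments whenever `ate_food` is true.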

Rock Paper Scissors DEMO (TRY IT OUT!)

Implementation of a game engine for the classic Rock Paper Scissors game developed using Vanilla JavaScript, HTML and CSS.


The game engine implements the Minimax algorithm to simulate intelligent opponent moves and strategize optimal responses, using JavaScript for the game logic, HTML for structuring the game UI, and CSS for responsive design, styling, and animations. The Minimax algorithm extends the game's interactivity by enabling the system opponent to make strategically sound decisions that mimic human-like decision-making, adding challenge and realism to the game engine.

Tic-tac-toe DEMO (TRY IT OUT!)

Implementation of a game engine for the classic Tic-tac-toe game developed using Vanilla JavaScript, HTML and CSS.

The game engine implements the Minimax algorithm to simulate intelligent opponent moves and strategize optimal responses, using JavaScript for the game logic, HTML for structuring the game UI, and CSS for responsive design, styling, and animations. The Minimax algorithm extends the game's interactivity by enabling the system opponent to make strategically sound decisions that mimic human-like decision-making, adding challenge and realism to the game engine.
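For readers unfamiliar with Minimax, the following Python sketch shows the algorithm on a Tic-tac-toe board. It is an illustration only: the demo is written in JavaScript, and the board encoding (a list of nine cells holding "X", "O", or a space) and function names are choices made for this example.

```python
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]

def winner(board):
    """Return "X" or "O" if a line is complete, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    """Return (score, move) with score from X's view: +1 X wins, -1 O wins, 0 draw.

    X maximizes the score and O minimizes it; ties keep the first move found.
    """
    w = winner(board)
    if w is not None:
        return (1 if w == "X" else -1), None
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if not moves:
        return 0, None
    best_score, best_move = None, None
    for m in moves:
        board[m] = player  # try the move
        score, _ = minimax(board, "O" if player == "X" else "X")
        board[m] = " "     # undo it
        better = (
            best_score is None
            or (player == "X" and score > best_score)
            or (player == "O" and score < best_score)
        )
        if better:
            best_score, best_move = score, m
    return best_score, best_move
```

Because the Tic-tac-toe game tree is small, the full search runs without pruning; for larger games, alpha-beta pruning or a depth limit would be the usual next step.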

Sound Processor and Song Note Compiler Github Repository

Sound processing and song note compilation platform for real-time audio signal processing and synthesis developed using C and C++. It enables the generation of various waveforms—including Sine, Square, Triangle, and Sawtooth—through user-adjustable parameters such as sample rate, frequency, and duration.
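The waveform generators can be sketched compactly. The Python version below is illustrative only (the platform itself is written in C and C++); the `waveform` function, its parameter names, and the choice to start the triangle wave at its minimum are assumptions of this sketch.

```python
import math

def waveform(kind, frequency, duration, sample_rate=44100, amplitude=1.0):
    """Generate samples in [-amplitude, amplitude] for one of four basic shapes."""
    n = int(sample_rate * duration)
    samples = []
    for i in range(n):
        # Position within the current cycle, in [0, 1).
        phase = (frequency * i / sample_rate) % 1.0
        if kind == "sine":
            v = math.sin(2 * math.pi * phase)
        elif kind == "square":
            v = 1.0 if phase < 0.5 else -1.0
        elif kind == "sawtooth":
            v = 2.0 * phase - 1.0
        elif kind == "triangle":
            v = 1.0 - 4.0 * abs(phase - 0.5)
        else:
            raise ValueError(f"unknown waveform: {kind}")
        samples.append(amplitude * v)
    return samples
```

Expressing every shape as a function of the cycle phase keeps the sample rate, frequency, and duration parameters identical across waveforms.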


The platform supports complex audio signal layering, allowing users to overlay multiple audio signals with precision. This includes detailed amplitude control and modulation, where the amplitude of one signal can dynamically influence others. Advanced digital filters, such as a digital reverb filter, create short-lived echoes to simulate different acoustic environments effectively.
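Layering, amplitude modulation, and a simple echo can each be expressed as a short operation on sample lists. The Python sketch below uses a feedback comb filter, a common building block of digital reverbs, as a stand-in for the reverb filter; whether the platform uses this exact filter is not stated, and all function names and parameters here are hypothetical.

```python
def mix(*signals):
    """Layer signals by summing them sample by sample; shorter ones end early."""
    n = max(len(s) for s in signals)
    return [sum(s[i] for s in signals if i < len(s)) for i in range(n)]

def amplitude_modulate(carrier, modulator):
    """Scale each carrier sample by the matching modulator sample."""
    return [c * m for c, m in zip(carrier, modulator)]

def comb_echo(signal, delay, gain=0.5, tail=0):
    """Feedback comb filter: y[n] = x[n] + gain * y[n - delay].

    `tail` appends silence so the decaying echoes are not cut off.
    Keep |gain| < 1 so the echoes die out.
    """
    out = list(signal) + [0.0] * tail
    for n in range(delay, len(out)):
        out[n] += gain * out[n - delay]
    return out
```

A real mixer would also rescale or clip the summed signal to stay within the output range, which is omitted here for brevity.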


An integral component of the system is its song player functionality, implemented using C to optimize performance. It decodes various audio file formats, arranging the decoded samples into a seamless playback sequence that can be further manipulated for sound design or educational purposes. This functionality highlights the system’s capability to handle complex audio processing tasks efficiently.

Purple Bug Mobile Companion App

Summer 2020 Internship Project

Android companion app to interface with the UV light hardware device component of the Purple-Bug UV ride-share cleaning product.

The Android mobile app was developed using Java and Kotlin, providing a stable and responsive UI. Network communications were implemented using Retrofit, which handled data transmission between the mobile app and backend services effectively. Image loading and caching were optimized using Glide to enhance performance and user experience. Local data storage was implemented using Room, which offers an abstraction layer over SQLite, simplifying database operations. Complex asynchronous tasks were managed using RxJava and Kotlin Coroutines, ensuring smooth operation of the app by executing background tasks without interfering with the user interface.