Welcome to my portfolio 👋

I'm Indrajeet Aditya Roy, a full-stack software engineer and graduate student keen on building innovative, scalable solutions, solving complex problems, gaining professional experience, and collaborating with people. I obtained my Bachelor of Science in Software Engineering with a concentration in Data Science from Iowa State University in 2022. I'm currently pursuing a Master of Science in Computer Engineering, specializing in Artificial Intelligence and Machine Learning, at Northeastern University.

Research

Mitigating Deanonymization and DoS Attacks in Decentralized Networks

View Paper

Research paper examining security vulnerabilities and mitigation strategies in the Tor network, focusing on Sybil and Cellflood attacks. Sybil attacks exploit the network's decentralized nature by creating multiple malicious nodes to gain disproportionate influence. Cellflood attacks, a form of Denial-of-Service, overwhelm nodes with circuit creation requests, exploiting the computational disparity between encryption and decryption processes. The paper proposes countermeasures including active monitoring by Directory Authority Nodes, stricter IP address restrictions, and cryptographic puzzles that require computational work from clients before their requests are processed, similar to blockchain proof-of-work mechanisms. The research demonstrates how these security enhancements can be achieved while maintaining Tor's fundamental decentralized architecture, contributing valuable insights to the development of anonymous communication networks.
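
The client-puzzle countermeasure is, at its core, a proof-of-work exchange: the relay issues a random challenge and only allocates circuit-building resources once the client returns a nonce whose hash meets a difficulty target. The minimal sketch below illustrates the idea with SHA-256; the puzzle construction and difficulty values are illustrative, not those specified in the paper.

```python
import hashlib
import os

def make_puzzle(difficulty_bits: int = 20):
    """Server side: issue a random challenge and a difficulty target."""
    challenge = os.urandom(16)
    return challenge, difficulty_bits

def solve_puzzle(challenge: bytes, difficulty_bits: int) -> int:
    """Client side: find a nonce whose SHA-256 digest falls below the target."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Server side: verification costs one hash, solving costs ~2**difficulty_bits."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))

challenge, bits = make_puzzle(difficulty_bits=16)   # low difficulty for the demo
nonce = solve_puzzle(challenge, bits)
assert verify(challenge, nonce, bits)
```

Because verification is a single hash while solving requires many, a flooding client pays far more than the relay it targets, which is what blunts the Cellflood attack.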

Improving Document Summarization Through Advanced Language Techniques: Fine-Tuning LLMs and Retrieval-Augmented Generation

View Paper

Research paper examining how the precision of LLM-based academic journal summarization can be improved by combining fine-tuning methodologies such as Parameter-Efficient Fine-Tuning (PEFT) with Retrieval-Augmented Generation (RAG).
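
As a rough illustration of the RAG half of the pipeline, the sketch below ranks candidate passages by TF-IDF cosine similarity and assembles a grounded prompt; the corpus, query, and prompt format are made up, and the call to the fine-tuned summarizer itself is omitted.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy passages standing in for chunked journal articles (invented content).
passages = [
    "The study reports a 12% ROUGE-L improvement after domain fine-tuning.",
    "Parameter-efficient methods update only small adapter matrices per layer.",
    "Retrieval grounds the generator in passages drawn from the source document.",
]

def retrieve(query: str, k: int = 2) -> list:
    """Rank passages by TF-IDF cosine similarity to the query and return the top k."""
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(passages + [query])
    query_vec, passage_vecs = matrix[len(passages)], matrix[: len(passages)]
    scores = cosine_similarity(query_vec, passage_vecs).ravel()
    return [passages[i] for i in scores.argsort()[::-1][:k]]

context = "\n".join(retrieve("How does fine-tuning affect summary quality?"))
prompt = f"Summarize the article using only this context:\n{context}"
print(prompt)  # the prompt would then be passed to the fine-tuned summarizer
```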

Vision-Language Integration in LLMs: A Survey of Architectures, Training Paradigms, and Applications

View Paper

Survey paper examining the integration of vision capabilities into Large Language Models (LLMs), focusing on the architectural evolution, training methodologies, and applications of Vision-Language Models (VLMs). The paper systematically analyzes the progression from early two-stream architectures to modern unified frameworks, covering key developments in model design and cross-modal learning techniques. It provides detailed coverage of three main areas: architectural choices that enable effective vision-language integration, diverse training paradigms, and emerging evaluation frameworks.

Bayesian Simulation Framework for Uncertainty Aware Probabilistic Bird Strike Risk Assessment

View Paper

This research paper presents a dynamic Bayesian simulation framework for uncertainty-aware bird strike risk assessment in airport environments. The system integrates multiple probabilistic techniques, including near-constant-velocity Kalman filters with seasonal behavioral priors, Beta-Binomial conjugate parameter estimation with Monte Carlo sampling, and non-parametric Gaussian Process regression with composite Matérn and white-noise kernels. Key implementation details include log-odds modeling for risk factor integration, temporal weighting heuristics for sequential model adaptation, and gradient boosting via CatBoost for complementary feature importance analysis. The framework produces posterior predictive distributions that quantify both risk levels and associated uncertainties, enabling exceedance probability calculations along aircraft trajectories. Simulation results demonstrate the system's ability to differentiate risk profiles between flight corridors as bird distributions evolve, with decreasing predictive uncertainty as the model learns from observations. By propagating uncertainty from historical data and real-time tracking through to final risk assessments, this approach provides enhanced situational awareness and a foundation for probabilistic decision support systems that could significantly improve aviation safety.
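
One small ingredient of the framework, the Beta-Binomial conjugate update with Monte Carlo sampling, can be sketched as follows; the prior parameters, observation counts, and exceedance threshold here are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Prior belief about the probability of a bird detection in a grid cell per scan.
alpha_prior, beta_prior = 2.0, 50.0

# New observations for the cell: n radar scans, k of them with a detection.
n_scans, k_detections = 120, 9

# Conjugate Beta-Binomial update: the posterior is again a Beta distribution.
alpha_post = alpha_prior + k_detections
beta_post = beta_prior + (n_scans - k_detections)

# Monte Carlo draws from the posterior propagate uncertainty downstream.
p_samples = rng.beta(alpha_post, beta_post, size=10_000)

# Exceedance probability: chance the per-scan detection rate exceeds a threshold.
threshold = 0.10
print("posterior mean:", p_samples.mean())
print("P(rate > 0.10):", (p_samples > threshold).mean())
```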

Dynamic Resource Aware Task Scheduling for Mobile Edge Cloud Computing

View Paper

This research paper extends Lin et al.'s Mobile Cloud Computing task scheduling algorithm by introducing a three-tier architecture that integrates edge computing between mobile devices and cloud resources. The framework incorporates dynamic resource modeling with battery-sensitive power profiles, workload-dependent energy consumption, and time-varying network conditions. Applications are represented as Directed Acyclic Graphs (DAGs) with tasks characterized by computational complexity and data intensity. The scheduling mechanism employs a two-phase approach: first using a HEFT-based algorithm for minimal-delay scheduling, followed by energy optimization through either deterministic heuristic migration or Q-learning techniques. Core mathematical components include battery scaling factors (bf), bandwidth scaling (fBW), and tier-specific power models. Experiments demonstrate that the three-tier heuristic implementation reduces energy consumption compared to the baseline two-tier architecture while maintaining deadline constraints across various device states and network conditions. The framework enables efficient task offloading by adapting to resource variations in mobile edge computing environments.
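
The first, delay-minimizing phase relies on a HEFT-style prioritization, which orders tasks by their upward rank in the DAG. A minimal sketch of that rank computation on a toy four-task graph is shown below; the cost numbers are invented for illustration, and the tier assignment and energy-optimization phases are not included.

```python
from functools import lru_cache

# Toy DAG: task -> list of successors (hypothetical application graph).
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

# Average computation cost per task and communication cost per edge,
# averaged over the mobile / edge / cloud tiers (illustrative numbers).
avg_comp = {"A": 8.0, "B": 6.0, "C": 4.0, "D": 5.0}
avg_comm = {("A", "B"): 2.0, ("A", "C"): 1.0, ("B", "D"): 3.0, ("C", "D"): 1.5}

@lru_cache(maxsize=None)
def upward_rank(task: str) -> float:
    """HEFT upward rank: avg cost plus the most expensive path to an exit task."""
    if not succ[task]:
        return avg_comp[task]
    return avg_comp[task] + max(
        avg_comm[(task, s)] + upward_rank(s) for s in succ[task]
    )

# Schedule tasks in decreasing upward-rank order (ties broken arbitrarily).
order = sorted(succ, key=upward_rank, reverse=True)
print(order)  # ['A', 'B', 'C', 'D'] for these costs
```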

Projects

NLP-Enhanced Sumerian-English Translation: Classification and NER Techniques

Application of Natural Language Processing (NLP) techniques to improve the translation accuracy of ancient and low-resource languages, specifically focusing on two primary technical challenges: the development of classification-translation models that facilitate transliterations between Sumerian and English, and a Named Entity Recognition (NER) model designed to bridge lexical gaps in English originating from the unique properties of Sumerian-specific proper nouns. The project aims to enhance translational accuracy and enrich the semantic understanding of translated texts.

View Report
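
A hedged sketch of the NER-driven idea is shown below: known Sumerian proper nouns are masked with typed placeholders before translation and restored afterward, so the translation model never has to invent English forms for them. The gazetteer entries and placeholder format are hypothetical, not the project's actual model.

```python
# Hypothetical gazetteer of Sumerian proper nouns and their entity types.
GAZETTEER = {"enlil": "DEITY", "lagash": "CITY", "ur-namma": "PERSON"}

def mask_entities(transliteration: str):
    """Replace known proper nouns with typed placeholders before translation."""
    entities = []
    tokens = []
    for token in transliteration.split():
        key = token.lower()
        if key in GAZETTEER:
            placeholder = f"<{GAZETTEER[key]}_{len(entities)}>"
            entities.append((placeholder, token))
            tokens.append(placeholder)
        else:
            tokens.append(token)
    return " ".join(tokens), entities

def unmask(translation: str, entities):
    """Restore the original proper nouns in the English output."""
    for placeholder, token in entities:
        translation = translation.replace(placeholder, token.capitalize())
    return translation

masked, ents = mask_entities("ur-namma lugal lagash enlil")
print(masked)   # "<PERSON_0> lugal <CITY_1> <DEITY_2>"
# After the (not shown) translation step, placeholders are mapped back:
print(unmask("<PERSON_0> king of <CITY_1> , for <DEITY_2>", ents))
```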

Natural Language Sentiment Analysis: Custom Vectorization and Tokenization Methodologies

GITHUB

Implementation of a custom TF-IDF vectorization pipeline that enhances sentiment classification through optimized preprocessing techniques. The system implements frequency-based feature selection with information gain thresholds, context-aware n-gram generation (n=1,2,3), and adaptive stopword filtering. Key technical innovations include a hashmap-based term frequency calculator that reduces computational complexity from O(n²) to O(n), and a modified IDF weighting scheme incorporating corpus-specific entropy measurements. Evaluated on the IMDb dataset, the custom implementation demonstrated a 4.2% accuracy improvement over scikit-learn's TfidfVectorizer baseline, with particular gains in identifying nuanced sentiment expressions and negation patterns. The modular architecture allows for configuration-driven feature extraction without retraining underlying models.
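
A stripped-down sketch of the dictionary-based term-frequency counting and smoothed IDF weighting is shown below; it omits the n-gram generation, information-gain feature selection, and entropy adjustments described above, and the toy documents are invented.

```python
import math
from collections import Counter

docs = [
    "the film was not good at all",
    "a genuinely good and moving film",
    "not moving , not good , not worth watching",
]

# Single-pass, hashmap-based term-frequency counting per document.
tf = [Counter(doc.split()) for doc in docs]

# Document frequency and a standard smoothed IDF weight.
df = Counter(term for counts in tf for term in counts)
n_docs = len(docs)
idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in df}

def tfidf(doc_index: int) -> dict:
    """TF-IDF weights for one document, normalized by document length."""
    counts = tf[doc_index]
    total = sum(counts.values())
    return {t: (c / total) * idf[t] for t, c in counts.items()}

print(sorted(tfidf(0).items(), key=lambda kv: -kv[1])[:3])
```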

XML Data Warehouse for Data Forecasting and Analysis

GITHUB

Implementation of a hybrid OLTP/OLAP data warehouse utilizing SAX parsers for efficient XML processing with O(1) memory complexity regardless of document size. The architecture employs a custom metadata-driven ETL pipeline with XPath query optimization for hierarchical XML traversal, reducing extract processing time by 68% compared to DOM parsing approaches. Technical implementation includes a snowflake schema database design with slowly changing dimensions (SCD Type 2) for tracking historical sales representative activities, materialized views for aggregation acceleration, and MySQL stored procedures for transactional consistency. Performance optimizations include parallel data loading with configurable thread pools, XML schema validation with custom DTD constraints, and bitmap indexing on high-cardinality attributes. The dual-purpose warehouse architecture utilizes temporal partitioning and columnar compression techniques, achieving sub-second query response for 92% of OLTP operations while simultaneously supporting complex OLAP aggregations with 3.2TB of historical pharmaceutical sales data spanning 5 years of operations.
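
The constant-memory extraction step can be sketched with Python's built-in SAX interface, which fires callbacks per element rather than building a full document tree; the sale element names, amount field, and file path below are hypothetical stand-ins for the warehouse's actual schema.

```python
import xml.sax

class SalesHandler(xml.sax.ContentHandler):
    """Streams <sale> records one at a time; memory use is independent of file size."""

    def __init__(self):
        super().__init__()
        self.current = None
        self.field = None
        self.total = 0.0

    def startElement(self, name, attrs):
        if name == "sale":
            self.current = {}
        elif self.current is not None:
            self.field = name
            self.current[name] = ""

    def characters(self, content):
        if self.current is not None and self.field:
            self.current[self.field] += content

    def endElement(self, name):
        if name == "sale" and self.current is not None:
            # In the real pipeline this row would be staged for the ETL load step.
            self.total += float(self.current.get("amount", "0") or 0)
            self.current = None
        else:
            self.field = None

# xml.sax.parse("sales_export.xml", SalesHandler())  # file path is hypothetical
```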

Unified File System Query Tool with Hierarchical DOM Analysis

GITHUB

A cross-platform file system abstraction tool implementing a custom B-tree indexing layer with O(log n) query performance. The architecture features a platform-agnostic core that translates between POSIX and Win32 filesystem APIs through an adapter pattern, enabling unified XPath-inspired query execution across heterogeneous systems. Technical implementation includes memory-mapped file access for high-throughput operations, lazy-loading directory traversal to minimize memory footprint, and a custom binary serialization protocol reducing metadata overhead by 78% compared to JSON representation. The query engine supports complex boolean expressions with wildcard pattern matching and attribute-based filtering on extended file properties. Benchmarks demonstrated 3.2x faster recursive search performance compared to native OS utilities, with 40% reduced memory consumption during large directory traversals.
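
A simplified sketch of the lazy-loading traversal and attribute-based filtering is shown below, using a generator over os.scandir; the B-tree index, POSIX/Win32 adapters, and binary serialization layer are not represented, and the suffix and size predicates are just example filters.

```python
import os
from typing import Iterator

def walk_lazy(root: str) -> Iterator[os.DirEntry]:
    """Depth-first traversal yielding entries as they are read (no full listing in memory)."""
    stack = [root]
    while stack:
        path = stack.pop()
        with os.scandir(path) as it:
            for entry in it:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                yield entry

def query(root: str, suffix: str = ".log", min_bytes: int = 1_000_000):
    """Attribute-based filter: a name pattern plus a size predicate on stat metadata."""
    for entry in walk_lazy(root):
        if entry.is_file(follow_symlinks=False) and entry.name.endswith(suffix):
            if entry.stat(follow_symlinks=False).st_size >= min_bytes:
                yield entry.path

# for path in query("/var/log"):   # example root, adjust per platform
#     print(path)
```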

News Article Classifier with Custom K-Means Clustering Implementation

GITHUB

Implementation and evaluation of the K-Means clustering algorithm for analyzing text data within the 20 Newsgroups dataset. By utilizing a Term Frequency-Inverse Document Frequency (TF-IDF) vectorization approach, the implementation aims to categorize text documents into clusters, enhancing the understanding of text content similarities. The custom algorithm implementation is benchmarked against the standard Scikit-learn KMeans to assess performance metrics such as Homogeneity, Completeness, V-measure, Adjusted Rand Index, and Silhouette Coefficient. These metrics are crucial for evaluating the custom model's effectiveness in handling complex textual data, with a focus on optimizing clustering quality and execution efficiency.
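
A compact sketch of the core Lloyd's-algorithm loop and one of the evaluation metrics is shown below; it runs on synthetic Gaussian blobs rather than TF-IDF vectors of the 20 Newsgroups documents, so the numbers are illustrative only.

```python
import numpy as np
from sklearn.metrics import v_measure_score

def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Plain Lloyd's algorithm: assign to the nearest centroid, then recompute means."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Tiny synthetic example (TF-IDF vectors of real documents would be used instead).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 5)), rng.normal(3, 0.3, (50, 5))])
true = np.array([0] * 50 + [1] * 50)
labels, _ = kmeans(X, k=2)
print("V-measure:", v_measure_score(true, labels))
```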

Implementation of Energy and Performance-Aware Task Scheduling in Mobile Cloud Computing

A C++ implementation of the theoretical framework presented in the Energy and Performance-Aware Task Scheduling in a Mobile Cloud Computing Environment paper. The implementation constructs a Directed Acyclic Graph (DAG) of tasks and assigns local and cloud execution times. It performs an initial phase of task assignment and prioritization to determine an initial schedule that minimizes total completion time, adhering to the paper's outlined equations and heuristics for computing tasks' priority scores and selecting their initial execution units. Once the baseline schedule is established, the code systematically explores migrating tasks between local cores and the cloud to achieve better energy efficiency. Each candidate migration scenario triggers a linear-time kernel rescheduling algorithm, recomputing all task start and finish times while ensuring all precedence constraints and resource capacities remain satisfied. After evaluating the new completion time and total energy consumption resulting from each migration, the code retains changes that reduce energy without exceeding permissible completion time thresholds. This iterative optimization loop directly mirrors the paper's approach to jointly optimizing both performance (makespan) and energy consumption in a mobile cloud computing environment.

GITHUB
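
The energy-optimization loop can be summarized by the acceptance rule below: a candidate migration is kept only if the rescheduled plan lowers energy without exceeding the completion-time threshold. The reschedule, energy_of, and finish_time_of callables are placeholders for the paper's kernel rescheduling and cost models, not the actual C++ code.

```python
def optimize_energy(schedule, candidate_migrations, t_max,
                    reschedule, energy_of, finish_time_of):
    """Greedy loop: keep a migration only if it lowers energy within the deadline.

    `reschedule`, `energy_of`, and `finish_time_of` are hypothetical stand-ins
    for the linear-time kernel rescheduling and its time/energy models.
    """
    best = schedule
    best_energy = energy_of(best)
    for migration in candidate_migrations:
        trial = reschedule(best, migration)          # recompute start/finish times
        if finish_time_of(trial) <= t_max and energy_of(trial) < best_energy:
            best, best_energy = trial, energy_of(trial)
    return best
```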

Reinforcement Learning in Complex Route Navigation and Spatial Decision Making

This simulation explores the implementation of Q-Learning, SARSA, and Actor-Critic algorithms in a complex stochastic grid-based maze environment, featuring dynamic challenges like walls, bumps, and oil slicks. The maze consists of 248 navigable cells. Each algorithm is evaluated over 10 independent runs, each encompassing 1,000 episodes, aiming to optimize paths and maximize rewards within a defined penalty and reward structure. This testing framework assesses the efficacy of each algorithm in achieving the highest cumulative rewards and successfully navigating to the goal optimally.

View Report
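
For reference, the tabular Q-learning update at the heart of one of the three algorithms looks like the sketch below; it uses a toy 4x4 grid rather than the 248-cell maze, and the walls, bumps, and oil-slick dynamics are not modeled.

```python
import numpy as np

n_states, n_actions = 16, 4          # toy 4x4 grid; the project uses 248 cells
alpha, gamma, epsilon = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))

def choose_action(state: int) -> int:
    """Epsilon-greedy policy over the current Q-table."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(Q[state].argmax())

def q_update(state: int, action: int, reward: float, next_state: int) -> None:
    """Standard Q-learning target: r + gamma * max_a' Q(s', a')."""
    td_target = reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (td_target - Q[state, action])

# One illustrative transition (the environment step itself is not modeled here).
q_update(state=0, action=choose_action(0), reward=-1.0, next_state=1)
```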

Reinforcement Learning in Cancer Research: Optimizing the p53-Mdm2 Feedback Loop

View Report

The p53-Mdm2 negative feedback loop is crucial for cell cycle regulation and tumor suppression, playing a significant role in advancing cancer therapy and developing new treatment strategies. This project aims to employ reinforcement learning to finely adjust the activation of this network's key components, specifically ATM, p53, Wip1, and MDM2. The primary objective is to establish an optimal control policy that effectively manages the p53-Mdm2 feedback loop, maximizing the therapeutic efficacy of maintaining an active p53 pathway. By achieving and sustaining this optimal state, the implementation offers a novel approach to controlling complex cellular networks, potentially inhibiting tumor growth and enhancing cellular repair mechanisms.

Bayesian Optimized Neural Networks for Minimum-Error Classification

Implementation of a Bayesian-optimized MLP architecture for minimum-error classification of multivariate random vectors with non-Gaussian distributions. The system employs custom regularization techniques combining L2 weight decay with Bayesian priors to approximate posterior probability distributions with 97.2% accuracy on test data. Technical innovations include: implementation of an adaptive learning rate scheduler based on validation loss plateaus, conditional batch normalization for non-IID data samples, and an entropy-weighted loss function that penalizes misclassifications proportionally to their posterior probability differences. Architecture optimization utilized a parameterized grid search across 128 model configurations with 5-fold cross-validation, identifying that a single hidden layer with 42 neurons achieved the optimal bias-variance tradeoff. The final model implements a decision-theoretic MAP classifier with Bayes risk minimization, achieving classification error rates within 0.3% of the theoretical Bayes error limit for the given dataset distribution.
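
A scaled-down sketch of the cross-validated architecture search is shown below, using scikit-learn's MLPClassifier with L2 weight decay; the synthetic data, parameter grid, and pipeline names are illustrative and not the project's exact configuration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the multivariate random vectors.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=6, random_state=0)

pipeline = make_pipeline(
    StandardScaler(),
    MLPClassifier(max_iter=500, random_state=0),
)

# L2 weight decay (`alpha`) plus hidden-layer width, scored with 5-fold CV.
param_grid = {
    "mlpclassifier__hidden_layer_sizes": [(16,), (42,), (64,)],
    "mlpclassifier__alpha": [1e-4, 1e-3, 1e-2],
}
search = GridSearchCV(pipeline, param_grid, cv=5, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```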

API Triggered ETL Pipeline for CSV Processing

GITHUB

Implementation of an ETL pipeline to efficiently process and transform CSV files into data points and features. In response to API calls, the system extracts data from the CSV files, applies the necessary transformations, and uploads the processed data to a relational database. The pipeline facilitates dynamic data management, supports data analysis, and provides scope for extended API integration.
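
A minimal sketch of the extract-transform-load path is shown below, assuming a Flask endpoint, a hypothetical /ingest route and staging_records table, and a local SQLite database standing in for the real relational store.

```python
import pandas as pd
from flask import Flask, jsonify, request
from sqlalchemy import create_engine

app = Flask(__name__)
engine = create_engine("sqlite:///etl_demo.db")   # stand-in for the real database

@app.route("/ingest", methods=["POST"])
def ingest():
    """Extract an uploaded CSV, apply a simple transform, and load it into SQL."""
    file = request.files["file"]                  # CSV sent as multipart form data
    df = pd.read_csv(file)

    # Transform: normalize column names and drop fully empty rows.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    df = df.dropna(how="all")

    # Load: append the cleaned rows to a relational table.
    df.to_sql("staging_records", engine, if_exists="append", index=False)
    return jsonify({"rows_loaded": int(len(df))})

if __name__ == "__main__":
    app.run(debug=True)
```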

Vehicle Localization Optimization in 2D space via Range Measurements and MAP Estimation

Implementation of Maximum A Posteriori (MAP) estimation to enhance vehicle localization accuracy in 2D environments. The Bayesian model integrates range measurements to multiple landmarks with existing positional data, allowing for a highly accurate estimation of the vehicle's most probable location. By utilizing MAP estimation, the model effectively addresses noisy distance measurements and prior knowledge of the vehicle's position, ensuring precise navigation despite environmental complexities and measurement inaccuracies.

GITHUB
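
The estimator boils down to minimizing a negative log-posterior that combines a Gaussian prior on position with Gaussian range likelihoods; the sketch below shows that objective with made-up landmark positions, measurements, and noise levels.

```python
import numpy as np
from scipy.optimize import minimize

landmarks = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])   # known positions
ranges = np.array([7.2, 7.5, 7.1])                              # noisy distances
sigma_r = 0.5                                                   # range noise std

prior_mean = np.array([4.0, 4.0])                               # e.g. odometry estimate
prior_cov_inv = np.linalg.inv(np.diag([1.0, 1.0]))

def neg_log_posterior(x: np.ndarray) -> float:
    """Gaussian prior on position plus independent Gaussian range likelihoods."""
    predicted = np.linalg.norm(landmarks - x, axis=1)
    likelihood = np.sum((ranges - predicted) ** 2) / (2 * sigma_r ** 2)
    diff = x - prior_mean
    prior = 0.5 * diff @ prior_cov_inv @ diff
    return likelihood + prior

result = minimize(neg_log_posterior, x0=prior_mean)
print("MAP estimate:", result.x)
```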

Snake

DEMO (TRY IT OUT!)

Implementation of a game engine for the classic Snake game developed using VueJS and complemented by vanilla HTML and CSS.


Vue.js streamlines state management and facilitates real-time rendering of the game environment, ensuring dynamic updates are handled efficiently without reloading the browser. It is used to manage the game state dynamically, including the snake's movement, collision detection, and score tracking. HTML structures the core layout of the game, setting up a grid-based playing field where the snake moves. CSS styles the game board and snake elements, implementing color schemes, borders, and animations.

Tic-tac-toe

DEMO (TRY IT OUT!)

Implementation of a game engine for the classic Tic-tac-toe game developed using Vanilla JavaScript, HTML and CSS.

The game engine implements the Minimax algorithm to simulate intelligent opponent moves and strategize optimal responses, using JavaScript for the game logic, HTML for structuring the game UI, and CSS for responsive design, styling, and animations. The Minimax algorithm extends the game's interactivity by enabling the system opponent to make strategically sound decisions, mimicking human-like decision-making and adding challenge and realism to the game engine.
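
The project itself is written in JavaScript; the Python sketch below mirrors the same Minimax recursion to show how the engine scores terminal states and backs values up the game tree.

```python
WIN_LINES = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]

def winner(board):
    """Return 'X' or 'O' if a line is completed, else None."""
    for a, b, c in WIN_LINES:
        if board[a] and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    """Return (score, move), scoring +1 for an 'X' win and -1 for an 'O' win."""
    w = winner(board)
    if w:
        return (1 if w == "X" else -1), None
    moves = [i for i, cell in enumerate(board) if not cell]
    if not moves:
        return 0, None                      # draw
    best_move = None
    best_score = -2 if player == "X" else 2
    for move in moves:
        board[move] = player
        score, _ = minimax(board, "O" if player == "X" else "X")
        board[move] = None
        if (player == "X" and score > best_score) or (player == "O" and score < best_score):
            best_score, best_move = score, move
    return best_score, best_move

# The engine opponent plays 'O'; pick its best reply to an opening corner by 'X'.
board = ["X", None, None, None, None, None, None, None, None]
print(minimax(board, "O")[1])   # 4 -- the center is the only non-losing response
```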

Sound Processor and Song Note Compiler

GITHUB

Sound processing and song note compilation platform for real-time audio signal processing and synthesis developed using C and C++. It enables the generation of various waveforms—including Sine, Square, Triangle, and Sawtooth—through user-adjustable parameters such as sample rate, frequency, and duration.


The platform supports complex audio signal layering, allowing users to overlay multiple audio signals with precision. This includes detailed amplitude control and modulation, where the amplitude of one signal can dynamically influence others. Advanced digital filters, such as a digital reverb filter, create short-lived echoes to simulate different acoustic environments effectively.


An integral component of the system is its song player functionality, implemented using C to optimize performance. It decodes various audio file formats, arranging the decoded samples into a seamless playback sequence that can be further manipulated for sound design or educational purposes. This functionality highlights the system's capability to handle complex audio processing tasks efficiently.
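
Although the platform is implemented in C and C++, the waveform-generation core can be sketched compactly in Python: a sine wave parameterized by sample rate, frequency, and duration, layered with a second tone and written out as 16-bit PCM. The note choices and file name are arbitrary.

```python
import wave

import numpy as np

def sine_wave(frequency: float, duration: float, sample_rate: int = 44_100,
              amplitude: float = 0.5) -> np.ndarray:
    """Generate a sine waveform as 16-bit PCM samples."""
    t = np.arange(int(sample_rate * duration)) / sample_rate
    samples = amplitude * np.sin(2 * np.pi * frequency * t)
    return (samples * 32767).astype(np.int16)

def write_wav(path: str, samples: np.ndarray, sample_rate: int = 44_100) -> None:
    """Write mono 16-bit PCM to a .wav file using the standard library."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)          # 2 bytes = 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(samples.tobytes())

# A4 (440 Hz) layered with a quieter octave above, then written to disk.
note = sine_wave(440.0, 1.0) // 2 + sine_wave(880.0, 1.0, amplitude=0.25) // 2
write_wav("a4_layered.wav", note)
```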

Purple Bug Mobile Companion App

Summer 2020 Internship Project

Android companion app to interface with the UV light hardware device component of the Purple-Bug UV ride-share cleaning product.

The Android mobile app was developed using Java and Kotlin, providing a stable and responsive UI. Network communications were implemented using Retrofit, which handled data transmission between the mobile app and backend services effectively. Image loading and caching were optimized using Glide to enhance performance and user experience. Local data storage was implemented using Room, which offers an abstraction layer over SQLite, simplifying database operations. Complex asynchronous tasks were managed using RxJava and Kotlin Coroutines, ensuring smooth operation of the app by executing background tasks without interfering with the user interface.