As Far Back As I Can Remember, I Always Wanted to Be a Data Scientist

Siddarth p -

This week, I set out to test my private set intersection (PSI) protocol on sample large language model (LLM) data. I wanted to validate how well the protocol protects sensitive information when it is applied to real-world AI outputs.

However, I realized that I did not have a clear dataset to test my protocol with. Finding appropriate sample LLM data turned out to be more of a challenge than I anticipated. I wanted something realistic, something that mirrored real-world use cases, and that requirement stalled the integration and testing. I started researching different sources for sample LLM data and looked for public datasets that resembled LLM outputs. Platforms like Kaggle have tons of data, but much of my searching there came up empty. I needed specific data that mimicked actual responses in domains where privacy is important, like healthcare or finance.

Looking back, I did not initially think this search for sample LLM data would be a major task. I figured I would easily find it by searching the internet. Instead, it proved foundational: I need this data to test the effectiveness of my PSI protocol at all. So, even though I did not write any new code this week, I clarified what "realistic testing" means for this project: testing in scenarios where privacy matters.

This whole experience taught me that data selection is just as important as algorithm design, especially when the algorithm is only meant to be used in highly specific scenarios.

My goals for next week are to finalize my sample LLM data and then test my PSI protocol on it. I'll measure its accuracy and computational speed, the same way I tested my PSI protocol on sample non-LLM data a few weeks ago.
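To make "accuracy and computational speed" concrete, here is a rough sketch of what a test harness could look like. My actual PSI protocol isn't shown in this post, so everything here is hypothetical: `hashed_intersection` is a naive salted-hash stand-in (it is not a secure PSI, since low-entropy items can be brute-forced), `evaluate_psi` is an illustrative helper, and the sample records are made up. Accuracy is measured as precision and recall against the plaintext intersection, and speed with `time.perf_counter`.

```python
import hashlib
import time


def hashed_intersection(client_items, server_items, salt=b"shared-salt"):
    """Hypothetical stand-in for the PSI protocol under test.

    Both sides hash their items with a shared salt and compare digests.
    NOTE: naive hashing is NOT a secure PSI; it only gives the harness
    something to call until the real protocol is plugged in.
    """
    def digest(item):
        return hashlib.sha256(salt + item.encode("utf-8")).hexdigest()

    server_digests = {digest(s) for s in server_items}
    return {c for c in client_items if digest(c) in server_digests}


def evaluate_psi(psi_fn, client_items, server_items, repeats=5):
    """Check psi_fn's output against the plaintext intersection and time it."""
    expected = set(client_items) & set(server_items)  # ground truth

    timings = []
    result = set()
    for _ in range(repeats):
        start = time.perf_counter()
        result = set(psi_fn(client_items, server_items))
        timings.append(time.perf_counter() - start)

    true_pos = len(result & expected)
    # Empty result or empty ground truth counts as vacuously correct.
    precision = true_pos / len(result) if result else 1.0
    recall = true_pos / len(expected) if expected else 1.0
    return {
        "precision": precision,
        "recall": recall,
        "best_seconds": min(timings),
        "mean_seconds": sum(timings) / len(timings),
    }


if __name__ == "__main__":
    # Hypothetical LLM-style records; real tests would use the finalized sample data.
    client = ["patient 102 reports chest pain", "account 88 flagged for fraud", "refill request"]
    server = ["refill request", "account 88 flagged for fraud", "new appointment booked"]
    print(evaluate_psi(hashed_intersection, client, server))
```

Keeping the protocol behind a `psi_fn(client_items, server_items)` interface would let the same harness run on both the earlier non-LLM data and the new LLM data, so the numbers stay comparable.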
