Week 1: Coin Price Data Gathering and ARIMA

Johnny Y

This week, I focused on obtaining the data for my project and studying the ARIMA forecasting model. I found that the CoinGecko API can be used to download up to 1 year of pricing data, and that CoinMarketCap can serve as a source for manually entering data older than 1 year. I had to learn some Python to write the code that imports data from the API. I also settled on Dogecoin, Shiba Inu, and Pepe as the coins I will study. Next week, I will finish importing data (including the manual entry) and build and test the ARIMA model. I also hope to identify candidate classifiers on Hugging Face.
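
For readers new to ARIMA, the core idea can be sketched in a few lines of Python. This is a toy illustration and not the model built for this project: it assumes the simplest useful case, ARIMA(1,1,0) with no constant term, and fits the single coefficient by least squares. The function names are made up for this example. In practice a library such as statsmodels would likely be used instead, and it estimates its parameters differently.

```python
from typing import List, Sequence


def difference(series: Sequence[float]) -> List[float]:
    """First-order differencing: the "I" (d=1) part of ARIMA."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(diffs: Sequence[float]) -> float:
    """Least-squares estimate of phi in diff[t] = phi * diff[t-1] + noise.

    Assumption for this sketch: no intercept term.
    """
    numerator = sum(prev * cur for prev, cur in zip(diffs, diffs[1:]))
    denominator = sum(prev * prev for prev in diffs[:-1])
    # A flat series has no variation to fit, so fall back to phi = 0.
    return numerator / denominator if denominator else 0.0


def forecast_arima_110(series: Sequence[float], steps: int) -> List[float]:
    """Forecast `steps` values ahead with a hand-rolled ARIMA(1,1,0)."""
    if len(series) < 3:
        raise ValueError("need at least 3 observations")
    diffs = difference(series)
    phi = fit_ar1(diffs)
    last_value, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        # Predict the next change from the previous change (the AR(1) part),
        # then add it back to the last level to undo the differencing.
        last_diff = phi * last_diff
        last_value = last_value + last_diff
        forecasts.append(last_value)
    return forecasts
```

Calling forecast_arima_110(prices, 7) on a list of daily closing prices would give a 7-day forecast that a sentiment-based model could then be compared against.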

I feel that this project is progressing on schedule, though I anticipate that I may find issues during testing. That said, I did run into some obstacles. I originally planned to use 3 years of data, but only Dogecoin meets this requirement. After I discussed this with my site advisor, we decided that 1.5 years of data might be sufficient. To confirm this, I examined several cryptocurrency sentiment analysis papers, including Huang et al. (2021), Khan and Ihsan (2024), Raheman et al. (2022), Abraham et al. (2018), and Colianni et al. (2015). Refreshing my Python knowledge and debugging code were a major time drain (it turns out pip is run from Command Prompt, not the Python shell). Lastly, cryptocurrency pricing data older than 1 year is behind a paywall on CoinGecko, and I was unable to find a free alternative (Yahoo Finance also requires a premium subscription, and CoinMarketCap doesn’t allow downloads with granularity beyond a week).
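
For the year of data that is freely available, an import script might look roughly like the sketch below. It is not the code written for this project. It assumes CoinGecko's public /coins/{id}/market_chart endpoint and its "prices" response field as I understand them from the public documentation, and the coin IDs mentioned in the comments are assumptions that should be checked against CoinGecko's coin list. The API may also require a key or apply rate limits, which this sketch does not handle.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone
from typing import List, Tuple

# Assumed base URL of the public CoinGecko API.
BASE_URL = "https://api.coingecko.com/api/v3"


def market_chart_url(coin_id: str, days: int = 365, vs_currency: str = "usd") -> str:
    """Build the request URL for one coin's price history."""
    query = urllib.parse.urlencode({"vs_currency": vs_currency, "days": days})
    return f"{BASE_URL}/coins/{coin_id}/market_chart?{query}"


def parse_prices(payload: dict) -> List[Tuple[str, float]]:
    """Convert [[timestamp_ms, price], ...] into (ISO date, price) rows.

    If the API returns several points per day, the same date will
    appear more than once; this sketch does not deduplicate.
    """
    rows = []
    for timestamp_ms, price in payload["prices"]:
        moment = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
        rows.append((moment.date().isoformat(), float(price)))
    return rows


def fetch_prices(coin_id: str, days: int = 365) -> List[Tuple[str, float]]:
    """Download and parse the price history (makes a network request)."""
    with urllib.request.urlopen(market_chart_url(coin_id, days), timeout=30) as response:
        return parse_prices(json.load(response))


# Example call (coin IDs are assumptions; not executed here):
#   rows = fetch_prices("dogecoin")   # or "shiba-inu", "pepe"
```

Keeping the parsing separate from the download means manually entered rows for the older data can be stored in the same (date, price) shape and simply appended.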

Some notes on the papers I read:

  • Tweet volume may be a strong benchmark against which to compare sentiment analysis
  • The choice of model plays a significant role in the results
  • It may be interesting to compare overall sentiment analysis against analysis of posts from only the top “expert” accounts
  • Sentiment analysis may be less effective when prices are falling, because bot posts attempt to keep the price up

Comments:

    christinemerrill
    I am so curious where on earth these names come from? Shiba Inu--very cool dog breed. Hugging Face?
    johnny_y
    Memes! Dogecoin is named after the "doge" meme that was popular at the time of its creation; Shiba Inu is the breed of the dog in the meme. Hugging Face comes from the "hugging face" emoji, which serves as the company's logo.
    rob_lee
    The timeframe is interesting considering a crypto cycle is on the order of 4 years. It would be great to extend the timeframe out further. Too bad on the API paywall.
    johnny_y
    Right. It might be interesting to test the final model on 4 years of (manually entered) Dogecoin data if I have the time.
    rob_lee
    I was also thinking about this more and wonder if there is a stronger/more valid signal during bear markets.
    aditya_s
    Given that the model used has a significant impact on the results, what led you to choose the ARIMA model? Were there other models that you considered and ultimately rejected?
    tesla_l
    Are you going to look into pepe coin at all or any other meme coins? Or are you just going to be looking at Dogecoin and Hugging Face?
    arjun_c
    Hugging Face is a site that hosts a bunch of models for developers to use (among other utilities), not a coin. Though, I'm sure there's a hugging face coin out there.
    austin_l
    Do you think the paywall to the pricing data will significantly impact your results? Do you think your project might shift to accommodate some of these barriers?
    arjun_c
    Johnny, could you provide a link to the Khan and Ihsan (2024) paper, please? I can't find it by searching. Thanks
    johnny_y
    Abraham et al. (2018) found that the signal is actually less valid during bear markets because people try to prop up the price.
    johnny_y
    My mentor advised me to utilize ARIMA as a benchmark, and standard practice from the papers I read was to use ARIMA as a benchmark.
    johnny_y
    As Arjun mentioned, Hugging Face is a site with models for developers, not a coin. I will also be investigating Shiba Inu and Pepe.
    johnny_y
    I discussed the paywall issue with my advisor, and we decided that 1.5 years of data was enough based on other papers. I'll just have to manually enter the data past 1 year.
    johnny_y
    Khan and Ihsan (2024): https://jcbi.org/index.php/Main/article/view/744
    Anonymous
    Hi Johnny, what made you choose those three coins in particular for your study?
    johnny_y
    I chose these coins because they are older, well-established meme coins with significant market caps. This helps ensure that there is enough pricing and social media data for the model to be trained on.
