Stock Market Prediction with Future News

  • Extracted text from 10 years of NYT front-page PDFs using PyTesseract OCR and generated embeddings via Nomic/Ollama.
  • Engineered features with RoBERTa to quantify article agreement with stock market–related statements.
  • Merged news embeddings with historical AlphaVantage stock data to model prior-day market movement using TensorFlow/Keras dense-dropout networks.
  • Evaluated predictive performance across multiple datasets (full pages, individual articles, similarity metrics).
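
The merge step above pairs each day's news with the prior trading day's market movement. A minimal sketch of that alignment follows, using only the Python standard library. The function name `build_examples`, the dict-based inputs, and the up-versus-not-up labeling rule are illustrative assumptions, not the project's actual code.

```python
from datetime import date


def build_examples(embeddings, closes):
    """Pair each front page with the direction of the prior trading day's move.

    embeddings: dict of publication date -> embedding vector (hypothetical input)
    closes:     dict of trading date -> closing price (hypothetical input)
    Returns a list of (publication_date, embedding, label) tuples,
    where label is 1 if the prior trading day closed up, else 0.
    """
    trading_days = sorted(closes)
    examples = []
    for pub_date in sorted(embeddings):
        # Trading days strictly before the publication date.
        earlier = [d for d in trading_days if d < pub_date]
        if len(earlier) < 2:
            continue  # two closes are needed to compute one movement
        prev_day, day_before = earlier[-1], earlier[-2]
        label = 1 if closes[prev_day] > closes[day_before] else 0
        examples.append((pub_date, embeddings[pub_date], label))
    return examples


# Toy data (made up for illustration): three trading days, two front pages.
closes = {
    date(2020, 1, 2): 100.0,
    date(2020, 1, 3): 102.0,
    date(2020, 1, 6): 101.0,
}
embeddings = {
    date(2020, 1, 4): [0.1, 0.2],  # Saturday paper, labeled with Friday's move
    date(2020, 1, 7): [0.3, 0.4],  # Tuesday paper, labeled with Monday's move
}
examples = build_examples(embeddings, closes)
labels = [label for _, _, label in examples]
```

Looking up the most recent trading day before publication, instead of subtracting one calendar day, keeps weekend and holiday editions from being dropped or mislabeled.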