We provide evidence of the usefulness of exploiting online text data in stock prediction systems. We do this by mining a popular Argentinian stock message board and empirically answering two questions. First, does the online stock message board contain information useful for predicting stock returns? Second, if such information exists, is it novel, or is it simply a different way of expressing information already available in the past behavior of stock prices? To address these questions, we build and validate a series of predictive models using state-of-the-art machine learning and topic discovery techniques. We run experiments in which the models are trained with different combinations of features extracted from the past behavior of stock prices or mined from the online message board. The evidence suggests that it is possible to extract predictive information from stock message boards. Furthermore, we find that adding this information improves the performance of classification systems trained solely on technical indicators. Our results suggest that information from online text data is complementary to that available in the past evolution of stock prices. Additionally, we find that highly predictive features derived from the message board data seem to carry important and relevant semantic content. © 2017 Elsevier B.V.
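
The abstract does not spell out the modeling pipeline. The following is only a minimal sketch of the kind of feature-ablation experiment it describes: the same classifier is trained on technical features alone, on text features alone, and on both, and the held-out accuracies are compared. It assumes daily TF-IDF vectors for the posts, LSA topic features learned by truncated SVD on the training period only, a ridge-regression classifier on up/down labels (LSA and ridge regression both appear among the keywords), a chronological train/test split, and synthetic data. All function names are hypothetical, and this is not the authors' implementation; a random forest could be substituted for the ridge classifier.

```python
import numpy as np


def fit_lsa(doc_term_train, k):
    """Learn a k-dimensional LSA projection from training-period documents only."""
    # Truncated SVD of the (documents x terms) weight matrix; rows of vt are term loadings.
    _u, _s, vt = np.linalg.svd(doc_term_train, full_matrices=False)
    return vt[:k].T  # shape (n_terms, k)


def lsa_transform(doc_term, proj):
    """Map documents into the latent space (equals U_k * S_k on the training rows)."""
    return doc_term @ proj


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression on +/-1 labels, with an unpenalized intercept."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


def predict_direction(w, X):
    """Classify each day as up (+1) or down (-1) by the sign of the ridge score."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.where(Xb @ w >= 0, 1, -1)


def accuracy_by_feature_set(feature_sets, y, n_train, alpha=1.0):
    """Train on the first n_train days and test on the rest (no shuffling, to respect time order)."""
    results = {}
    for name, X in feature_sets.items():
        w = fit_ridge(X[:n_train], y[:n_train], alpha)
        pred = predict_direction(w, X[n_train:])
        results[name] = float(np.mean(pred == y[n_train:]))
    return results


# Synthetic toy data, purely illustrative (not the paper's data).
rng = np.random.default_rng(0)
n_days, n_terms, n_train = 200, 40, 150
technical = rng.normal(size=(n_days, 3))   # stand-ins for technical indicators
signal = rng.normal(size=n_days)           # hidden factor that also shows up in the posts
doc_term = rng.random((n_days, n_terms))   # stand-in for daily TF-IDF weights
doc_term[:, :5] += np.maximum(signal, 0.0)[:, None]  # a few terms spike on positive days
y = np.where(technical[:, 0] + signal + rng.normal(scale=0.5, size=n_days) >= 0, 1, -1)

# LSA is fitted on the training period only, then applied to every day.
proj = fit_lsa(doc_term[:n_train], k=5)
text = lsa_transform(doc_term, proj)
feature_sets = {
    "technical": technical,
    "text": text,
    "technical+text": np.hstack([technical, text]),
}
print(accuracy_by_feature_set(feature_sets, y, n_train))
```

Because the toy labels are generated partly from the same hidden factor that drives some term weights, running this only exercises the mechanics of the comparison; the printed accuracies say nothing about real message-board data.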


Document type: Article
Title: Assessing the usefulness of online message board mining in automatic stock prediction systems
Authors: Gálvez, R.H.; Gravano, A.
Affiliation: Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Argentina
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina
Keywords: Latent semantic analysis; Random forest; Ridge regression; Stock market; Text mining; Classification (of information); Costs; Data mining; Decision trees; Electronic trading; Financial markets; Forecasting; Investments; Learning systems; Regression analysis; Semantics; Classification system; Latent Semantic Analysis; Predictive information; Predictive models; Random forests; Ridge regression; Technical indicator; Text mining; Online systems
First page: 43
Last page: 56
Journal title: Journal of Computational Science
Abbreviated journal title: J. Comput. Sci.


