Structural Metadata from ArXiv Articles
The data set contains metadata extracted from more than one million arXiv articles that were put online before the end of 2016.
Data Set Characteristics: Text
Number of Instances: 1107138 arXiv articles
Size: 566 megabytes, compressed
Area: NLP and Machine Learning
Attribute Characteristics: String/Text
Associated Tasks: Classification and Clustering
Date Released: 2017-09-01
Source: arXiv repository
File format: JSON
The JSON file contains information on 1,107,138 arXiv articles put online during or before 2016. Each top-level key in the JSON file is an arXiv article id. For each article, the following information is given.
You can view some examples of the JSON objects here.
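As a rough illustration of the layout described above (one top-level key per arXiv article id), the following Python sketch loads the extracted file and previews a few records. The function names and the file path in the usage comment are hypothetical. The per-article fields are not listed on this page, so the code does not assume any of them.

```python
import json
from itertools import islice


def load_articles(path):
    """Load the metadata file and return a dict keyed by arXiv article id."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def preview(articles, n=3):
    """Return the first n (article_id, record) pairs without assuming field names."""
    return list(islice(articles.items(), n))


# Example usage (the path is a placeholder, not the actual file name):
# articles = load_articles("arxiv_metadata.json")
# print(len(articles))
# for article_id, record in preview(articles):
#     print(article_id, type(record).__name__)
```

Note that json.load reads the whole file into memory. Given the size of the data set, a streaming JSON parser may be a better fit on machines with limited RAM.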
Relevant Paper: Muhammad Rahman and Tim Finin, "Understanding the Logical and Semantic Structure of Large Documents", University of Maryland, Baltimore County
For more information, please contact firstname.lastname@example.org
Authors: Muhammad Mahbubur Rahman
Date: September 01, 2017
Format: ZIP Compressed File
Number of downloads: 24
Access Control: Publicly Available
Available for download as