]>
foafPub is a dataset of information extracted from FOAF files collected during the fall of 2004. The data represents 7,118 FOAF documents collected from 2,044 sites (identified by their symbolic IP addresses). A total of 201,612 RDF triples with provenance information are included. The FOAF files were selected from the larger datasets described in [1] and [2] to represent an interesting and balanced selection of FOAF documents. This dataset is distributed under the Creative Commons Attribution (v2.0) license.
The dataset is distributed as a zip file containing SQL commands to create three tables: dict_host, dict_url and triple_person. The SQL commands were generated from the original MySQL database using the export command.
The dict_host table holds the addresses of the 2,044 distinct sites from which the data was collected, along with a timestamp of when each site was visited. The dict_url table holds metadata about each of the 7,118 FOAF files: the URL, the time collected, the time last modified, the number of people described in the file, the number of FOAF-related triples in the file, and the total number of triples in the file. The triple_person table contains all of the triples harvested from the files along with their source URLs.
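To illustrate how the three tables relate, here is a minimal sketch in Python using an in-memory SQLite database. The column names (host_id, url_id, subject, predicate, object and so on), the join keys, and the sample rows are all assumptions made for illustration; the actual schema is defined by the SQL commands in the zip file and may differ.

```python
import sqlite3

# HYPOTHETICAL, simplified schema. Only the three table names come from the
# dataset description; every column name and key below is an assumption.
SCHEMA = """
CREATE TABLE dict_host (host_id INTEGER PRIMARY KEY, host TEXT, visited TEXT);
CREATE TABLE dict_url (url_id INTEGER PRIMARY KEY, host_id INTEGER, url TEXT,
                       n_people INTEGER, n_foaf_triples INTEGER,
                       n_triples INTEGER);
CREATE TABLE triple_person (url_id INTEGER, subject TEXT, predicate TEXT,
                            object TEXT);
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO dict_host VALUES (1, 'example.org', '2004-10-01')")
    conn.executemany(
        "INSERT INTO dict_url VALUES (?, ?, ?, ?, ?, ?)",
        [(1, 1, "http://example.org/alice.rdf", 2, 3, 4),
         (2, 1, "http://example.org/bob.rdf", 1, 1, 2)])
    conn.executemany(
        "INSERT INTO triple_person VALUES (?, ?, ?, ?)",
        [(1, "_:a", "foaf:name", "Alice"),
         (1, "_:a", "foaf:knows", "_:b"),
         (1, "_:b", "foaf:name", "Bob"),
         (2, "_:c", "foaf:name", "Bob")])
    return conn


def triples_per_host(conn):
    """Count harvested triples per site by joining all three tables."""
    return conn.execute("""
        SELECT h.host, COUNT(*)
        FROM triple_person AS t
        JOIN dict_url AS u ON u.url_id = t.url_id
        JOIN dict_host AS h ON h.host_id = u.host_id
        GROUP BY h.host
        ORDER BY h.host
    """).fetchall()


def triples_with_source(conn, predicate):
    """Return (subject, object, source URL) for every triple with a predicate."""
    return conn.execute("""
        SELECT t.subject, t.object, u.url
        FROM triple_person AS t
        JOIN dict_url AS u ON u.url_id = t.url_id
        WHERE t.predicate = ?
        ORDER BY u.url, t.subject
    """, (predicate,)).fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    print(triples_per_host(demo))
    print(triples_with_source(demo, "foaf:name"))
```

The second query shows the point of keeping provenance: each triple can be traced back to the document it was harvested from, so the same statement found in two files remains distinguishable.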
[1] Li Ding, Lina Zhou, Tim Finin, and Anupam Joshi, How the Semantic Web is Being Used: An Analysis of FOAF, Proceedings of the 38th International Conference on System Sciences, January 2005.
[2] Li Ding, Tim Finin, and Anupam Joshi, Analyzing Social Networks on the Semantic Web, IEEE Intelligent Systems, volume 9, number 1, January 2005.
]]>