Swoogle hits 1.5M Semantic Web documents

June 5th, 2006

Sometime last night the Swoogle crawler found its 1,500,000th unique Semantic Web document. The 1.5M documents comprise about 1M RDF documents, 350K documents with embedded RDF data, and 150K documents that look like Semantic Web documents but are currently inaccessible or fail to parse properly. About 3,000 new documents are discovered each day. We estimate that about 1% (10K) of the 1M RDF documents are ontologies, as opposed to data, examples or test files. Swoogle does not currently process RDFa content, microformat data, or PDF and JPEG documents. The crawler is also severely throttling its crawl of some domains (e.g., livejournal.com) that have large numbers of FOAF and RSS documents, in order to maintain a more balanced and interesting collection.
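
For readers curious what limiting a crawl by domain might look like, here is a minimal Python sketch. It is not Swoogle's actual code: the cap value, the `DOMAIN_CAPS` table, and the helper names `registered_domain` and `govern` are all hypothetical, and the domain extraction is a crude last-two-labels guess.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical per-domain caps; the real limits Swoogle uses are not stated.
DOMAIN_CAPS = {"livejournal.com": 2}


def registered_domain(url):
    """Crude domain key: the last two labels of the hostname.

    Assumption for illustration only; this ignores suffixes such as co.uk.
    """
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def govern(urls, caps=DOMAIN_CAPS):
    """Yield URLs in order, skipping any that exceed their domain's cap."""
    seen = Counter()
    for url in urls:
        domain = registered_domain(url)
        cap = caps.get(domain)
        if cap is not None and seen[domain] >= cap:
            continue  # this domain has used up its quota
        seen[domain] += 1
        yield url
```

Domains without an entry in the table are crawled without limit, so a handful of very large FOAF or RSS sources cannot crowd out everything else.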