Seen on the Web: “Swoogle is an alien from outer space send out to spy on the modnation circuit. He got five faces so he can watch them from all angles without turning his head. However only his front shows many emotions. His right face is always angry, his left face is always in awe for some reason.”
Archive for the 'Swoogle' Category
3scale Networks is a Barcelona-based startup that is trying to fill a critical gap in helping organizations manage web services as a business or at least in a business-like manner.
“3scale provides a new generation of infrastructure for the web – point and click contract management, monitoring and billing for Web Services. The 3scale platform makes it easy for providers to launch their APIs, manage user access and, if desired, collect usage fees. Service users can discover services they need and sign up for plans on offer.” (source)
They have been operating a private beta system for a few months and just announced that their public beta is open. Currently signing up with 3scale and registering services is free and the only costs are commissions on transaction fees your service charges. Once you’ve registered a service, you can install one of several 3scale plugins for your programming environment to get your service talking to 3scale and configure one or more usage plans. 3scale uses Amazon’s EC2, S3 and Cloud Computing services.
3scale’s co-founder and technical lead is Steve Wilmott, with whom we worked for many years when he was an academic doing research on multiagent systems. Several months ago he invited us to add Swoogle’s web service to 3scale’s private beta. We were pleased with how easy it was and look forward to exploring how else to use 3scale.
A story in yesterday’s Washington Post, Manage Your API Infrastructure With 3scale Networks, has some more information.
This offer just showed up in a Google alert triggered by its mention of Swoogle. Some poor Australian student (poor in ethics and ability, not money) is willing to pay $100 to have someone do his project for a Semantic Web course.
homeworkanytimehelp4 is behind on several assignments and in a bit of a fix. He needs his ontology assignment done by 12 October, just two days after he posted his offer.
Is this cheating? Well, the studentOfFortune.com site has thought deeply about this, and it turns out that it’s not.
Q: It still seems like cheating
A: We’ve thought long and hard about this. We believe that users who write solutions which not only help provide answers but also help teach how the answers were achieved will be the solutions that are purchased more often than not. And for that reason, we believe that Student of Fortune is a teaching and research tool, not a tool for cheating. But it’s up to you how you use it. We’re not going to judge you. We’re just here to help.
Times are hard right now. If you are tempted to help homeworkanytimehelp4, you owe it to yourself to find out if the dollars are USD or AUD.
I thought I would start blogging about our weekly ebiquity meetings, at least for those that might be of interest to people outside of our group. Our meetings are, in general, open and we are happy to have visitors, with or without warning. We meet on Monday mornings, from 10:30 to 11:30 or Noon, depending on the topic, in our department’s large conference room (325b ITE Building). This coming week (October 1) Joel Sachs will give us a tutorial on Linked Data. Here’s his abstract.
Linked Data refers to a collection of best practices for publishing data on the semantic web. It is also, in part, a re-branding of the semantic web itself, with less emphasis on semantics, and more on RDF linkages amongst data sources. Also heavily emphasized is the proper role of web architecture (http requests and responses; 303 redirects; etc.), and the distinction between information resources (those that physically reside on the web), and non-information resources (those that exist in the so-called real world). I’ll give a brief overview of Linked Data, followed by a discussion of some issues that Linked Data raises for the SPIRE project. These issues include how Swoogle should handle information sources such as DBpedia, and how to link ETHAN to other sources of taxonomic and natural history information.
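As a rough illustration of the 303 redirect practice mentioned in the abstract, here is a minimal sketch of how a server might answer requests for information and non-information resources. The paths, the `DESCRIBED_BY` and `DOCUMENTS` tables, and the `respond` function are hypothetical names invented for this example; they are not part of Swoogle or SPIRE.

```python
# Hypothetical data: a URI naming a real-world thing (a non-information
# resource) and the web document that describes it.
DESCRIBED_BY = {"/id/alice": "/doc/alice"}
DOCUMENTS = {"/doc/alice": "<rdf:RDF><!-- description of alice --></rdf:RDF>"}


def respond(path):
    """Return (status, headers, body) for a requested path.

    Information resources are served directly with 200. Non-information
    resources cannot be sent over the wire, so the server answers with
    303 See Other and points at a document about them.
    """
    if path in DOCUMENTS:
        return 200, {"Content-Type": "application/rdf+xml"}, DOCUMENTS[path]
    if path in DESCRIBED_BY:
        return 303, {"Location": DESCRIBED_BY[path]}, ""
    return 404, {}, ""


status, headers, _ = respond("/id/alice")
print(status, headers["Location"])
```

A client that follows the `Location` header then retrieves the describing document with an ordinary 200 response.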
Sometime today the UMBC Swoogle Semantic Web search engine discovered and indexed its millionth document. Of these, about 77% are valid RDF documents, 15% are HTML documents with embedded RDF, and 8% appear to be RDF documents but cannot be parsed.
“Swoogle has indexed millions of Semantic Web Documents, but how do I know that mine has been indexed?” Here is a simple way: try your URL with the Swoogle Track Back Service. Here are several examples that show how it works:
- It helps us track the evolution of an ontology – say the protégé ontology
- We may also check the growth of FOAF documents.
- Finally, this service may also help us learn the life cycle of a semantic web document: it was created, actively maintained, lingered around for a while and finally died (i.e. went offline).
About this URL
The latest ping on [2006-01-29] shows its status is [Succeed, changed into SWD].
Its latest cached original snapshot is [2006-01-29 (3373 bytes)]
Its latest cached NTriples snapshot is [2006-01-29 (41 triples)].
We have found 7 cached versions.
2006-01-29: Original Snapshot (3373 bytes), NTriples Snapshot (41 triples)
2005-08-25: Original Snapshot (3373 bytes), NTriples Snapshot (41 triples)
2005-07-16: Original Snapshot (2439 bytes), NTriples Snapshot (35 triples)
2005-05-20: Original Snapshot (2173 bytes), NTriples Snapshot (30 triples)
2005-04-10: Original Snapshot (1909 bytes), NTriples Snapshot (28 triples)
2005-02-25: Original Snapshot (1869 bytes), NTriples Snapshot (27 triples)
2005-01-24: Original Snapshot, NTriples Snapshot (31 triples)
About this URL
The latest ping on [2006-01-29] shows its status is [Succeed, changed into SWD].
Its latest cached original snapshot is [2006-01-29 (6072 bytes)]
Its latest cached NTriples snapshot is [2006-01-29 (98 triples)].
We have found 6 cached versions.
2006-01-29: Original Snapshot (6072 bytes), NTriples Snapshot (98 triples)
2005-07-16: Original Snapshot (6072 bytes), NTriples Snapshot (98 triples)
2005-06-19: Original Snapshot (5053 bytes), NTriples Snapshot (80 triples)
2005-04-17: Original Snapshot (3142 bytes), NTriples Snapshot (50 triples)
2005-04-01: Original Snapshot (1761 bytes), NTriples Snapshot (29 triples)
2005-01-24: Original Snapshot, NTriples Snapshot (29 triples)
About this URL
The latest ping on [2006-02-02] shows its status is [Failed, http code is not 200 (or 406)].
Its latest cached original snapshot is [2005-03-09 (15809 bytes)]
Its latest cached NTriples snapshot is [2005-03-09 (149 triples)].
We have found 3 cached versions.
2005-03-09: Original Snapshot (15809 bytes), NTriples Snapshot (149 triples)
2005-02-25: Original Snapshot (12043 bytes), NTriples Snapshot (149 triples)
2005-01-26: Original Snapshot, NTriples Snapshot (145 triples)
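The snapshot listings above have a regular shape, so the growth of a document can be computed from them. The following is a minimal sketch that assumes the line format shown in these listings; `parse_snapshot` and `triple_growth` are names made up for this example and are not part of the Swoogle service.

```python
import re
from datetime import date

# Matches lines such as
# "2005-07-16: Original Snapshot (2439 bytes), NTriples Snapshot (35 triples)".
# The byte count is optional because the oldest entries omit it.
SNAPSHOT_LINE = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2}): Original Snapshot(?: \((\d+) bytes\))?, "
    r"NTriples Snapshot \((\d+) triples\)$"
)


def parse_snapshot(line):
    """Turn one listing line into a dict with date, bytes and triples."""
    match = SNAPSHOT_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized snapshot line: {line!r}")
    year, month, day, size, triples = match.groups()
    return {
        "date": date(int(year), int(month), int(day)),
        "bytes": int(size) if size else None,
        "triples": int(triples),
    }


def triple_growth(lines):
    """Return (date, change in triple count) for each consecutive snapshot pair."""
    snapshots = sorted((parse_snapshot(l) for l in lines), key=lambda s: s["date"])
    return [
        (cur["date"], cur["triples"] - prev["triples"])
        for prev, cur in zip(snapshots, snapshots[1:])
    ]


listing = [
    "2005-04-10: Original Snapshot (1909 bytes), NTriples Snapshot (28 triples)",
    "2005-02-25: Original Snapshot (1869 bytes), NTriples Snapshot (27 triples)",
    "2005-01-24: Original Snapshot, NTriples Snapshot (31 triples)",
]
for when, delta in triple_growth(listing):
    print(when, delta)
```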
NOTICE: Yesterday we posted a form that directed you to the Swoogle trackback service. Unfortunately, the form failed when it was called from outside our firewall because a Swoogle API key is required. We didn’t notice at first, because we were inside the firewall when we tested it. When we did, we deleted the post, but PlanetRDF had already picked up the post and it was still in our database. The form has now been removed, but you can still go to the Swoogle web site and try the trackback service there.
Yesterday we posted directions on how to tell if your Semantic Web document is in Swoogle’s database. Unfortunately, our directions suggested using a service that, if called from outside our firewall, requires a Swoogle API key. (This is separate from being a registered Swoogle user.) We didn’t notice at first, because we were inside the firewall when we tested it. When we did, we deleted the post, but PlanetRDF had already picked up the post and it was still in our database. We’re working to straighten this out and hope to have the service available soon.
1. www.legaladvocate.net 246 26.14%
2. www.myjavaserver.com 152 16.15%
3. www.google.com 125 13.28%
4. dannyayers.com 44 4.68%
5. lucky7.to 34 3.61%
6. ebiquity.umbc.edu 25 2.66%
7. www.google.de 18 1.91%
8. planetrdf.com 18 1.91%
9. mail.google.com 18 1.91%
10. groups.google.com 14 1.49%
One and five are clearly spam sites, and two is suspicious, too. The first, for example, appears to be about poker, though the site name is legaladvocate. The site’s text is obviously automatically generated nonsense. All of the links point to subpages in the same domain with a similar structure and content. I assume that once the site achieves a high PageRank, it will be repurposed or sold.
So, it seems like nearly 50% of our hits are due to referer log spamming. I’d guess Swoogle was picked by finding its URL on recent posts found on a blog search engine or a ping server.
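The "nearly 50%" figure can be checked by adding up the shares of the first, second and fifth entries in the referrer list above. This sketch counts the merely suspicious second entry as spam, which is an assumption; `share_of_ranks` is a helper name invented for the example.

```python
# Referrer list from above: (site, hits, share of hits in percent).
referrers = [
    ("www.legaladvocate.net", 246, 26.14),
    ("www.myjavaserver.com", 152, 16.15),
    ("www.google.com", 125, 13.28),
    ("dannyayers.com", 44, 4.68),
    ("lucky7.to", 34, 3.61),
    ("ebiquity.umbc.edu", 25, 2.66),
    ("www.google.de", 18, 1.91),
    ("planetrdf.com", 18, 1.91),
    ("mail.google.com", 18, 1.91),
    ("groups.google.com", 14, 1.49),
]


def share_of_ranks(rows, ranks):
    """Sum the percentage share of the rows at the given 1-based ranks."""
    return sum(
        share
        for rank, (_site, _hits, share) in enumerate(rows, start=1)
        if rank in ranks
    )


# Assumption: entries one and five are spam and entry two is counted as spam too.
spam_share = share_of_ranks(referrers, {1, 2, 5})
print(f"{spam_share:.2f}%")  # prints 45.90%
```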
FOAF is a well-known semantic web vocabulary, and we know that there are millions of FOAF instances on the Web. A scutter can help us recursively find FOAF documents online by following hyperlinks in FOAF documents; however, how to obtain the initial seeds is still a big issue.
In addition, many semantic web users would like to find out the population of an ontology, e.g. the instances of a defined class such as foaf:Person, or where foaf:email has been populated as a predicate.
Therefore, Swoogle provides an interface for finding instances of a class such as foaf:Person.
This swoogle query searches the usage of a semantic web term, foaf:Person.
Its result consists of six exclusive categories:
- definesClass: the term has been defined as a class in the ontology
- definesProperty: the term has been defined as a property in the ontology
- populatesClass: there is a class-instance of that term in the document
- populatesProperty: the term has been used as a predicate (i.e. populated) in the document
- usesClass: the term has been used (neither defined nor populated) as a class in a document, e.g. when an ontology asserts myns:Person rdfs:subClassOf foaf:Person.
- usesProperty: the term has been used as a property
Note that a document might have multiple usage relations with a term, e.g. a document may define a term as a class, use it to define other classes and properties, and populate its class-instances.
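To make the six categories above concrete, here is a minimal sketch that assigns usage relations to a term from a document's triples. It is not Swoogle's actual implementation: the function name `usage_relations`, the use of prefixed names as plain strings, and the specific rules (for example, which predicates count as "using" a class) are simplifying assumptions.

```python
RDF_TYPE = "rdf:type"
# Assumed sets of types that mark a definition; real ontologies use more.
CLASS_TYPES = {"rdfs:Class", "owl:Class"}
PROPERTY_TYPES = {"rdf:Property", "owl:ObjectProperty", "owl:DatatypeProperty"}
# Assumed predicates whose object is a class or property being referred to.
CLASS_LINKS = {"rdfs:subClassOf", "rdfs:domain", "rdfs:range"}
PROPERTY_LINKS = {"rdfs:subPropertyOf"}


def usage_relations(term, triples):
    """Return the set of usage relations a document has with a term.

    `triples` is an iterable of (subject, predicate, object) strings.
    """
    relations = set()
    for s, p, o in triples:
        if p == term:
            relations.add("populatesProperty")  # term used as a predicate
        elif s == term and p == RDF_TYPE and o in CLASS_TYPES:
            relations.add("definesClass")
        elif s == term and p == RDF_TYPE and o in PROPERTY_TYPES:
            relations.add("definesProperty")
        elif o == term and p == RDF_TYPE:
            relations.add("populatesClass")  # some resource is an instance of term
        elif o == term and p in CLASS_LINKS:
            relations.add("usesClass")
        elif o == term and p in PROPERTY_LINKS:
            relations.add("usesProperty")
    return relations


document = [
    ("myns:Person", "rdfs:subClassOf", "foaf:Person"),
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:email", "mailto:alice@example.org"),
]
print(sorted(usage_relations("foaf:Person", document)))
print(sorted(usage_relations("foaf:email", document)))
```

Returning a set reflects the note above that one document can stand in several usage relations to the same term.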
How to get there
To reach that page, follow these steps:
- start from the Swoogle home page and choose “search term”
- type the local name, or the entire URI surrounded by double quotes, and go to the search result page
- click the “metadata” link under your URI to go to the term’s metadata page
- click the “related documents” link (a grey block) at the top of the page to go to the wanted page
NOTE: advanced users may use swoogle web service APIs to retrieve more results.
We’ve set up a Google group, Swooglers, for users of the Swoogle Semantic Web search engine. Anyone can browse the archive and join, but only members can post messages. Replies are sent to the whole group. We’re not exactly sure what Swooglers will have to talk about, but it might be a place to share your experiences in using Swoogle, ask other users for advice, etc.
If you go to Swoogle on this Groundhog Day you will see a change. We’ve released a new version, Swoogle 2006, that is a nearly complete rewrite of Swoogle Classic, which now answers to Swoogle 2005. While Swoogle 2006 is currently missing some of Swoogle 2005’s features, it enjoys a cleaner and simpler model and foundation. We will be adding in some of these features as well as new ones over the next few months. Here are some of Swoogle 2006’s highlights:
- New hardware. Swoogle 2006 is running on a set of three machines: EB2 is a two-processor Sun v20z with 4 GB of memory that runs the crawler, DBMS and development web interfaces; LOGOS is an IBM eserver that runs the production web interfaces; and NATRAJ is the file server for the SW cache and archive.
- More data. Swoogle 2006 has over 850K documents in its index compared to Swoogle 2005’s 340K. The documents include about 700K RDF documents and 140K HTML documents with embedded RDF.
- Better ranking. Swoogle 2006 uses the improved ranking algorithms reported on in our ISWC 2005 paper.
- Better crawling. Swoogle 2006 now does a better job of crawling new URLs, including those submitted by people.
- Web services. Swoogle 2006 exposes a set of 17 web services, currently with simple CGI interfaces that return their results as RDF graphs. Using the web services requires a key, so we can track usage and possible abuses.
- RDF output. All query results, whether via a web service call or through the browser interface, are available in RDF. For browser-based queries, look for the RDF VERSION link in the upper left corner of the page.
- Simpler interface. The human web interface is simpler and cleaner.
- Cache and archive. Swoogle 2006 maintains a cache of the SW documents it finds and also keeps copies of older versions in its Semantic Web Archive.
- Registered user services. Swoogle 2006 has a better system for user accounts that includes a CAPTCHA to keep out spambots. Anonymous users see only a limited number of query results, whereas registered users can see them all.
- Development wiki. We have a wiki for swoogle development ideas and discussion.
Some of the Swoogle 2005 features currently missing from Swoogle 2006 are the shopping cart and triple shop; the ontology dictionary; swoogle statistics and swoogle’s top ten. We plan to add these back into Swoogle 2006 over the next few months. Send any comments to swoogle-developers at ebiquity.umbc.edu.
Recently Cláudio Fernandes asked on several semantic web mailing lists
“Can someone point me to some huge owl/rdf files? I’m writing a owl parser with different tools, and I’d like to benchmark them all with some really really big files.”
I just ran some queries over Swoogle’s collection of 850K RDF documents collected from the web. Here are the 100 largest RDF documents and OWL documents, respectively. Document size was measured in terms of the number of triples. For this query, a document was considered to be an OWL document if it used a namespace that contained the string OWL.
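The selection described above can be sketched in a few lines. This is only an illustration over an in-memory list, not the query that was run against Swoogle's database; the record layout, the sample URLs and the case-insensitive match on "owl" are assumptions.

```python
def is_owl_document(namespaces):
    """Treat a document as OWL if any namespace it uses contains 'owl'.

    Assumption: the match is case-insensitive, so the usual
    http://www.w3.org/2002/07/owl# namespace qualifies.
    """
    return any("owl" in ns.lower() for ns in namespaces)


def largest_documents(documents, limit=100, owl_only=False):
    """Return up to `limit` documents, largest first by triple count."""
    candidates = [
        doc for doc in documents
        if not owl_only or is_owl_document(doc["namespaces"])
    ]
    return sorted(candidates, key=lambda doc: doc["triples"], reverse=True)[:limit]


# Hypothetical sample records standing in for rows from the document index.
documents = [
    {"url": "http://example.org/a.rdf", "triples": 1200,
     "namespaces": ["http://xmlns.com/foaf/0.1/"]},
    {"url": "http://example.org/b.owl", "triples": 900,
     "namespaces": ["http://www.w3.org/2002/07/owl#"]},
    {"url": "http://example.org/c.owl", "triples": 5000,
     "namespaces": ["http://www.w3.org/2002/07/owl#",
                    "http://www.w3.org/2000/01/rdf-schema#"]},
]
print([doc["url"] for doc in largest_documents(documents, limit=2)])
print([doc["url"] for doc in largest_documents(documents, owl_only=True)])
```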
Currently, the version of Swoogle you get by going to http://swoogle.umbc.edu/ is Swoogle 2. Its database has been trapped in amber since last summer, when it was corrupted, preventing us from adding new data. We put our efforts into a reimplementation, Swoogle 3, which will be released early next week. The data reported here is from Swoogle 3’s database.