- Blending the link and query-click graphs (Yesterday)
A fun paper out of Yahoo Research, "Dr. Searcher and Mr. Browser: A Unified Hyperlink-Click Graph" (PDF), looks at the value of combining two graphs that search engines typically use as part of static and dynamic ranking: the query-click bipartite graph (which shows what pages people click on immediately after searching) and the link graph (which shows hyperlinks between pages on the Web).
The query-click graph is a bipartite graph with queries on one side and clicked documents on the other. A query (e.g. [greg linden]) is linked to a document (e.g. http://glinden.blogspot.com) if people who make the query click on that document. The links are usually weighted by the probability that someone clicks on the document given that search query. Search engines get the query-click graph by parsing their query logs. Random walks of the query-click graph have been a popular research topic for finding similar queries and similar documents.
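To make the structure concrete, here is a minimal sketch of building a query-click graph from a query log and taking a two-step random walk (query to document and back to query) to find similar queries. The log entries, the function names, and the simplification that weights are computed only over clicks (ignoring searches with no click) are all assumptions for illustration, not anything taken from the paper.

```python
from collections import Counter, defaultdict


def count_clicks(log):
    """Count clicks per (query, url) from an iterable of (query, url) pairs."""
    counts = defaultdict(Counter)
    for query, url in log:
        counts[query][url] += 1
    return counts


def click_probabilities(counts):
    """Edge weights: share of a query's clicks that went to each url."""
    graph = {}
    for query, clicks in counts.items():
        total = sum(clicks.values())
        graph[query] = {url: n / total for url, n in clicks.items()}
    return graph


def two_step_walk(counts, start_query):
    """Probability of landing on each query after query -> url -> query."""
    url_totals = Counter()
    for clicks in counts.values():
        url_totals.update(clicks)
    start_total = sum(counts[start_query].values())
    scores = defaultdict(float)
    for url, n in counts[start_query].items():
        p_forward = n / start_total
        for query, clicks in counts.items():
            if url in clicks:
                scores[query] += p_forward * clicks[url] / url_totals[url]
    return dict(scores)


# Hypothetical query log: (query, clicked url) pairs.
log = [
    ("greg linden", "http://glinden.blogspot.com"),
    ("greg linden", "http://glinden.blogspot.com"),
    ("greg linden", "http://example.com/other"),
    ("geeking with greg", "http://glinden.blogspot.com"),
]
counts = count_clicks(log)
graph = click_probabilities(counts)
walk = two_step_walk(counts, "greg linden")
```

In this toy log, the walk puts some probability on [geeking with greg] because both queries lead to clicks on the same document, which is the intuition behind using such walks to find related queries.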
The hyperlink graph is a graph where web pages are nodes and a link from one page to another is a directed edge. Search engines get a link graph by crawling the Web and parsing the HTML of every page. Random walks of the link graph are used to find related documents and by algorithms such as PageRank to compute the authority of web pages.
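As a rough illustration of a random walk on the link graph, the following is a small power-iteration sketch of PageRank. The four-page graph, the damping factor of 0.85, the fixed iteration count, and the choice to spread the rank of pages without outlinks evenly over all pages are assumptions for the example, not details from the paper or from any production system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets the random-jump share first.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Page with no outlinks: spread its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it links to.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
```

Here page "c" collects links from three pages and ends up with the highest score, while "d", which nothing links to, keeps only the random-jump share.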
The authors of this Yahoo paper had the idea of combining the two graphs
- Danny Sullivan on Microsoft's Live Search (December 30 2008)
Danny Sullivan wrote up a version of a talk he gave at Microsoft in June 2008 in his recent post, "Tough Love for Microsoft Search".
Danny Sullivan is an insightful writer, long-time watcher of the search industry, and founder of Search Engine Watch, Search Engine Land, and the popular Search Engine Strategies (SES) conference. His thoughts are well worth reading.
[Found via Todd Bishop]
- Considering consistency at Amazon (December 29 2008)
Amazon CTO Werner Vogels posted a copy of his recent ACM Queue article, "Eventually Consistent - Revisited". It is a nice overview of the trade-offs in large-scale distributed databases and focuses on availability and consistency.
An extended excerpt:
Database systems of the late '70s ... [tried] to achieve distribution transparency -- that is, to the user of the system it appears as if there is only one system instead of a number of collaborating systems. Many systems during this time took the approach that it was better to fail the complete system than to break this transparency.
In the mid-'90s, with the rise of larger Internet systems ... people began to consider the idea that availability was perhaps the most important property ... but they were struggling with what it should be traded off against. Eric Brewer ... presented the CAP theorem, which states that of three properties of shared-data systems -- data consistency, system availability, and tolerance to network partition -- only two can be achieved at any given time .... Relaxing consistency will allow the system to remain highly available under the partitionable conditions, whereas making consistency a priority means that under certain conditions the system will not be available.
If the system emphasizes consistency, the developer has to deal with the fact that the syste
- The ugly complexities of browser security (December 15 2008)
Googler Michal Zalewski wrote a "Browser Security Handbook" with a detailed look at many of the security issues in current browsers.
For me, the most interesting part was everything at and after the section "Life outside same origin rules".
As a teaser, here is just one of many examples that Michal discusses:
[An] attacker may cleverly decorate portions of such a third-party UI to make it appear as if they belong to his site instead, and then trick his visitors into interacting with this mashup. If successful, clicks would be directed to the attacked domain, rather than attacker's page -- and may result in undesirable and unintentional actions being taken in the context of victim's account.
[For example,] the attacker may also opt for showing the entire UI of the targeted application in a large <IFRAME>, but then cover portions of this container with opaque <DIV> or <IFRAME> elements placed on top ... [Or] the attacker may simply opt for hiding the target UI underneath his own, and reveal it only milliseconds before the anticipated user click, not giving the victim enough time to notice the switch, or react in any way.
Well worth reading the whole thing.
[Found via Philipp Lens
- E-mail as the social network (December 9 2008)
Om Malik writes that:
Yahoo ... is planning to .... launch in beta relatively soon with half a dozen small applications running in a sidebar inside the Yahoo mail client (Evite is one of the services that is said to be building a nano-app for this new Yahoo Mail-as-a-platform). Users' address books would act as a social graph, essentially turning Yahoo Mail into the basis of a whole new social networking experience.
The only way for Yahoo or Google to challenge the social networking incumbents like Facebook [is] to leverage their email infrastructure ... With relationship buckets pre-defined by the address book, which contains everything from web-based addresses to geo-local data (physical address) to mobile numbers, email clients are already rich with the very data set that Facebook [has].
I liked this idea back when Om talked about it last year and still like it now.
The address book is essentially a social network. Not only does it have friend relationships, but we can also determine the importance of those relationships, that is, the weights of the social connections. Surprisingly little has been done with that information in e-mail clients.
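As one hedged illustration of what "weights of the social connections" could mean, the sketch below scores each contact by their share of the messages exchanged with the mailbox owner. The message format, the addresses, the function name, and the frequency heuristic itself are all hypothetical; a real client might also consider recency, reply rates, or other signals.

```python
from collections import Counter


def contact_weights(messages, owner):
    """Weight each contact by their share of the owner's correspondence.

    messages is an iterable of (sender, recipients) pairs.
    """
    counts = Counter()
    for sender, recipients in messages:
        if sender == owner:
            # Mail the owner sent: credit every other recipient.
            counts.update(r for r in recipients if r != owner)
        elif owner in recipients:
            # Mail the owner received: credit the sender.
            counts[sender] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {contact: n / total for contact, n in counts.items()}


# Hypothetical mailbox contents.
me = "me@example.com"
messages = [
    (me, ["alice@example.com"]),
    ("alice@example.com", [me]),
    (me, ["alice@example.com", "bob@example.com"]),
    ("carol@example.com", [me]),
]
weights = contact_weights(messages, me)
```

In this toy mailbox, alice@example.com accounts for three of the five counted interactions, so she gets the heaviest edge in the owner's graph.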
Perhaps it is fear of breaking something that so many people use and depend on, but e-mail clients ha
