VLDB 2008 Trip Report (Posted by Ioannis Antonellis)
In this post I plan to give an overview of the main keynote talk and the 10-year best paper award session talk as well as provide comments on a few research talks I attended. The slides from all presentations are already available from the VLDB website (here) and video recordings from all sessions will presumably be posted soon as well. Unfortunately, many of the papers are not yet available from VLDB, but I've tried to post links to the papers where they are available.
The main keynote
Mark Hill from the University of Wisconsin-Madison gave the main keynote (slides): a teaching keynote on transactional memory. In his talk, entitled "Is Transactional Memory an Oxymoron?" (notice the oxymoron: transactions are durable, memory is not), he gave a nice overview of software and hardware implementations of transactional memory and suggested how transactional memory can be used in database applications.
As he explained, DBMS transactions and transactional memory (TM) transactions differ in (a) their design goals, (b) their state, and as a result in (c) their implementation as well.
- (a) DBMS transactions target failures first and concurrency second. The underlying assumption is that things can go wrong while transactions execute concurrently, so all-or-nothing execution semantics are needed. TM transactions, on the other hand, target only concurrency, because their goal is to make parallel programming easier.
- (b) The state for DBMS transactions consists of durable storage (disk) plus non-durable memory used as a cache for the disk. TM transactions, however, are defined over non-durable user-level memory only. This is why the title of the talk is not really an oxymoron: non-durable memory is a sensible place for transactions when the goal is concurrency.
- (c) Finally, the differences in goals and state have led to completely different implementations. According to TPC-C results, the best DBMSs achieve around a million transactions per minute per system, whereas TM implementations execute a billion transactions per minute per core.
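To make the contrast in (a) and (b) concrete, here is a minimal sketch of a software TM in Python. It is not the design from the keynote: it is a toy optimistic scheme with version checks at commit time, and the names `TVar`, `Transaction`, `atomically` and `transfer` are all made up for illustration. Note that nothing here touches a disk; the only thing the "transaction" buys is atomicity and isolation between threads.

```python
import threading


class TVar:
    """A transactional variable: an in-memory value plus a version counter."""

    def __init__(self, value):
        self.value = value
        self.version = 0


# Hypothetical global lock: it serializes commits only, not transaction bodies.
_commit_lock = threading.Lock()


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version observed at first read
        self.writes = {}  # TVar -> value buffered until commit

    def read(self, tvar):
        if tvar in self.writes:          # read-your-own-writes
            return self.writes[tvar]
        self.reads.setdefault(tvar, tvar.version)
        return tvar.value

    def write(self, tvar, value):
        self.writes[tvar] = value        # nothing is visible to others yet

    def commit(self):
        with _commit_lock:
            # Abort if anything we read was changed by another commit.
            if any(t.version != seen for t, seen in self.reads.items()):
                return False
            for t, value in self.writes.items():
                t.value = value
                t.version += 1
            return True


def atomically(body):
    """Run body(tx) until it commits; a failed commit simply retries."""
    while True:
        tx = Transaction()
        result = body(tx)
        if tx.commit():
            return result


def transfer(src, dst, amount):
    """Move amount between two TVars as one all-or-nothing step."""
    def body(tx):
        tx.write(src, tx.read(src) - amount)
        tx.write(dst, tx.read(dst) + amount)
    atomically(body)
```

A real TM also has to keep a running transaction from acting on an inconsistent snapshot before it aborts, and hardware TM does the conflict detection in the cache hierarchy rather than with a lock; this sketch ignores both.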
10-year best paper award
The paper "A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces" by Stephen Blott, Hans Schek and Roger Weber from VLDB 1998 was the winner of the 10-year best paper award. In that paper, which currently has more than 700 citations, Stephen and his coauthors explain in a systematic way why the "curse of dimensionality" appears in nearest neighbor search problems in high-dimensional spaces. They analyzed the existing data structures, which were partitioning-based, and showed formally that a linear scan of the database is more efficient in high dimensions. They also came up with a data structure that deals with the curse of dimensionality by approximating partitions of the space. Overall the talk was very pleasant and the material was clearly presented.
However, it was unfortunate that Stephen seemed not to have followed the rich subsequent work on approximate answering of nearest neighbor queries. For example, a question by Surajit Chaudhuri about the relationship between hashing-based schemes like locality sensitive hashing (LSH) and the presented work went unanswered.
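For readers who have not seen the curse of dimensionality in action, the small Python sketch below illustrates the effect behind the award paper's argument. It is not the paper's analysis or its data structure: it just draws uniform random points (my assumption, as are the function names `linear_scan_nn` and `contrast`) and measures how the nearest and farthest distances from a query compare as the dimension grows. When that ratio approaches 1, a partitioning-based index can prune almost nothing, and a plain linear scan becomes hard to beat.

```python
import math
import random


def linear_scan_nn(points, query):
    """Exact nearest neighbor: compare the query against every point."""
    return min(points, key=lambda p: math.dist(p, query))


def contrast(dim, n=500, seed=0):
    """Ratio of nearest to farthest distance from a random query point.

    Points are uniform in the unit hypercube (an illustrative assumption).
    A ratio close to 1 means all points look almost equally far away.
    """
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n)]
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(p, query) for p in points]
    return min(dists) / max(dists)


if __name__ == "__main__":
    for dim in (2, 20, 200):
        print(dim, round(contrast(dim), 3))
```

Running the script prints the ratio for a few dimensions; it should climb noticeably between 2 and 200 dimensions.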
Experiments and Analysis track/Best paper awards
This year there was a new track with papers that try to reproduce and expand on previously published experimental results. One of those papers ("Finding Frequent Items in Data Streams" by Graham Cormode and Marios Hadjieleftheriou) was a co-winner of the Best Paper Award. The other best paper was "Constrained Physical Design Tuning" by my summer colleagues from Microsoft Research, Nico Bruno and Surajit Chaudhuri.
Proceedings of VLDB
This year the VLDB Endowment announced the Journal track of the Proceedings of VLDB as an attempt to reduce the high review load. The vision of the VLDB Endowment is a journal (JDMR) for short "conference style" papers with rapid turn-around: authors submit papers only to the journal, there is a fixed review period, and eventually all database conferences will be able to select papers for presentation from the pool of papers published in the journal. Many people were skeptical about whether this will work, so it remains to be seen... Personally, I like the idea of waking up every day knowing that I can submit my next VLDB paper today and, even better, knowing three months from today that I will be visiting Lyon next summer. :)
Parag gave a talk on "Scheduling Shared Scans of Large Data Files" and Anish presented the TRIO-related paper "Towards Special-Purpose Indexes and Statistics for Uncertain Data" in the MUD workshop.
In general the conference program was quite diverse. There were talks on traditional database subjects (theory, systems, XML databases, DB performance and evaluation, distributed systems processing, indexing, data integration, privacy) and on more recent 'trends' (column store databases, uncertain databases, stream processing), as well as papers related to Web search, sponsored search, association rule mining, IR and text databases. Also, this year all papers had a 25-minute slot for the talk; this enabled more papers to be accepted.
Allison Holloway gave a nice talk on her paper "Read-Optimized Databases, in Depth" (with David DeWitt). Following the big debate of C-store vs 'anti C-store', they are trying to come up with a fairer comparison between the two kinds of systems. In the paper, they focus on studying scans over compressed row and column stores for different compression types, queries and table types. In the same session, Ioannis Koltsidas presented an interesting paper on combining flash memory with disks as storage.
Google had two papers I found interesting, one on extracting structured data from Web tables and another on surfacing the deep web. Yahoo! had (among many papers) a search-related paper on "Relaxation in Text Search Using Taxonomies" where they presented a document retrieval model that augments text queries with multidimensional taxonomy restrictions. Another interesting paper on text search looked at how a presumably untrusted text search engine can provide guarantees that its results do not favor specific documents. Also, Microsoft researchers presented SCOPE, a SQL-like scripting language for parallel processing of large datasets. It is Microsoft's counterpart to Pig Latin from Yahoo! and Sawzall from Google.
Finally, in the same session as my talk, there was another interesting paper on SimRank by Dmitry Lizorkin, who presented optimization techniques for computing SimRank scores efficiently in large graphs.
That concludes my trip report from New Zealand. If you were at the conference and have any comments, please do leave a comment!