I recently came back from
VLDB 2008 in Auckland, New Zealand where I gave a talk on my
Simrank++ paper (see also
my previous post, and
Greg Linden's related post). The conference took place in the SkyCity Convention Center, located next to the
Sky Tower, the southern hemisphere’s tallest tower. Walking through the tower on the way to the conference every morning, it was quite an experience to watch people jumping from 320 meters up. Several conference participants were brave enough to claim that they were planning to take part in this or another adventure activity in New Zealand (including our own
Parag)...
In this post I plan to give an overview of the main keynote talk and the 10-year best paper award session talk as well as provide comments on a few research talks I attended. The slides from all presentations are already available from the VLDB website (
here) and video recordings from all sessions will presumably be posted soon as well. Unfortunately, many of the papers are not yet available from VLDB, but I've tried to post links to the papers where they are available.
The main keynote
Mark Hill from the University of Wisconsin-Madison gave the main keynote (
slides): a teaching keynote on
transactional memory. In his talk, entitled "Is Transactional Memory an Oxymoron?" (notice the oxymoron: transactions are durable, memory is not), he gave a nice overview of software and hardware implementations of transactional memory and suggested how transactional memory can be used in database applications.
Mark Hill giving the main keynote
As he explained, DBMS transactions and transactional memory (TM) transactions differ in
(a) their design goals,
(b) their state, and, as a result,
(c) their implementations.
- (a) DBMS transactions guard primarily against failures and only then against concurrency. The underlying assumption is that strange things can happen during a concurrent execution of transactions, so all-or-nothing execution semantics are needed. TM transactions, on the other hand, target only concurrency, because their goal is to make parallel programming easier.
- (b) The state of a DBMS transaction consists of durable storage (disk) plus non-durable memory used as a cache for the disk, whereas TM transactions are defined over non-durable, user-level memory only. This resolves the apparent oxymoron in the title: since TM transactions only need to coordinate concurrent access, non-durable memory is all the state they require (a toy sketch of this contrast appears below).
- (c) Finally, the differences in both goals and state have led to completely different implementations. On TPC-C, the best DBMSs achieve around a million transactions per minute per system, whereas TM implementations execute on the order of a billion transactions per minute per core.
In summary, he argued that transactional memory will probably be more useful for new parallel applications than for DBMSs since the latter already use optimized latching strategies.
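To make the state/durability difference in (b) concrete, here is a toy Python sketch of my own (not from the talk): a "TM-style" transfer that only needs mutual exclusion over in-memory state, next to a "DBMS-style" transfer that also forces a log record to disk before applying the change. The balances table, the lock, and the log file are all made up purely for illustration.

```python
import os
import threading

# Toy illustration: a "TM-style" transaction only protects in-memory state
# against concurrent access, while a "DBMS-style" transaction also pays for
# durability by logging its intent to disk before applying the change.

balances = {"alice": 100, "bob": 0}   # non-durable, user-level memory
lock = threading.Lock()               # stand-in for the TM machinery

def tm_style_transfer(src, dst, amount):
    """All-or-nothing with respect to concurrent threads, but lost on a crash."""
    with lock:
        if balances[src] >= amount:
            balances[src] -= amount
            balances[dst] += amount

def dbms_style_transfer(src, dst, amount, log_path="txn.log"):
    """Same update, but the intent is logged and flushed to disk first."""
    with lock:
        if balances[src] < amount:
            return
        with open(log_path, "a") as log:
            log.write(f"transfer {src} {dst} {amount}\n")
            log.flush()
            os.fsync(log.fileno())    # the durability cost TM never pays
        balances[src] -= amount
        balances[dst] += amount

tm_style_transfer("alice", "bob", 30)
dbms_style_transfer("alice", "bob", 20)
print(balances)                        # {'alice': 50, 'bob': 50}
```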
Mark Hill performing Maori dances
during the conference dinner
10-year best paper award
The paper
"A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces" by
Stephen Blott,
Hans Schek and Roger Weber from VLDB 1998 won the 10-year best paper award. In that paper, which currently has more than 700 citations, Stephen and his coauthors explain systematically why the "curse of dimensionality" appears in nearest neighbor search problems in high dimensions. They analyzed the existing, partitioning-based data structures and showed formally that a linear scan of the database is more efficient in high dimensions. They also proposed a data structure (the VA-file) that copes with the curse of dimensionality by approximating partitions of the space. Overall the talk was very pleasant and the material was clearly presented.
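As a quick illustration of the phenomenon the paper formalizes (my own toy experiment, not taken from the paper or the talk): in high dimensions the nearest and farthest neighbors of a random query become nearly equidistant, which is exactly why partitioning-based indexes stop pruning and a linear scan becomes competitive.

```python
import numpy as np

# As dimensionality grows, the nearest and farthest neighbours of a random
# query become almost equally distant ("relative contrast" shrinks), so a
# sequential scan of the whole database becomes hard to beat.

rng = np.random.default_rng(0)
n = 10_000

for d in (2, 10, 100, 1000):
    data = rng.random((n, d))                      # n random points in [0, 1]^d
    query = rng.random(d)
    dists = np.linalg.norm(data - query, axis=1)   # this line *is* the linear scan
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:4d}  relative contrast = {contrast:.2f}")
```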
However, it was unfortunate that Stephen seemed not to have followed the rich subsequent work on approximate answering of nearest neighbor queries. For example, a question by
Surajit Chaudhuri about the relationship between hashing-based schemes like
locality sensitive hashing (LSH) and the presented work went unanswered.
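For readers who have not seen LSH, a minimal random-hyperplane scheme for cosine similarity might look like the following (again a toy example of my own, not tied to either line of work): points whose short bit signatures collide become candidate neighbors, so only a small fraction of the database has to be compared against the query exactly.

```python
import numpy as np

# Minimal random-hyperplane LSH: each point gets a short bit signature
# (one bit per random hyperplane); points that land in the same signature
# bucket are the only candidates checked exactly against the query.

rng = np.random.default_rng(1)
d, n, bits = 50, 10_000, 16

data = rng.standard_normal((n, d))
planes = rng.standard_normal((bits, d))            # one random hyperplane per signature bit

def signature(v):
    return tuple((planes @ v > 0).astype(int))

buckets = {}
for i, v in enumerate(data):
    buckets.setdefault(signature(v), []).append(i)

query = data[0] + 0.05 * rng.standard_normal(d)    # a point close to data[0]
candidates = buckets.get(signature(query), [])
print(f"{len(candidates)} candidates out of {n} points to check exactly")
```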
Experiments and Analysis track/Best paper awards
This year there was a new track for papers that try to reproduce and expand on previously published experimental results. One of those papers (Finding Frequent Items in Data Streams by
Graham Cormode and
Marios Hadjieleftheriou) was a co-winner of the Best Paper Award. The other best paper was on "Constrained Physical Design Tuning" by my summer colleagues from
Microsoft Research,
Nico Bruno and
Surajit Chaudhuri.
Nico Bruno presents Constrained Physical Design Tuning
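For a flavor of the counter-based algorithms studied in the frequent-items line of work, here is a small sketch of the classic Misra-Gries "Frequent" algorithm (my own illustration, not code from the paper): with at most k-1 counters it returns a superset of every item occurring more than n/k times in a stream of length n.

```python
from collections import Counter

def misra_gries(stream, k):
    """Misra-Gries 'Frequent' algorithm: keeps at most k-1 counters and returns
    a superset of every item occurring more than len(stream)/k times."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # decrement all counters; drop the ones that reach zero
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

stream = list("abracadabra" * 100) + list("xyzzy")
print(misra_gries(stream, k=4))          # approximate counts for the heavy hitters
print(Counter(stream).most_common(3))    # exact counts, for comparison
```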
Proceedings of VLDB
This year
the VLDB Endowment announced the
Journal track of the Proceedings of VLDB as an attempt to reduce the high review load. The vision of the VLDB Endowment is a journal (
JDMR) for short, "conference-style" papers with rapid turnaround: authors submit papers only to the journal, there is a fixed review period, and all database conferences can then select papers for presentation from the pool of papers published in the journal. Many people expressed skepticism about whether this will work, so it remains to be seen... Personally, I like the idea of waking up every day knowing that I can submit my next VLDB paper today and, even better, knowing three months from today that I will be visiting Lyon next summer. :)
Research talks
Parag gave a talk on
"Scheduling Shared Scans of Large Data Files" and
Anish presented the
TRIO-related paper
"Towards Special-Purpose Indexes and Statistics for Uncertain Data" in the
MUD workshop.
In general the conference program was quite diverse: there were talks on traditional database subjects (theory, systems, XML databases, DB performance and evaluation, distributed systems processing, indexing, data integration, privacy), on more recent 'trends' (column-store databases, uncertain databases, stream processing), as well as papers related to Web search, sponsored search, association rule mining, IR, and text databases. Also, this year every paper had a 25-minute slot for its talk, which enabled more papers to be accepted.
Allison Holloway gave a nice talk on her paper "
Read-Optimized Databases, in Depth" (with
David DeWitt). Following the big debate between C-Store and the 'anti-C-Store' camp, they try to come up with a fairer comparison between the two approaches. In the paper, they focus on scans over compressed row and column stores for different compression types, queries, and table types. In the same session,
Ioannis Koltsidas presented an interesting
paper on combining flash memory with disks as storage.
Google had two papers I found interesting: one on
extracting structured data from Web tables and another one on
surfacing the deep web. Yahoo! had (
among many papers) a search-related paper on
"Relaxation in Text Search Using Taxonomies" where they presented a document retrieval model that augments text queries with multidimensional taxonomy restrictions.
Another interesting paper on text search looked at how a presumably untrusted text search engine can provide guarantees that its results do not favor specific documents. Also, Microsoft researchers presented
SCOPE, a SQL-like scripting language for parallel processing of large datasets; Microsoft's counterpart to
Pig Latin from Yahoo! and
Sawzall from Google.
Finally, in the same session as my talk, there was another interesting paper on
Simrank by
Dmitry Lizorkin, who presented optimization techniques for computing Simrank scores efficiently on large graphs.
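For context, the basic SimRank recurrence that such optimizations target can be computed by a naive fixed-point iteration like the sketch below (my own illustration of the standard definition, not Lizorkin's technique or my Simrank++ variant; the tiny query/ad graph is made up). Each iteration touches every pair of nodes and every pair of their in-neighbors, which is exactly the cost that makes optimizations necessary on large graphs.

```python
def simrank(in_neighbors, C=0.8, iterations=10):
    """Naive fixed-point iteration of the basic SimRank recurrence:
    s(a,b) = C / (|I(a)||I(b)|) * sum of s(i,j) over in-neighbours i of a, j of b."""
    nodes = list(in_neighbors)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0
                    continue
                Ia, Ib = in_neighbors[a], in_neighbors[b]
                if not Ia or not Ib:
                    new[(a, b)] = 0.0
                    continue
                total = sum(sim[(i, j)] for i in Ia for j in Ib)
                new[(a, b)] = C * total / (len(Ia) * len(Ib))
        sim = new
    return sim

# in_neighbors[v] is the set of nodes with an edge pointing to v
# (a tiny made-up bipartite query/ad graph, purely for illustration)
graph = {"q1": set(), "q2": set(), "ad1": {"q1", "q2"}, "ad2": {"q2"}}
scores = simrank(graph)
print(round(scores[("ad1", "ad2")], 3))
```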
Alon Halevy gave a tutorial on (what else?) Dataspaces
That concludes my trip report from New Zealand. If you were at the conference and have any thoughts, please do leave a comment!