

VLDB 2008 Trip Report (Posted by Ioannis Antonellis)

I recently came back from VLDB 2008 in Auckland, New Zealand where I gave a talk on my Simrank++ paper (see also my previous post, and Greg Linden's related post). The conference took place in the SkyCity Convention Center, located next to the Sky Tower, the southern hemisphere’s tallest tower. It was quite an experience, while heading to the conference and passing through the tower every morning, to watch people jumping from 320 meters high. Several conference participants were actually brave enough to claim that they were planning to take part in this or a different adventure activity in New Zealand (including our own Parag)...

In this post I plan to give an overview of the main keynote talk and the 10-year best paper award session talk as well as provide comments on a few research talks I attended. The slides from all presentations are already available from the VLDB website (here) and video recordings from all sessions will presumably be posted soon as well. Unfortunately, many of the papers are not yet available from VLDB, but I've tried to post links to the papers where they are available.

The main keynote

Mark Hill from the University of Wisconsin-Madison gave the main keynote (slides): a teaching keynote on transactional memory. In his talk, entitled "Is Transactional Memory an Oxymoron?" (notice the oxymoron: transactions are durable, memory is not), he gave a nice overview of software and hardware implementations of transactional memory and suggested how transactional memory can be used in database applications.

Mark Hill giving the main keynote

As he explained, DBMS transactions and transactional memory (TM) transactions differ in (a) their design goals, (b) their state, and as a result in (c) their implementation as well.
  • (a) DBMS transactions target failures first and concurrency second. The underlying assumption is that weird things can happen during a concurrent execution of transactions, so there is a need for all-or-nothing execution semantics. TM transactions, on the other hand, target only concurrency, because their goal is to make parallel programming easier.
  • (b) The state for DBMS transactions consists of some durable storage (disk) and non-durable memory used as a cache for the disk. TM transactions, however, are defined over non-durable user-level memory. This explains why the title of the talk is not an oxymoron: non-durable memory is sufficient when the only goal is concurrency.
  • (c) Finally, the differences in both goals and state have led to completely different implementations. According to TPC-C, the best DBMSs achieve around a million transactions per minute per system, whereas TM implementations execute a billion transactions per minute per core.
In summary, he argued that transactional memory will probably be more useful for new parallel applications than for DBMSs since the latter already use optimized latching strategies.

Mark Hill performing Maori dances during the conference dinner


10-year best paper award

The paper "A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces" by Stephen Blott, Hans Schek and Roger Weber from VLDB 1998 won the 10-year best paper award. In that paper, which currently has more than 700 citations, Stephen and his coauthors explain systematically why the "curse of dimensionality" arises in nearest neighbor search in high dimensions. They analyzed the existing data structures, which were partitioning-based, and showed formally that a linear scan of the database is more efficient in high dimensions. They also proposed a data structure that deals with the curse of dimensionality by approximating partitions of the space. Overall the talk was very pleasant and the material was clearly presented.

However, it was unfortunate that Stephen seemed not to have followed the rich subsequent work on approximate nearest neighbor search. For example, a question by Surajit Chaudhuri about the relationship between hashing-based schemes like locality sensitive hashing (LSH) and the presented work went unanswered.
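To make the curse of dimensionality a bit more concrete, here is a minimal sketch of my own (it is not taken from the paper or the talk). It assumes points drawn uniformly at random from the unit cube and plain Euclidean distance; the function names `linear_scan_nn` and `relative_contrast` are made up for this illustration. The first function is the brute-force linear scan, and the second measures how much farther the farthest point is than the nearest one, relative to the nearest distance.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def linear_scan_nn(points, query):
    """Brute-force nearest neighbor: check every point once.

    Returns (index, distance) of the closest point to the query.
    """
    best_index, best_dist = -1, math.inf
    for i, p in enumerate(points):
        d = euclidean(p, query)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist


def relative_contrast(dim, n_points=500, seed=0):
    """(farthest - nearest) / nearest for uniform random data (an assumption).

    A small value means the nearest and farthest points are almost
    equally far from the query, so space partitioning can prune very little.
    """
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query = [rng.random() for _ in range(dim)]
    dists = [euclidean(p, query) for p in points]
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    for dim in (2, 20, 200):
        print(dim, round(relative_contrast(dim), 3))
```

Under these assumptions the contrast shrinks as the dimension grows, which gives some intuition for why an index that prunes by partition can end up touching most of the data anyway, while a sequential scan at least reads it in one pass.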

Experiments and Analysis track/Best paper awards

This year there was a new track with papers that try to reproduce and expand on previously published experimental results. One of those papers ("Finding Frequent Items in Data Streams" by Graham Cormode and Marios Hadjieleftheriou) was a co-winner of the Best Paper Award. The other best paper was "Constrained Physical Design Tuning" by my summer colleagues from Microsoft Research, Nico Bruno and Surajit Chaudhuri.

Nico Bruno presents Constrained Physical Design Tuning

Proceedings of VLDB

This year the VLDB Endowment announced the Journal track of the Proceedings of VLDB as an attempt to reduce the high review load. The vision of the VLDB Endowment is a journal (JDMR) for short "conference style" papers with rapid turnaround: authors submit papers only to the journal, there is a fixed review period, and eventually all database conferences will be able to select papers for presentation from the pool of papers published in the journal. Many people were skeptical about whether this will work, so it remains to be seen... Personally, I like the idea of waking up every day knowing that I can submit my next VLDB paper today and, even better, knowing three months from today that I will be visiting Lyon next summer. :)

Research talks

Parag gave a talk on "Scheduling Shared Scans of Large Data Files" and Anish presented the TRIO-related paper "Towards Special-Purpose Indexes and Statistics for Uncertain Data" in the MUD workshop.

In general the conference program was quite diverse: there were talks on traditional database subjects (theory, systems, XML databases, DB performance and evaluation, distributed systems processing, indexing, data integration, privacy), more recent 'trends' (column store databases, uncertain databases, stream processing), as well as papers related to Web search, sponsored search, association rule mining, IR and text databases. Also, this year every paper had a 25-minute slot for the talk; this enabled more papers to be accepted.

Allison Holloway gave a nice talk on her paper "Read-Optimized Databases, in Depth" (with David DeWitt). Following the big debate of C-store vs 'anti C-store', they try to come up with a fairer comparison between the two systems. In the paper, they focus on scans over compressed row stores and column stores for different compression types, queries and table types. In the same session, Ioannis Koltsidas presented an interesting paper on combining flash memory with disks as storage.

Google had two papers I found interesting, one on extracting structured data from Web tables and another on surfacing the deep web. Yahoo! had (among many papers) a search-related paper on "Relaxation in Text Search Using Taxonomies", which presented a document retrieval model that augments text queries with multidimensional taxonomy restrictions. Another interesting paper on text search looked at how a potentially untrusted text search engine can provide guarantees that its results do not favor specific documents. Also, Microsoft researchers presented SCOPE, a SQL-like scripting language for parallel processing of large datasets, which is Microsoft's counterpart to Pig Latin from Yahoo! and Sawzall from Google.

Finally, in the same session as my talk, there was another interesting paper on Simrank by Dmitry Lizorkin who presented optimization techniques for computing Simrank scores efficiently in large graphs.
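For readers who have not seen Simrank before, here is a minimal sketch of the basic iterative computation. It is neither Simrank++ nor the optimized algorithm from Dmitry's paper, just the naive textbook iteration: a node is maximally similar to itself, and two different nodes are similar to the extent that their in-neighbors are similar. The function name `simrank`, the dictionary-based graph format, and the default decay factor and iteration count are all assumptions made for this illustration.

```python
from itertools import product


def simrank(in_neighbors, decay=0.8, iterations=10):
    """Naive iterative Simrank.

    in_neighbors maps every node to the list of nodes pointing to it;
    every node that appears in a list must also appear as a key.
    Returns a dict mapping (a, b) pairs to similarity scores in [0, 1].
    """
    nodes = list(in_neighbors)
    # Start from the identity: each node is similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}

    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node without in-neighbors is similar to no other node.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, then decay.
            total = sum(sim[(i, j)] for i in ia for j in ib)
            new_sim[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


if __name__ == "__main__":
    # Tiny example graph: "a" points to both "b" and "c".
    graph = {"a": [], "b": ["a"], "c": ["a"]}
    print(simrank(graph)[("b", "c")])
```

Each iteration of this sketch visits every pair of nodes and every pair of their in-neighbors, which hints at why efficient computation on large graphs needs more careful techniques.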

Alon Halevy gave a tutorial on (what else?) Dataspaces

That concludes my trip report from New Zealand. If you were at the conference and have any thoughts, please do leave a comment!


  1. Anonymous Anonymous | September 17, 2008 at 6:34 PM |  

    Hey Ioannis,

    SCOPE also closely resembles the query language component of Facebook's Hive: https://issues.apache.org/jira/browse/HADOOP-3601.

    Regards,
    Jeff

  2. Blogger Ioannis Antonellis | September 18, 2008 at 8:36 AM |  

    Hi Jeff,

    It is interesting to see so many alternative languages becoming available...

    And I am happy to see that Facebook Hive and Pig are open source and available to everyone.

    yannis

  3. Blogger Vasilis | September 23, 2008 at 9:15 AM |  

    Very interesting description and useful for those of us who didn't make it to down under.

    It would be better if the links worked though :-) (I tried three unsuccessfully)

  4. Blogger Paul Heymann | September 23, 2008 at 4:35 PM |  

    Hey Vasilis:

    Which links are broken for you? I think a few links had tilde url encoded for some reason, but they seem to still work for me. Didn't see anything when I ran the page through linkchecker, but I want to make sure our stuff isn't broken. ;-)

    Paul

  5. Blogger Vasilis | September 24, 2008 at 7:16 AM |  

    Hmm. I remember I tried the links of the two Google papers and got nowhere. I forget which was the third one. But when I try the links today, they all work fine.
    (on both IE and Firefox)

  6. Blogger Paul Heymann | September 24, 2008 at 10:25 AM |  

    Ah, how strange. I guess if it happens again we'll try to track it down. Also, once VLDB gets around to releasing the proceedings, we won't need to link to all sorts of random places. ;-)

  7. Blogger Ioannis Antonellis | September 24, 2008 at 10:28 AM |  

    It seems to me that if VLDB needs a month to post the proceedings, then the new journal track with guaranteed review cycles of 3 months will be a total failure...
