
Feedback! (Posted by Paul Heymann)

Hello there. We've now been posting to the Stanford InfoBlog for a few months. We hope you've enjoyed the posts so far, but we'd like to know a little more about who you are and what you like in order to better serve you (and to get an idea of how our own research fits into the world of research outside of Stanford and outside of academia).

If you have a few minutes, we would really appreciate it if you could fill out a survey here. We'll try not to give out any identifying details (not that any of the questions are that personal), and we'll use the responses to improve the InfoBlog. We might have a quick post at some later point to show what the results looked like.

Also, if you'd like, feel free to post any suggestions for the blog as comments to this post.

(P.S. The survey link above has a limit on the number of responses, so if the survey is closed, don't worry! We may post another link, or just analyze the data we have once we reach the limit.)


VLDB 2008 Trip Report (Posted by Ioannis Antonellis)

I recently came back from VLDB 2008 in Auckland, New Zealand, where I gave a talk on my Simrank++ paper (see also my previous post, and Greg Linden's related post). The conference took place in the SkyCity Convention Center, located next to the Sky Tower, the southern hemisphere’s tallest tower. Passing through the tower on the way to the conference every morning and watching people jump from 320 meters up was quite an experience. Several conference participants were actually brave enough to claim that they were planning to take part in this or a different adventure activity in New Zealand (including our own Parag)...

In this post I plan to give an overview of the main keynote talk and the 10-year best paper award session talk as well as provide comments on a few research talks I attended. The slides from all presentations are already available from the VLDB website (here) and video recordings from all sessions will presumably be posted soon as well. Unfortunately, many of the papers are not yet available from VLDB, but I've tried to post links to the papers where they are available.

The main keynote

Mark Hill from the University of Wisconsin-Madison gave the main keynote (slides): a teaching keynote on transactional memory. In his talk, entitled "Is Transactional Memory an Oxymoron?" (notice the oxymoron: transactions are durable, memory is not), he gave a nice overview of transactional memory implementations in software and hardware and suggested how transactional memory can be used in database applications.

Mark Hill giving the main keynote

As he explained, DBMS transactions and transactional memory (TM) transactions differ in (a) their design goals, (b) their state, and as a result in (c) their implementation as well.
  • (a) DBMS transactions primarily target failures, with concurrency second. The underlying assumption is that weird things can happen during a concurrent execution of transactions, so there is a need for all-or-nothing execution semantics. TM transactions, on the other hand, target only concurrency, because their goal is to make parallel programming easier.
  • (b) The state for DBMS transactions consists of durable storage (disk) plus non-durable memory used as a cache for the disk. TM transactions, however, are defined over non-durable user-level memory. This is why the title of the talk is not really an oxymoron: non-durable memory is a sensible home for transactions whose only goal is concurrency.
  • (c) Finally, the differences in both goals and state have led to completely different implementations. According to TPC-C, the best DBMSs achieve around a million transactions per minute per system, whereas TM implementations execute a billion transactions per minute per core.
In summary, he argued that transactional memory will probably be more useful for new parallel applications than for DBMSs since the latter already use optimized latching strategies.
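To make the TM side of this comparison concrete, here is a toy sketch of a software transactional memory in Python. It is only an illustration of the general optimistic idea (buffer writes, validate reads at commit, retry on conflict), not any of the implementations covered in the keynote. The names `TVar`, `Transaction`, `atomically`, and `transfer` are made up for this example. Note that nothing here is durable and there is no failure recovery, which is exactly the contrast with DBMS transactions drawn above.

```python
import threading

class TVar:
    """A shared memory cell: a value plus a version bumped on every commit."""
    def __init__(self, value):
        self.value = value
        self.version = 0

class Conflict(Exception):
    """Raised when a transaction read data that has since changed."""

_commit_lock = threading.Lock()  # serializes only the short commit step

class Transaction:
    def __init__(self):
        self.read_versions = {}  # TVar -> version observed at first read
        self.write_buffer = {}   # TVar -> value to install at commit

    def read(self, tvar):
        if tvar in self.write_buffer:
            return self.write_buffer[tvar]
        # Record the version before reading the value, so a commit that
        # slips in between is detected during validation.
        self.read_versions.setdefault(tvar, tvar.version)
        return tvar.value

    def write(self, tvar, value):
        self.write_buffer[tvar] = value  # invisible to others until commit

    def commit(self):
        with _commit_lock:
            for tvar, seen in self.read_versions.items():
                if tvar.version != seen:
                    raise Conflict()
            for tvar, value in self.write_buffer.items():
                tvar.value = value
                tvar.version += 1

def atomically(body):
    """Run body(tx) repeatedly until it commits without a conflict."""
    while True:
        tx = Transaction()
        try:
            result = body(tx)
            tx.commit()
            return result
        except Conflict:
            continue  # nothing was written to shared state, so retrying is safe

def transfer(src, dst, amount):
    """Move amount from src to dst as one all-or-nothing memory transaction."""
    def body(tx):
        tx.write(src, tx.read(src) - amount)
        tx.write(dst, tx.read(dst) + amount)
    atomically(body)
```

Several threads can call `transfer` on the same pair of `TVar`s without explicit locking in the application code; a transaction that loses a race simply re-executes.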

Mark Hill performing Maori dances
during the conference dinner


10-year best paper award

The paper "A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces" by Stephen Blott, Hans Schek and Roger Weber from VLDB 1998 was the winner of the 10-year best paper award. In that paper, which currently has more than 700 citations, Stephen and his coauthors explain in a systematic way why the "curse of dimensionality" appears in nearest neighbor search problems in high dimensions. They analyzed the existing data structures, which were partitioning-based, and showed formally that a linear scan of the database is more efficient in high dimensions. They also came up with a data structure that deals with the curse of dimensionality by approximating partitions of the space. Overall the talk was very pleasant and the material was clearly presented.

However, it was unfortunate that Stephen seemed not to have followed the rich subsequent work on approximate answering of nearest neighbor queries. For example, a question by Surajit Chaudhuri about the relationship between hashing-based schemes like locality sensitive hashing (LSH) and the presented work went unanswered.
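One common way to get a feel for the curse of dimensionality (not necessarily the analysis used in the award paper) is to measure how the gap between the nearest and the farthest point shrinks as the dimension grows. The sketch below assumes uniformly random points in the unit cube; the function names, the point count, and the seed are arbitrary choices for this illustration. It also includes the plain linear scan that the paper argues becomes the better option in high dimensions.

```python
import math
import random

def linear_scan_nn(points, query):
    """Exact nearest neighbor by comparing the query against every point."""
    return min(points, key=lambda p: math.dist(p, query))

def relative_contrast(dim, n_points=500, seed=0):
    """Return (farthest - nearest) / nearest distance from a random query
    to n_points uniformly random points in the unit cube [0, 1]^dim."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(p, query) for p in points]
    return (max(dists) - min(dists)) / min(dists)

if __name__ == "__main__":
    # The contrast drops sharply as the dimension grows: every point is
    # "about equally far away", so space partitioning prunes very little.
    for dim in (2, 20, 200):
        print(dim, round(relative_contrast(dim), 3))
```

When the contrast is small, an index that partitions the space ends up visiting most partitions anyway, which is the intuition behind a sequential scan winning.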

Experiments and Analysis track/Best paper awards

This year there was a new track with papers that try to reproduce and expand on previously published experimental results. One of those papers ("Finding Frequent Items in Data Streams" by Graham Cormode and Marios Hadjieleftheriou) was a co-winner of the Best Paper Award. The other best paper was "Constrained Physical Design Tuning" by my summer colleagues from Microsoft Research, Nico Bruno and Surajit Chaudhuri.

Nico Bruno presents Constrained Physical Design Tuning

Proceedings of VLDB

This year the VLDB Endowment announced the Journal track of the Proceedings of VLDB as an attempt to reduce the high review load. The Endowment's vision is a journal (JDMR) for short "conference style" papers with rapid turn-around: authors submit papers only to the journal, there is a fixed review period, and eventually all database conferences will be able to select papers for presentation from the pool of papers published in the journal. Many people were skeptical about whether this will work, so it remains to be seen... Personally, I like the idea of waking up every day knowing that I can submit my next VLDB paper today and, even better, knowing three months from today that I will be visiting Lyon next summer. :)

Research talks

Parag gave a talk on "Scheduling Shared Scans of Large Data Files" and Anish presented the TRIO-related paper "Towards Special-Purpose Indexes and Statistics for Uncertain Data" in the MUD workshop.

In general, the conference program was quite diverse: there were talks on traditional database subjects (theory, systems, XML databases, DB performance and evaluation, distributed systems processing, indexing, data integration, privacy), more recent 'trends' (column store databases, uncertain databases, stream processing), as well as papers related to Web search, sponsored search, association rule mining, IR and text databases. Also, this year every paper had a 25-minute slot for its talk; this enabled more papers to be accepted.

Allison Holloway gave a nice talk on her paper "Read-Optimized Databases, in Depth" (with David DeWitt). Following the big debate of C-store vs 'anti C-store', they are trying to come up with a fairer comparison between the two systems. In the paper, they focus on studying scans over compressed row stores and column stores for different compression types, queries and table types. In the same session, Ioannis Koltsidas presented an interesting paper on combining flash memory with disks as storage.

Google had two papers I found interesting: one on extracting structured data from Web tables and another on surfacing the deep web. Yahoo! had (among many papers) a search-related paper on "Relaxation in Text Search Using Taxonomies", where they presented a document retrieval model that augments text queries with multidimensional taxonomy restrictions. Another interesting paper on text search looked at how a presumably untrusted text search engine can provide guarantees that its results do not favor specific documents. Also, Microsoft researchers presented SCOPE, a SQL-like scripting language for parallel processing of large datasets; it is Microsoft's counterpart to Pig Latin from Yahoo! and Sawzall from Google.

Finally, in the same session as my talk, there was another interesting paper on Simrank by Dmitry Lizorkin, who presented optimization techniques for computing Simrank scores efficiently in large graphs.
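For readers who have not seen Simrank before, here is a minimal sketch of the basic iterative recurrence (two nodes are similar if the nodes pointing to them are similar). This is the naive textbook computation, not the optimized techniques from that talk and not Simrank++. The decay constant of 0.8, the iteration count, and the tiny example graph are assumptions chosen for illustration.

```python
from itertools import product

def simrank(in_neighbors, decay=0.8, iterations=10):
    """Naive Simrank over a graph given as node -> list of in-neighbors.

    Every node that appears as an in-neighbor must also be a key.
    """
    nodes = list(in_neighbors)
    # Start from the identity: a node is only similar to itself.
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new_sim = {a: {} for a in nodes}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[a][b] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                new_sim[a][b] = 0.0  # no in-neighbors means no evidence
                continue
            # Average similarity over all pairs of in-neighbors, then decay.
            total = sum(sim[x][y] for x in ia for y in ib)
            new_sim[a][b] = decay * total / (len(ia) * len(ib))
        sim = new_sim
    return sim

# Hypothetical example: A and B both point to C and to D.
graph = {"A": [], "B": [], "C": ["A", "B"], "D": ["A", "B"]}
scores = simrank(graph)
```

Each iteration touches every pair of nodes and every pair of their in-neighbors, which is why efficient computation on large graphs needs dedicated optimizations.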

Alon Halevy gave a tutorial on (what else?) Dataspaces

That concludes my trip report from New Zealand. If you were at the conference and have any thoughts to share, please do leave a comment!
