Generic Entity Resolution with Negative Rules (Posted by Steven Whang)

Entity Resolution (ER) is the process of identifying records that refer to the same real-world entity and merging them together. For example, two companies that merge may want to combine their customer records: for a customer who dealt with both companies, they create a composite record that combines the known information.
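As a rough illustration of matching and merging, here is a minimal Python sketch. It is not the algorithm from the paper: the record layout (attribute name mapped to a set of values), the email-based `match` rule, and the union-based `merge` rule are all assumptions chosen for brevity.

```python
from typing import Dict, List, Set

# Assumed record layout: attribute name -> set of known values.
Record = Dict[str, Set[str]]


def match(r1: Record, r2: Record) -> bool:
    """Hypothetical match rule: two records match if they share an email."""
    return bool(r1.get("email", set()) & r2.get("email", set()))


def merge(r1: Record, r2: Record) -> Record:
    """Hypothetical merge rule: union the values of every attribute."""
    return {
        key: r1.get(key, set()) | r2.get(key, set())
        for key in r1.keys() | r2.keys()
    }


def resolve(records: List[Record]) -> List[Record]:
    """Naive ER loop: keep merging matching records until none match."""
    resolved: List[Record] = []
    pending = list(records)
    while pending:
        current = pending.pop()
        partner = next((r for r in resolved if match(current, r)), None)
        if partner is None:
            resolved.append(current)
        else:
            # Replace the partner by the composite record and re-examine it,
            # since the merged record may now match other records.
            resolved.remove(partner)
            pending.append(merge(current, partner))
    return resolved


customers = [
    {"name": {"J. Doe"}, "email": {"jdoe@example.com"}, "phone": {"555-0100"}},
    {"name": {"John Doe"}, "email": {"jdoe@example.com"}, "address": {"1 Main St"}},
    {"name": {"Ann Lee"}, "email": {"ann@example.com"}},
]
composite_records = resolve(customers)
```

Here the two "Doe" records share an email, so they end up in one composite record that carries both names, the phone number, and the address.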

However, the process for matching and merging records is usually application-specific, complex, and error-prone. The input records may contain ambiguous or incompletely specified data, and it may be impossible to capture every application nuance and subtlety in the logic that decides when records match and how they should be merged. Thus, the set of resolved records (after ER) may contain "errors" that are apparent to a domain specialist. For example, we may have a customer record with an address in a country we do not do business with, or two different company records where the expert happens to know that one company recently acquired the other, so they are now the same entity.

In our paper, we address the identification and handling of such inconsistencies using negative rules, which are predicates that take a number of records and return whether the records are consistent or not. For example, if ER mistakenly merges two person records with different genders, we can specify a binary negative rule that flags an inconsistency for any record that has both genders. The negative rules are then used with the original ER rules to find a consistent ER solution.
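To make the idea of a negative rule as a predicate concrete, here is a small Python sketch. The record layout, the rule names `conflicting_gender` and `shared_passport`, and the `violations` helper are hypothetical and not taken from the paper; each rule returns True when the records it is given are inconsistent.

```python
from itertools import combinations
from typing import Callable, Dict, List, Set, Tuple

# Assumed record layout: attribute name -> set of known values.
Record = Dict[str, Set[str]]


def conflicting_gender(record: Record) -> bool:
    """Hypothetical rule over one record: more than one gender value."""
    return len(record.get("gender", set())) > 1


def shared_passport(r1: Record, r2: Record) -> bool:
    """Hypothetical rule over two records: distinct resolved records
    should not share a passport number."""
    return bool(r1.get("passport", set()) & r2.get("passport", set()))


def violations(
    records: List[Record],
    rules: List[Tuple[int, Callable[..., bool]]],
) -> List[Tuple[str, Tuple[Record, ...]]]:
    """Apply each (arity, rule) pair to every group of that many records
    and collect the groups that the rule flags as inconsistent."""
    found = []
    for arity, rule in rules:
        for group in combinations(records, arity):
            if rule(*group):
                found.append((rule.__name__, group))
    return found


resolved = [
    {"name": {"Pat"}, "gender": {"F", "M"}, "passport": {"X1"}},
    {"name": {"Sam"}, "gender": {"M"}, "passport": {"X2"}},
    {"name": {"Samuel"}, "gender": {"M"}, "passport": {"X2"}},
]
flagged = violations(resolved, [(1, conflicting_gender), (2, shared_passport)])
```

In this sketch the rules only report inconsistencies in an already resolved set. How those inconsistencies are then repaired is a separate step that the sketch does not attempt.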

There are two reasons why we cannot simply modify the ER rules instead of using the negative rules. First, the negative rules are only used to check the consistency of the final ER result and not the consistency of the intermediate records that are created during the ER process. For example, a record that has both genders may turn out to be Female (based on more evidence) at the end of the ER process (and thus consistent). Second, the negative rules can be viewed as an effort of a second party to fix the errors made by the ER rules of the first party.

We propose a general algorithm that finds a consistent ER solution, and an enhanced algorithm that is more efficient than the general one but assumes certain properties of the negative and ER rules. Our main findings are as follows:
  • Negative rules can significantly improve the accuracy of the ER process. Depending on the strategy, a domain expert may need to help resolve the inconsistencies.
  • Different combinations of negative rules may result in different accuracy and runtime results. The negative rules themselves also need to be correct and effectively pinpoint the errors of the ER rules.
  • Applying negative rules is an expensive process (at least quadratic) that should not be run on the entire dataset if the dataset is very large. A common technique in ER is to use blocking techniques (many variations exist) where the entire dataset is divided into smaller blocks, and the blocks are processed one at a time. The negative rules can be used within each block.
  • Even without the properties, the enhanced algorithm can efficiently produce results that are nearly identical to those of the general algorithm.
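The blocking idea mentioned in the findings above can be sketched as follows. This is only an illustration under assumed names: the flat record layout, the `zip` blocking key, and the `same_phone` pair rule are hypothetical, and real blocking schemes vary widely.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, List, Tuple

# Assumed flat record layout for this sketch: attribute name -> value.
Record = Dict[str, str]


def block_by(records: List[Record], key_fn: Callable[[Record], str]) -> Dict[str, List[Record]]:
    """Partition the dataset into blocks that share a blocking key."""
    blocks: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        blocks[key_fn(record)].append(record)
    return blocks


def check_within_blocks(
    records: List[Record],
    key_fn: Callable[[Record], str],
    pair_rule: Callable[[Record, Record], bool],
) -> Tuple[List[Tuple[str, Record, Record]], int]:
    """Apply a pairwise negative rule inside each block only.

    Returns the flagged pairs and the number of pair comparisons made."""
    flagged = []
    comparisons = 0
    for key, block in block_by(records, key_fn).items():
        for r1, r2 in combinations(block, 2):
            comparisons += 1
            if pair_rule(r1, r2):
                flagged.append((key, r1, r2))
    return flagged, comparisons


def same_phone(r1: Record, r2: Record) -> bool:
    """Hypothetical negative rule: two distinct records share a phone."""
    return r1["phone"] == r2["phone"]


records = [
    {"id": "1", "zip": "94305", "phone": "555-0100"},
    {"id": "2", "zip": "94305", "phone": "555-0100"},
    {"id": "3", "zip": "10001", "phone": "555-0199"},
    {"id": "4", "zip": "10001", "phone": "555-0142"},
]
flagged, comparisons = check_within_blocks(records, lambda r: r["zip"], same_phone)
```

With four records split into two blocks of two, the rule is evaluated on 2 pairs rather than all 6. The trade-off is that an inconsistency between records in different blocks is never examined.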

  1. Anonymous Pankaj Mehra | May 15, 2009 10:40 AM |  

    Steven, this is one of the most formal treatments of Named Entity Resolution in the literature. Thanks for doing this great work. Looking forward to learning about your work with DavidM on evaluating NER schemes later today.

  2. Anonymous Michael Calhoun | May 29, 2009 2:08 PM |  

    Will there be any postings online about the NER schemes evaluations?

