Analytics

Saturday, August 1, 2009

Why Hibernate does "delete all then re-insert" - its not so strange

Overview

Hibernate allows developers to interact with data from a relational database in an object-oriented fashion. This brings a lot of benefits to maintenance of development. However, hibernate is an abstraction of data which essentially is its function as an ORM, and this usually forces the developer to know hibernate's internals in order to use the tool effectively.

Maintaining collections in the object world is something every application does. Using hibernate you can manipulate colllections by adding and deleting items through the java api, and by way of mapping, hibernate takes care of generating the SQL to augment that data in the database. If the nuances of the mapping is not understood, hibernate may produce what seems like strange results. A particular instance arises when you map a List on the many side of a one to many relationship. When removing an item from the list, hibernate issues a delete statement to delete all elements, and then issue insert statements to insert everything back in except the one removed. This seems ineffecient and odd. Why not just delete the one item? Then if you change the collection type to a Set, only one delete statement is issued by itself. This article tries to explain this behavior and hopefully bring to light hibernate's internals.


Explanation by Example

The scenario described above occurs when mapping a one to many relationship with a association table. The association table stores two attributes, each of which is a foreign key to another table. In addition, the many side object is mapped with a List, but no index column is used. Hibernate recommends when using a list that an index column is used, but why exactly? The example should illuminate why hibernate recommends the index column when mapping a list. The example will show you what happens when using a list without an index column.

Example Setup

Let's set up the example. Let's say I have two classes: A and B. There is a one to many relationship from A to B, where A has many B's in a List. Using hibernate we map B with one to many and leave out the index column.

As instances of these classes, we have an A1 with 3 children and A9 with no children (just for contrasting sake). The figure below illustrates this


In the database, we have three tables. The association table maintains the relationship between A and B. No constraints are in the table for simplicity sake. (We want to focus on hibernate's behavior and worry less about enforcing integrity in the database.) Notice there is not indexed column in AB.


Here is the data in the tables:

Also notice that the list contains two B1's, as a list is capable of containing duplicates. So we code in java to delete an element from A, say the first element in the list which is B1.



a.getBList().remove(0);


So what occurs on hibernate's flush? It first issues an sql to delete all rows in the relationship table where B_ID = 1. Then it inserts all the relationships that are not supposed to be deleted!
This seems strange.
However the reason why this occurs is because hibernate supports the additional feature of a collection that is called a Bag. A bag is a collection which can have duplicates but is unordered. The best feature of a bag is that you can get the number of occurrences of an object through the API with
public int occurrences(Object o) 
With a list, there is no way to do the same without iterating through the whole list.

Thus, when a list is mapped without an indexed collection, hibernate treats in like a bag. In essence, a list which has an indexed column is an "indexed bag.

Since hibernate handles our list as a bag, it believes there could be duplicate entries in the association table. In addition it knows the items in table AB are unordered, so there is no way to tell which row should be deleted from the object in java. We asked earlier, why not just issue one delete statement. If it stopped here we would end up with:


So hibernate needs to continue and re-insert the rows it knows from java that are not supposed to be deleted. In memory, it can just look at the list after remove() was called and insert the remaining relationships. So we end up with what we wanted:



Conclusion

When using a list in hibernate an indexed column is necessary if you want efficient manipulation of adding and deleting items. With an indexed column, when you code a remove(index), hibernate can relate that index to the column in the db and even if there are duplicates, it can distinguish and delete one row. Better yet, if duplicates are not needed in your application, you can use a Set. A set requires no index column and hibernate can distinguish one row because it know a set does not have duplicates.

If you are wondering what is the purpose of a Bag, references below may answer that question.

References

http://www.discursive.com/books/cjcook/reference/collections-sect-bag.html
http://guanyu.info/blog/?p=69
https://forum.hibernate.org/viewtopic.php?t=957657
http://sites.google.com/site/xushengxiaotech/Home/hibernate-tips--efficient-usage-of-bag-for-association

6 comments:

  1. This is the sort of things cam make life harder for DBAs with ORM database practises. If there is a requirement to maintain an audit trail then it gets polluted by masses of Delete/Insert pairs that effect no change but make reporting on time based views of the data histories a real pain.

    ReplyDelete
  2. That is a good point. ORMs definitely make java developers' lives easier by sparing them lots of yucky JDBC and mapping code. Using objects allows for modularity, encapsulation, and the whole OO works. However, if we don't know the internals of the ORM tool, we can run into issues you noted above.

    We can alleviate the mass delete issue by using an indexed column. I would hope a DBA would be willing enough to add such a column to service the application. The tradeoff's must be understood and should be well worth it.

    ReplyDelete
  3. Thanks! I had the same Problem. It deleted all and then inserted. To solve this,

    1) I declared the Collection as Set instead of List
    2) Overridden the hashCode and equals method

    It selectively deleted based on the removal in collection.

    The following blog also helped to achieve selective update in one-many relationship

    http://sieze.wordpress.com/2009/09/04/mapping-a-many-to-many-join-table-with-extra-column-using-jpa/#comment-51

    ReplyDelete
  4. thanks nirav. this is d information i wanted. u explained in such a simple n diagrammatic approach... thanks a lot... cleared my doubts...

    ReplyDelete
  5. Very nice explication! The only source I found where the reason is explained in detail. Thx!

    ReplyDelete
  6. Hi,
    You have explained it in detail with an example, you have not mentioned anything about Set.

    ReplyDelete