AllBestEssays.com - All Best Essays, Term Papers and Book Report
Search

B-Trees Vs. Skip-Lists - Analysis and Rdbms Performance Improvement

Essay by   •  January 30, 2018  •  Research Paper  •  7,090 Words (29 Pages)  •  1,272 Views

Essay Preview: B-Trees Vs. Skip-Lists - Analysis and Rdbms Performance Improvement

Report this essay
Page 1 of 29

B-Trees vs. Skip-Lists:

Analysis and RDBMS Performance Improvement

Richard Rodriguez
Florida International University
rrodr420@fiu.edu

Qiulin Zhang
Florida International University
qzhan024@fiu.edu

Juan Wang
Florida International University
jwang090@fiu.edu

ABSTRACT                                

RDBMS performance improvement is a very thought-provoking topic between developers and the user community. Many situations where query performance needs to be improved. As data size grows, query performance needs to be improved; when data warehouses with millions or billions of rows to aggregate and summarize.        

B-Trees are the most widely used data structures in relational database management system.  Their evolution is presented as a collage of data structures, along with a brief analysis.  The highest variation of a B-Tree, a B+-Tree is also mentioned and the advantages, as well as the disadvantages of both B-Trees are shown.  Current research and advances in newer variations of B-Tree are then mentioned.

However, a lot of research and prototyping went into the decision to use the skiplist. Thus, it is necessary to demonstrate the rationale for this choice. By sticking to the characteristics of skiplist, three database applications which use this data structure as the underling implementation are mentioned. The memory management approaches for improving database execution efficiency are also included.

In this paper, we are going to summarize our research based on more than 100 reference papers and analysis and compare how the b-trees and skip-lists improve the database query and performance. Including the cons and pons along with the future work.                

KEYWORDS

B-trees; Skiplist; Lock-free; In-Memory Database; Query improvement

1 INTRODUCTION

1.1 B-TREES

We live in an era that is known as the data age.  Companies like Alphabet, Microsoft, and Amazon, to name a few, use data previously gathered to target our needs and incite us to buy products and/or services.  However, all this information must be stored somewhere.  When buying items/services online, the transactions and the data previous mentioned, which is now available in the order of petabytes and exabytes, require a lot of storage resources and database systems that can access that information efficiently. 

Relational Database Management Systems (RDBMS), which are based on tables of related table, have been the standard for years. A database can be implemented using nodes like those found on the Binary Search Tree (BST) or an AVL-Tree.  However, most RDBMSs are implemented using some form of a B-Tree.

B-Trees were invented by Rudolf Bayer and Ed McCreight while working at Boeing Research Labs in 1971.  It is a data structure that can accommodate data blocks, which make it efficient to transfer data to/from a storage device.  As a specialized type of data structure, it that allows for modular searching of indices, which ultimately point to the actual data.  

The data structure itself has pointers that can points to other parts of it, much like a linked list.  B-Trees is preferred over Binary Search Trees (BST) and AVL-Trees by far as it encompasses some the advantages of some of the previously mentioned data structures. Since it includes similarities, we can think of a B-Tree as a collage of data structures.  The transfer of blocks of data to and from the storage disk is maximized, which allows for higher concurrent access to the data, which in turn minimizes the user wait time.

Over the years a number of B-Tree (see Figure 2 and Table 1) variants have been invented.  Some of these variants are the B+-Trees (see Figure 3 and Table 2) and B*-Trees (among others).

1.2 SKIP-LISTS

Contrary to balanced trees with explicitly maintaining the balance, the skip lists use probabilistic balancing rather than strictly enforced balancing. As a result, skip lists are a more natural representation than trees. The seminal skip list paper was published by William Pugh in 1990, which makes it about 20 years younger than B-tree first proposed in 1970s. Though skip list is a relatively recent invention, it’s algorithms are much simpler and significantly faster than equivalent for B-trees meanwhile it’s expected cost of a search, insertion or deletion is O (log n).

The most shining feature about skiplist is the lock-free implementations which have recently been developed that provide thread safety with better parallelism under a concurrent read/write workload than thread safe balanced trees that require locking. The simplicity of the skiplist is what makes it well suited for a lockfree implementation. The algorithms for writing a thread safe skiplist lock-free are now a solved problem in academia.  A number of papers have been published on the subject in the past decade.  It’s much harder to make a lock-free skiplist perform well when there is low contention (ie., a single thread iterating over the entire skiplist with no other concurrent operations executing).  Optimizing this case is a more active area of research.

2   B-TREE ANALYSIS AND IMPROVEMENTS

B-Trees are an efficient data structure that is best suited for transferring blocks of data to and from a storage device.  Unlike a BST or an AVL data structure, where a node can only record a single value, a B-Tree allows for several values to be stored within the data block itself.  Most of these data blocks have data that is relevant to a part of a entity or a full entity.

They are also a data structure that allows a system’s CPU to be maximized.  On a mechanical storage device (see Figure 1) that spins at 7200 rotations per minute (RPM), the equivalent in a second will be 120 rotations per second, which equates to one revolution per second or 1/120s  8.3ms.  On an average server system, the 8.3ms access time translates to around the same time that a CPU is required to process at 100,000 instructions.[pic 1]

For benchmarking purposes, we will use an AVL tree for comparisons. If we can imagine an AVL-Tree with 10,000 nodes, saving the nodes from memory to storage would require us to do a recursive look up of each node.  Since each node is individually separated from the rest, each will have to be saved separately.  This will require having to write each of them separately into different files, which will incur a separate access time per node.  The total CPU cycles wastes will be equal to 10,000 nodes x 8.3 ms  83s.[pic 2]

...

...

Download as:   txt (45.8 Kb)   pdf (804.3 Kb)   docx (176 Kb)  
Continue for 28 more pages »
Only available on AllBestEssays.com