Binary heap
A binary heap imposes a certain ordering property onto an array, which makes it into what is known as a heap.
Element indices: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Using integer division, the parent of a child is
p = c \ 2, and the children (if any) of a parent are
c1 = p*2 and c2 = (p*2)+1.
If you understand this, you're set.
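The index arithmetic above can be sketched in a few lines of Python. The 1-based convention from the text is kept by leaving index 0 unused; the function names here are my own.

```python
def parent(c):
    """Parent of child index c: p = c \\ 2 (integer division)."""
    return c // 2

def children(p):
    """Children of parent index p: c1 = p*2 and c2 = (p*2)+1."""
    return 2 * p, 2 * p + 1

# Element 5's parent is 2; its children are 10 and 11.
print(parent(5), children(5))  # 2 (10, 11)
```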
The elements of the array can be thought of as
lying in a tree structure. The tree structure is
purely notional; there are no pointers, etc.
Element indices: 1
             /       \
         2               3
       /   \           /   \
     4       5       6       7
    / \     / \     / \     / \
   8   9  10  11  12  13  14  15
An array A(1..n) is called a heap if the value
of each element is greater than or equal to the
values of its children, if any.
Clearly, if A(1..n) is a heap, then A(1) is the
largest element of the array.
Element values: W
            /       \
        U               S
      /   \           /   \
    K       P       J       R
   / \     / \     / \     /
  C   H   A   F   D   B   E
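The heap property can be checked directly from the definition. A minimal sketch, again using a 1-based array with index 0 unused, with the tree above written out level by level:

```python
def is_heap(a):
    """True if each element of the 1-based array a (a[0] unused)
    is >= the values of its children, if any."""
    n = len(a) - 1
    for p in range(1, n + 1):
        for c in (2 * p, 2 * p + 1):
            if c <= n and a[p] < a[c]:
                return False
    return True

heap = [None, 'W', 'U', 'S', 'K', 'P', 'J', 'R',
        'C', 'H', 'A', 'F', 'D', 'B', 'E']
print(is_heap(heap))  # True
```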
Given a new value "T" added to the heap at the
next available index A(15), the "sift" operation
moves the new element up the heap, moving smaller
values down, until the new element can be placed
in such a way as to maintain the heap property.
Element values: W
            /       \
        U              [T]
      /   \           /   \
    K       P       J       S
   / \     / \     / \     / \
  C   H   A   F   D   B   E   R
The heap can have "height" at most log_2(n), so the
insertion operation takes at most O(log n) time.
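The insertion just described can be sketched as follows; the names sift_up and heap_insert are mine, and the 1-based convention (a[0] unused) is kept.

```python
def sift_up(a, c):
    """Move a[c] up the heap, swapping it with smaller parents,
    until the heap property holds again."""
    while c > 1 and a[c // 2] < a[c]:
        a[c // 2], a[c] = a[c], a[c // 2]
        c //= 2

def heap_insert(a, value):
    a.append(value)          # place at the next available index
    sift_up(a, len(a) - 1)   # then sift it up into position

heap = [None, 'W', 'U', 'S', 'K', 'P', 'J', 'R',
        'C', 'H', 'A', 'F', 'D', 'B', 'E']
heap_insert(heap, 'T')       # "T" goes in at A(15) and sifts up
print(heap[1:8])  # ['W', 'U', 'T', 'K', 'P', 'J', 'S']
```

Tracing it: T starts at A(15), displaces R, then S, and stops below W, exactly as in the diagram above.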
Deletions from a heap are just as simple. To remove
the value "P" at A(5) just replace it with that of
the last element, then "sift" the new value in A(5)
up or down as needed:
Element values: W
            /       \
        U               T
      /   \           /   \
    K      [R]      J       S
   / \     / \     / \     /
  C   H   A   F   D   B   E
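A sketch of that deletion (function names mine). The value moved in from the end may need to travel in either direction, so both sift operations are included here to keep the snippet self-contained:

```python
def sift_up(a, c):
    """Move a[c] up past any smaller parents."""
    while c > 1 and a[c // 2] < a[c]:
        a[c // 2], a[c] = a[c], a[c // 2]
        c //= 2

def sift_down(a, p, n):
    """Move a[p] down within a[1..n], promoting the greater child."""
    while 2 * p <= n:
        c = 2 * p
        if c + 1 <= n and a[c + 1] > a[c]:
            c += 1                   # pick the greater child
        if a[p] >= a[c]:
            break
        a[p], a[c] = a[c], a[p]
        p = c

def heap_delete(a, i):
    """Remove a[i]: overwrite it with the last element, shrink the
    array, then sift the moved value up or down as needed."""
    a[i] = a[-1]
    a.pop()
    if i < len(a):
        sift_up(a, i)
        sift_down(a, i, len(a) - 1)

# Remove "P" at A(5); the last value "R" takes its place.
heap = [None, 'W', 'U', 'T', 'K', 'P', 'J', 'S',
        'C', 'H', 'A', 'F', 'D', 'B', 'E', 'R']
heap_delete(heap, 5)
print(heap[5])  # R
```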
To sort this heap we repeatedly take the value of
the root (the element containing the greatest value)
and exchange it with that of the last element.
Element values: [E]
            /       \
        U               T
      /   \           /   \
    K       R       J       S
   / \     / \     / \
  C   H   A   F   D   B  |¯W¯¯¯
We then "sift" the root item down moving the greatest
child up at each step. Doing this will leave the heap
with the greatest value at the top.
Element values: U
            /       \
        R               T
      /   \           /   \
    K       F       J       S
   / \     / \     / \
  C   H   A  [E]  D   B  |¯W¯¯¯
Next, excluding the last element exchanged, repeat
by exchanging the value of the root with that of
the second-to-last element, etc.
Element values: [B]
            /       \
        R               T
      /   \           /   \
    K       F       J       S
   / \     / \     /
  C   H   A   E   D  |¯U¯¯¯W¯¯¯
Becomes:        T
            /       \
        R               S
      /   \           /   \
    K       F       J      [B]
   / \     / \     /
  C   H   A   E   D  |¯U¯¯¯W¯¯¯
The test at element A(7) for children will
produce an index out of current bounds.
We continue until all the greater values are
at the bottom and the least values at the top.
Element values: A
            /       \
|¯¯¯¯¯¯¯B¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯C¯¯¯¯¯¯
|     /   \           /   \
|   D       E       F       H
|  / \     / \     / \     /
| J   K   R   S   T   U   W
We can see that due to its binary nature, the
computing time for heap sort is O(n log_2 n).
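Putting the steps together, the sort can be sketched as follows, assuming a[1..n] already satisfies the heap property as in the walkthrough above (names mine, same 1-based convention):

```python
def sift_down(a, p, n):
    """Move a[p] down within a[1..n], promoting the greater child."""
    while 2 * p <= n:
        c = 2 * p
        if c + 1 <= n and a[c + 1] > a[c]:
            c += 1
        if a[p] >= a[c]:
            break
        a[p], a[c] = a[c], a[p]
        p = c

def heap_sort(a):
    """Sort a 1-based heap a (a[0] unused) in ascending order by
    repeatedly exchanging the root with the last in-heap element."""
    n = len(a) - 1
    while n > 1:
        a[1], a[n] = a[n], a[1]   # greatest value moves to the end
        n -= 1                    # exclude it from the heap
        sift_down(a, 1, n)        # sift the new root down

heap = [None, 'W', 'U', 'T', 'K', 'R', 'J', 'S',
        'C', 'H', 'A', 'F', 'D', 'B', 'E']
heap_sort(heap)
print(''.join(heap[1:]))  # ABCDEFHJKRSTUW
```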
From this sorted state, the heap property can be
re-established through "sifting" operations, or
simply by inverting the order of the array elements:
Element values: W
            /       \
        U               T
      /   \           /   \
    S       R       K       J
   / \     / \     / \     /
  H   F   E   D   C   B   A
The heap is now ready for further insertions
or deletions, and can be re-sorted.
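The inversion trick works because in a non-increasing array every parent index precedes its children, so each parent is automatically greater than or equal to them. A quick check, with a small helper to test the heap property so the snippet stands on its own:

```python
def is_heap(a):
    """Heap property on a 1-based array a (a[0] unused)."""
    n = len(a) - 1
    return all(a[p] >= a[c]
               for p in range(1, n + 1)
               for c in (2 * p, 2 * p + 1) if c <= n)

sorted_values = sorted('ABCDEFHJKRSTUW')    # the sorted state
inverted = [None] + sorted_values[::-1]     # invert the order
print(is_heap(inverted))  # True
```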
In summary:
A binary heap guarantees logarithmic performance,
with none of the balancing overhead associated with
binary search trees.
A binary heap can have height at most log_2(n), so
each insertion takes at most O(log n) time, and
building the heap by repeated insertion takes
O(n log n) time.
Sorting the heap adds computing time for a total
of O(n log_2 n) time.
Each subsequent insertion or deletion takes O(log n) time.
Mathematical complexity
You may have heard of mathematical complexity measures
like O(n) and O(n^2).
O(n) means that the execution time of a procedure is
proportional to the input size n.
O(n^2) means it's proportional to the square of the
input size.
In other words, O(n) is much faster than O(n^2), if
n is large.
What is n, the input size? It can be anything,
including the number of lines in a text file,
the dimensions of an array, or the size of
binary data.
It all depends on what you're programming.
The worst case is denoted by O(2^n).
This means that execution time rises
exponentially when n increases.
On the other hand, if execution time rises only
logarithmically, like O(log n), you have a fast
procedure even for large sets of data.
So what does logarithmic really mean? Each time n
doubles, it adds only one extra step to the search
operation, and this is the fundamental benefit of
a binary search.
A tree containing 16 nodes can be completely
searched with no more than 4 comparisons (2^4=16),
and a tree containing 256 nodes can be completely
searched with no more than 8 comparisons (2^8=256),
while a tree containing one million nodes can be
searched in no more than 20 comparisons, and a tree
containing 4,294,967,296 nodes in no more than 32.
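The figures above are just log_2(n), rounded up; a quick check:

```python
import math

# Doubling n adds only one comparison: the count grows as ceil(log2(n)).
comparisons = {n: math.ceil(math.log2(n))
               for n in (16, 256, 1_000_000, 4_294_967_296)}
print(comparisons)  # {16: 4, 256: 8, 1000000: 20, 4294967296: 32}
```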