Data Structure





UNIT 1


Introduction: Algorithmic notation – Programming principles – Creating programs – Analyzing programs.
Arrays: One dimensional array, multidimensional array, pointer arrays.
Searching: Linear search, Binary Search, Fibonacci search.


AN INTRODUCTION TO DATA STRUCTURE
Data structure is the structural representation of logical relationships between elements
of data. In other words, a data structure is a way of organizing data items by considering
their relationships to each other.
Data structure mainly specifies the structured organization of data, by providing
accessing methods with correct degree of associativity. Data structure affects the design
of both the structural and functional aspects of a program.
Algorithm + Data Structure = Program
Data structures are the building blocks of a program

INTRODUCTION:
DATA:
            Data is a collection of facts or set of elements.
DATA STRUCTURES:
            Data structure means the organization of data or the storing of data.
PROGRAM:
            Each program consists of data structures and algorithm.
Algorithm:
            Algorithm is a step by step procedure to solve a task.
            Before developing a program for an algorithm, we should first select suitable data structures.

Types of data structures:
            Data structures are broadly classified into primary and secondary data structures.
Primary data structures:
            Primary data structure is a basic data structure. It may have different representations on different computers.
            Examples are the basic constants such as integer, character and string constants.
Secondary data structures:
            Secondary data structure is derived from the basic data structure.
            It is classified as two types.
1.       Static data structures
2.       Dynamic data structures
1.       Static data structure:
            A data structure created using static memory allocation is called a static data structure or fixed-size data structure. In a static data structure, the number of data items is known in advance.
E.g., Array.
2.       Dynamic data structure:
            A data structure created using dynamic memory allocation is called a dynamic data structure or variable-size data structure. In a dynamic data structure, the number of data items is not known in advance.
E.g., linked list.
            Data structures can further be classified as two types.
1.       Linear data structure
2.       Non linear data structure
Linear data structure:
            In a linear data structure, each element has a linear relationship with its adjacent elements.
E.g., linked list.
Non-Linear data structure:
            In a non-linear data structure, there is no linear relationship between an element and its adjacent elements.
E.g., trees, graph.


Algorithm notations:
«  Algorithm is a step by step procedure to perform a specific task.
«  Algorithm is made up of sequence of clear instructions.
Properties of algorithm:
1.       The algorithm takes zero or more inputs.
2.       The algorithm produces zero or more outputs.
3.       The algorithm should be efficient.
4.       A good algorithm occupies as little memory space as possible.
5.       All the operations must be carried out within a finite time.
6.       A good algorithm can be easily understood even by a non-programmer.
ALGORITHM TO FIND THE GREATEST NUMBER:
GREAT (A, N)
1.       [TO CHECK ARRAY IS EMPTY]
                        IF(N≤0), THEN
                        WRITE(“ARRAY EMPTY”)
                        EXIT
2.       [INITIALIZE]
MAX←A[0]
I←1
3.       [TO EXAMINE ALL VALUES]
REPEAT THROUGH STEP 5 WHILE I<N
4.       [TO CHANGE THE VALUE OF MAX]
IF(MAX<A[I]) THEN
MAX←A[I]
5.       [TO EXAMINE NEXT VALUE]
I←I+1
6.       [FINISHED]
EXIT
1.       Name of the algorithm:
Each algorithm is given by a name.
For e.g., in the above algorithm GREAT is the name of the algorithm.
2.       Introductory comment:
The algorithm name is followed by a brief description of the algorithm. It also lists the parameters used in the algorithm.
E.g.,     GREAT(A,N)
Ø  GREAT - name of the algorithm
Ø  A - name of the array
Ø  N - number of elements
3. Steps
1.       An algorithm is made up of a sequence of steps.
2.       Each step begins with a phrase enclosed within square brackets.
3.       This phrase gives a brief description of the step.
4. Comment:
A comment is enclosed within parentheses. It describes the step; no action takes place for a comment.
Statements and control structures:
1.       Assignment statement:
The assignment statement is indicated by the symbol ← placed between the l.h.s. variable and the r.h.s. expression.
For eg.,
1.       MAX←A[0]
2.       I←I+1
3.       A←B
4.       I←J←K←0
Statement 1 assigns the value of A[0] to MAX.
Statement 2 is called an increment statement.
Statement 3 assigns the value of B to A.
Statement 4 is a multiple assignment. The value 0 is assigned to I, J and K.
2.       If statement:
IF(CONDITION)
THEN
STATEMENTS;
            First the condition is checked, if it is true, then the statements after ‘THEN’ are executed.
3.       If else statement
IF(CONDITION)
   THEN
       STATEMENTS;
   ELSE
      STATEMENTS;
Ø  First the condition is checked; if it is true, then the statements after ‘THEN’ are executed.
Ø  If it is false, then the statements after ‘ELSE’ are executed.
4.       Select case statement
SELECT CASE(EXPRESSION)
CASE VALUE 1:
                STATEMENTS;
CASE VALUE 2:
                STATEMENTS;
.
.
.
CASE VALUE N:
                STATEMENTS;
DEFAULT:
                STATEMENTS;
Ø  Select case statement is a multiple choice statement.
Ø  The value of the expression is compared with each case value.
Ø  When matching occurs, the corresponding case statements are executed.
Ø  When no matching occurs, the default statements are executed.
5.       Go-to statement
Go-to statement is used to transfer the control from one part of the program to another.
GOTO VARIABLE NAME;
.
.
.
VARIABLE NAME:
STATEMENTS;
6.       Exit statement:
Exit statement is used for immediate termination of the algorithm.
IF(I<10)
EXIT
7.       Repeat statement:
Repeat statement is mostly used in looping.
The general form of repeat statement is

                                 i.            REPEAT FOR INDEX=SEQUENCE
Where index = valid variable name
Sequence = representation of a sequence of numbers.
This statement is used to execute the steps a counted number of times.
For eg., REPEAT FOR I=0,1,2,…….N
                               ii.            REPEAT WHILE LOGICAL EXPRESSION
This statement is used to execute the steps repeatedly until the condition becomes false.
Eg., REPEAT WHILE TRUE
                READ A[I]
                              iii.            REPEAT FOR INDEX=SEQUENCE WHILE LOGICAL EXPRESSION
This statement is the combination of i and ii.
It is used to execute the steps a counted number of times or until the expression fails, whichever occurs first.
Variable name:
            Variable is the name of the memory area which holds the value or data.
Rules for naming the variable:
Ø  The first character should be an alphabet or underscore (_).
Ø  Special characters are not allowed.
Ø  It should not begin with a digit.
Ø  It should not be a keyword.
Ø  Both uppercase and lowercase letters are permitted, but they are treated as different.
Ø  Blank spaces are not allowed.
Ø  There is no limit on identifier length.
CREATING PROGRAMS

 You should be capable of developing your programs using something better than the seat-of-the-pants method. This method uses the philosophy: write something down and then try to get it working. Surprisingly, this method is in wide use today, with the result that an average programmer on an average job turns out only between five to ten lines of correct code per day. We hope your productivity will be greater. But to improve requires that you apply some discipline to the process of creating programs. To understand this process better, we consider it as broken up into five phases: requirements, design, analysis, coding, and verification.

(i) Requirements. Make sure you understand the information you are given (the input) and what results you are to produce (the output). Try to write down a rigorous description of the input and output which covers all cases.
You are now ready to proceed to the design phase. Designing an algorithm is a task which can be done independently of the programming language you eventually plan to use. In fact, this is desirable because it means you can postpone questions concerning how to represent your data and what a particular statement looks like and concentrate on the order of processing.
(ii) Design. You may have several data objects (such as a maze, a polynomial, or a list of names). For each object there will be some basic operations to perform on it (such as print the maze, add two polynomials, or find a name in the list). Assume that these operations already exist in the form of procedures and write an algorithm which solves the problem according to the requirements. Use a notation which is natural to the way you wish to describe the order of processing.
(iii) Analysis. Can you think of another algorithm? If so, write it down. Next, try to compare these two methods. It may already be possible to tell if one will be more desirable than the other. If you can't distinguish between the two, choose one to work on for now and we will return to the second version later.
(iv) Refinement and coding. You must now choose representations for your data objects (a maze as a two dimensional array of zeros and ones, a polynomial as a one dimensional array of degree and coefficients, a list of names possibly as an array) and write algorithms for each of the operations on these objects. The order in which you do this may be crucial, because once you choose a representation, the resulting algorithms may be inefficient. Modern pedagogy suggests that all processing which is independent of the data representation be written out first. By postponing the choice of how the data is stored we can try to isolate what operations depend upon the choice of data representation. You should consider alternatives, note them down and review them later. Finally you produce a complete version of your first program
 (v) Verification. Verification consists of three distinct aspects: program proving, testing and debugging. Each of these is an art in itself. Before executing your program you should attempt to prove it is correct. Proofs about programs are really no different from any other kinds of proofs, only the subject matter is different. If a correct proof can be obtained, then one is assured that for all possible combinations of inputs, the program and its specification agree. Testing is the art of creating sample data upon which to run your program. If the program fails to respond correctly then debugging is needed to determine what went wrong and how to correct it. One proof tells us more than any finite amount of testing, but proofs can be hard to obtain. Many times during the proving process errors are discovered in the code. The proof can't be completed until these are changed. This is another use of program proving, namely as a methodology for discovering errors. Finally there may be tools available at your computing center to aid in the testing process. One such tool instruments your source code and then tells you for every data set: (i) the number of times a statement was executed, (ii) the number of times a branch was taken, (iii) the smallest and largest values of all variables. As a minimal requirement, the test data you construct should force every statement to execute and every condition to assume the value true and false at least once.

The previous discussion applies to the construction of a single procedure as well as to the writing of a large software system. Let us concentrate for a while on the question of developing a single procedure which solves a specific task. This shifts our emphasis away from the management and integration of the various procedures to the disciplined formulation of a single, reasonably small and well-defined task. The design process consists essentially of taking a proposed solution and successively refining it until an executable program is achieved. The initial solution may be expressed in English or some form of mathematical notation. At this level the formulation is said to be abstract because it contains no details regarding how the objects will be represented and manipulated in a computer. If possible the designer attempts to partition the solution into logical subtasks. Each subtask is similarly decomposed until all tasks are expressed within a programming language. This method of design is called the top-down approach. Inversely, the designer might choose to solve different parts of the problem directly in his programming language and then combine these pieces into a complete program. This is referred to as the bottom-up approach. Experience suggests that the top-down approach should be followed when creating a program. However, in practice it is not necessary to unswervingly follow the method. A look ahead to problems which may arise later is often useful.

ANALYZING PROGRAMS


There are many criteria upon which we can judge a program. For instance:
(i) Does it do what we want it to do?
(ii) Does it work correctly according to the original specifications of the task?
(iii) Is there documentation which describes how to use it and how it works?
(iv) Are subroutines created in such a way that they perform logical sub-functions?
(v) Is the code readable?
The above criteria are all vitally important when it comes to writing software, most especially for large systems. Though we will not be discussing how to reach these goals, we will try to achieve them throughout this book with the programs we write. Hopefully this more subtle approach will gradually infect your own program writing habits so that you will automatically strive to achieve these goals.
There are other criteria for judging programs which have a more direct relationship to performance. These have to do with computing time and storage requirements of the algorithms. Performance evaluation can be loosely divided into 2 major phases: (a) a priori estimates and (b) a posteriori testing. Both of these are equally important.
First consider a priori estimation. Suppose that somewhere in one of your programs is the statement
x ← x + 1.
We would like to determine two numbers for this statement. The first is the amount of time a single execution will take; the second is the number of times it is executed. The product of these numbers will be the total time taken by this statement. The second statistic is called the frequency count, and this may vary from data set to data set. One of the hardest tasks in estimating frequency counts is to choose adequate samples of data. It is impossible to determine exactly how much time it takes to execute any command unless we have the following information:
(i) the machine we are executing on:
(ii) its machine language instruction set;
(iii) the time required by each machine instruction;
(iv) the translation a compiler will make from the source to the machine language.
It is possible to determine these figures by choosing a real machine and an existing compiler. Another approach would be to define a hypothetical machine (with imaginary execution times), but make the times reasonably close to those of existing hardware so that resulting figures would be representative. Neither of these alternatives seems attractive. In both cases the exact times we would determine would not apply to many machines or to any machine. Also, there would be the problem of the compiler, which could vary from machine to machine. Moreover, it is often difficult to get reliable timing figures because of clock limitations and a multi-programming or time sharing environment. Finally, the difficulty of learning another machine language outweighs the advantage of finding "exact" fictitious times. All these considerations lead us to limit our goals for an a priori analysis. Instead, we will concentrate on developing only the frequency count for all statements. The anomalies of machine configuration and language will be lumped together when we do our experimental studies. Parallelism will not be considered.
Consider the three examples of Figure 1.4 below.
(a)  x ← x + 1

(b)  for i ← 1 to n do
         x ← x + 1
     end

(c)  for i ← 1 to n do
         for j ← 1 to n do
             x ← x + 1
         end
     end

Figure 1.4: Three simple programs for frequency counting.


In program (a) we assume that the statement x ← x + 1 is not contained within any loop, either explicit or implicit. Then its frequency count is one. In program (b) the same statement will be executed n times and in program (c) n² times (assuming n ≥ 1). Now 1, n, and n² are said to be different and increasing orders of magnitude, just like 1, 10, 100 would be if we let n = 10. In our analysis of execution we will be concerned chiefly with determining the order of magnitude of an algorithm. This means determining those statements which may have the greatest frequency count.
To determine the order of magnitude, formulas such as
        Σ(i=1..n) 1,    Σ(i=1..n) i,    Σ(i=1..n) i²
often occur. In the program segment of figure 1.4(c) the statement x ← x + 1 is executed
        Σ(i=1..n) Σ(j=1..n) 1 = n² times.
Simple forms for the above three formulas are well known, namely,
        Σ(i=1..n) 1 = n,    Σ(i=1..n) i = n(n+1)/2,    Σ(i=1..n) i² = n(n+1)(2n+1)/6.
In general,
        Σ(i=1..n) i^k = n^(k+1)/(k+1) + lower-order terms,  for k ≥ 0.

Consider the Fibonacci sequence of numbers
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ...
Each new term is the sum of the two previous terms. Calling the first term F0, we have F0 = 0, F1 = 1 and, in general,
Fn = Fn-1 + Fn-2,  n ≥ 2.
The program on the following page takes any non-negative integer n and prints the value Fn.
1     procedure FIBONACCI
2       read (n)
3-4     if n < 0 then [print ('error'); stop]
5-6     if n = 0 then [print ('0'); stop]
7-8     if n = 1 then [print ('1'); stop]
9       fnm2 ← 0; fnm1 ← 1
10      for i ← 2 to n do
11         fn ← fnm1 + fnm2
12         fnm2 ← fnm1
13         fnm1 ← fn
14      end
15      print (fn)
16    end FIBONACCI
The first problem in beginning an analysis is to determine some reasonable values of n. A complete set would include four cases: n < 0, n = 0, n = 1 and n > 1. Below is a table which summarizes the frequency counts for the first three cases.
Step   n < 0  n = 0  n = 1
--------------------------
  1      1      1      1
  2      1      1      1
  3      1      1      1
  4      1      0      0
  5      0      1      1
  6      0      1      0
  7      0      0      1
  8      0      0      1
 9-15    0      0      0
These three cases are not very interesting. None of them exercises the program very much. Notice, though, how each if statement has two parts: the if condition and the then clause. These may have different execution counts. The most interesting case for analysis comes when n > 1. At this point the for loop will actually be entered. Steps 1, 2, 3, 5, 7 and 9 will be executed once, but steps 4, 6 and 8 not at all. Both commands in step 9 are executed once. Now, for n ≥ 2, how often is step 10 executed: not n - 1 but n times. Though 2 to n is only n - 1 executions, remember that there will be a last return to step 10 where i is incremented to n + 1, the test i > n made and the branch taken to step 15. Thus, steps 11, 12, 13 and 14 will be executed n - 1 times but step 10 will be done n times. We can summarize all of this with a table.
Step  Frequency  Step  Frequency
--------------------------------
 1        1        9       2
 2        1       10       n
 3        1       11      n-1
 4        0       12      n-1
 5        1       13      n-1
 6        0       14      n-1
 7        1       15       1
 8        0       16       1

Each statement is counted once, so step 9 has 2 statements and is executed once for a total of 2. Clearly, the actual time taken by each statement will vary. The for statement is really a combination of several statements, but we will count it as one. The total count then is 5n + 5. We will often write this as O(n), ignoring the two constants 5. This notation means that the order of magnitude is proportional to n.
The notation f(n) = O(g(n)) (read as f of n equals big-oh of g of n) has a precise mathematical definition.
Definition: f(n) = O(g(n)) iff there exist two positive constants c and n0 such that |f(n)| ≤ c|g(n)| for all n ≥ n0.
f(n) will normally represent the computing time of some algorithm. When we say that the computing time of an algorithm is O(g(n)) we mean that its execution takes no more than a constant times g(n). n is a parameter which characterizes the inputs and/or outputs. For example n might be the number of inputs or the number of outputs or their sum or the magnitude of one of them. For the Fibonacci program n represents the magnitude of the input and the time for this program is written as T(FIBONACCI) = O(n).
We write O(1) to mean a computing time which is a constant. O(n) is called linear, O(n²) is called quadratic, O(n³) is called cubic, and O(2ⁿ) is called exponential. If an algorithm takes time O(log n) it is faster, for sufficiently large n, than if it had taken O(n). Similarly, O(n log n) is better than O(n²) but not as good as O(n). These seven computing times, O(1), O(log n), O(n), O(n log n), O(n²), O(n³), and O(2ⁿ), are the ones we will see most often throughout the book.
If we have two algorithms which perform the same task, and the first has a computing time which is O(n) and the second O(n2), then we will usually take the first as superior. The reason for this is that as n increases the time for the second algorithm will get far worse than the time for the first. For example, if the constant for algorithms one and two are 10 and 1/2 respectively, then we get the following table of computing times:
 n   10n    n²/2
-----------------
  1    10     0.5
  5    50    12.5
 10   100    50
 15   150   112.5
 20   200   200
 25   250   312.5
 30   300   450
For n < 20, algorithm two has a smaller computing time, but once past that point algorithm one becomes better. This shows why we choose the algorithm with the smaller order of magnitude, but we emphasize that this is not the whole story. For small data sets, the respective constants must be carefully determined. In practice these constants depend on many factors, such as the language and the machine one is using. Thus, we will usually postpone the establishment of the constant until after the program has been written. Then a performance profile can be gathered using real time calculation.
Figures 1.6 and 1.7 show how the computing times (counts) grow with a constant equal to one. Notice how the times O(n) and O(n log n) grow much more slowly than the others. For large data sets, algorithms with a complexity greater than O(n log n) are often impractical. An algorithm which is exponential will work only for very small inputs. For exponential algorithms, even if we improve the constant, say by 1/2 or 1/3, we will not improve the amount of data we can handle by very much.
Given an algorithm, we analyze the frequency count of each statement and total the sum. This may give a polynomial
P(n) = c_k·n^k + c_(k-1)·n^(k-1) + ... + c_1·n + c_0
where the c_i are constants, c_k ≠ 0 and n is a parameter. Using big-oh notation, P(n) = O(n^k). On the other hand, if any step is executed 2ⁿ times or more, the expression
c·2ⁿ + P(n) = O(2ⁿ).
Another valid performance measure of an algorithm is the space it requires. Often one can trade space for time, getting a faster algorithm but using more space. We will see cases of this in subsequent chapters. (for figure refer book)
log₂n    n    n·log₂n      n²       n³             2ⁿ
------------------------------------------------------
   0      1        0         1        1               2
   1      2        2         4        8               4
   2      4        8        16       64              16
   3      8       24        64      512             256
   4     16       64       256     4096           65536
   5     32      160      1024    32768   4,294,967,296
 
 

 ARRAYS
  • An array is a fixed-size, sequenced collection of elements of the same data type.
  • It is simply a grouping of like-type data.
  • In its simplest form an array can be used to represent a list of numbers or a list of names.
Arrays are most frequently used in programming. Mathematical problems like matrix
algebra can be easily handled by arrays. An array is a collection of homogeneous
data elements described by a single name. It occupies a consecutive set of memory locations. Each element of an array is referenced by
a subscripted variable or value, called a subscript or index, enclosed in brackets. If an
element of an array is referenced by a single subscript, then the array is known as a one
dimensional array or linear array, and if two subscripts are required to reference an element,
the array is known as a two dimensional array, and so on. Analogously, the arrays
whose elements are referenced by two or more subscripts are called multidimensional
arrays.

ONE DIMENSIONAL ARRAY
·         A list of items can be given one variable name using only one subscript; such a variable is called a single-subscripted variable or a one dimensional array.
·         For eg :  if we want to represent a set of five numbers, say (35, 40, 20, 51, 19), by an array variable no, then we may declare the variable no as follows.
int no[5];
·         The values to the array elements can be assigned as follows:

no[0] = 35;
no[1] = 40;
no[2] = 20;
no[3] = 51;
no[4] = 19;

Eg:
        float height[50];
height[0] refers to the first element of the array.
                           

DECLARATION OF ONE DIMENSIONAL ARRAY:

datatype variable-name[size];

Eg:  int group[10];

INITIALIZATION OF ONE DIMENSIONAL ARRAY:

·         After an array is declared, its elements must be initialized.
·         Otherwise, they will contain “garbage”.
·         An array can be initialized at either of the following stages:

1.      at compile time
2.      at run time

·         COMPILE TIME INITIALIZATION:
·         The general form of initialization of array is :

type array-name[size] = { list of values };

·  The values in the list are separated by commas. For eg : the statement

int number[3] = {0, 0, 0};
will declare the variable number as an array of size 3 and will assign zero to each element.

RUN TIME INITIALIZATION:

·  An array can be explicitly initialized at runtime.
·  This approach is usually applied for initializing large arrays.
·  For eg consider the following segments of C program.

                           ..............
                           for (i = 0; i < 100; i = i + 1)
                           {
                               if (i < 50)
                                   sum[i] = 0.0;
                               else
                                   sum[i] = 1.0;
                           }
                           ..............
·         The first 50 elements of the array sum are initialized to 0.0 while the remaining 50 elements are initialized to 1.0 at run time.

Example Program

#include <stdio.h>

void printarr(int a[]);   /* prototype so main can call the function */

int main()
{
    int a[5];  //A
    for(int i = 0; i < 5; i++)
    {
       a[i] = i;  //B
    }
    printarr(a);
    return 0;
}

void printarr(int a[])
{
    for(int i = 0; i < 5; i++)
    {
        printf("value in array %d\n", a[i]);
    }
}

Explanation

1.      Statement A defines an array of integers. The array is of the size 5—that means you can store 5 integers.
2.      Array elements are referred to using a subscript; the lowest subscript is always 0 and the highest subscript is (size − 1). If you refer to an array element with an out-of-range subscript, the compiler will not report an error, but the behavior is undefined. You can refer to any element as a[0], a[1], a[2], etc.
3.      Generally, you can use a for loop for processing an array. For the array, consecutive memory locations are allocated and the size of each element is same.
4.      The array name, for example, a, is a pointer constant, and you can pass the array name to the function and manipulate array elements in the function. An array is always processed element by element.
5.      When defining the array, the size should be known.

Points to Remember

1.      An array is a composite data structure in which you can store multiple values. Array elements are accessed using subscript.
2.      The subscript operator has higher precedence than the increment operator. Thus if you write a[2]++, it increments the value at location 2 in the array.
3.      The valid range of subscript is 0 to size −1.
TWO DIMENSIONAL ARRAY:
·         In maths, we represent a particular element of a matrix V by using two subscripts, such as Vij. In C this becomes, for example,

V[4][3]

SYNTAX:

type array-name[row-size][column-size];

INITIALIZING TWO DIMENSIONAL ARRAY:

Like a one dimensional array, a two dimensional array may be initialized by following its declaration with a list of initial values enclosed in braces.

For eg:

   int table[2][3] = { 0, 0, 0, 1, 1, 1 };

MULTIDIMENSIONAL ARRAY:
The general form of multidimensional array is

type array-name[s1][s2][s3]……[sn]
where s1, s2, …, sn are the sizes of the respective dimensions.

Eg:
int survey [3][5][12];
If we are reading or writing two-dimensional array, two loops are required. Similarly
the array of ‘n’ dimensions would require ‘n’ loops. The structure of the two dimensional
array is illustrated in the following figure :
int A[10][10];
The 100 elements are stored row by row: the first row occupies A[0][0], A[0][1], …, A[0][9]; the second row A[1][0], A[1][1], …; and so on, up to the last row A[9][0], …, A[9][9].

searching techniques:
Ø  Searching is an operation to find the location of given data in a data structure such as an array or a linked list.
Ø  Searching is said to be successful if the data is present, otherwise it is said to be unsuccessful.
Ø  There are 2 types of searching
1.       Internal searching
2.       External searching
1.       internal searching:
If all the data to be searched are in the main memory, then it is called internal searching.
2.       external searching:
If the data to be searched reside partly in the main memory and partly in secondary memory, then it is called external searching.
«  internal searching can be categorized as
i)        Linear search
ii)      Binary search

i)        linear search:
Ø  linear searching is otherwise called as sequential searching.
Ø  Let ‘A’ be the array of ‘n’ nos.
Ø  X is an element to be searched.
Ø  Our aim is to find whether X is present or not.
implementation:
1.       Compare X with A[0]; if they are equal, return the position of A[0]. Otherwise compare X with A[1]; if they are equal, return the position of A[1], and so on.
2.       The process is repeated up to A[n-1].
3.       If no match occurs, print that the data is not present.
ALGORITHM:
LINEAR(A,N,X)
1.       REPEAT FOR I=0,1,2……..N-1
2.       IF A[I]==X THEN
PRINT “DATA FOUND”
RETURN
3.       END OF STEP 1 FOR LOOP
4.       PRINT “DATA NOT FOUND”


ii)      BINARY SEARCH:
Ø  Binary search can be applied only to data which are in sorted order.
Implementation:
1.       Let ‘A’ be the array of ‘n’ nos.
2.       ‘X’ is the element to be searched.
3.       Initialize LOW=0 and HIGH=n-1
4.       Find the position of the middle value, MID = (LOW + HIGH) / 2 (integer division).
5.       If ‘X’ < A[MID], then scan for the sub-array 0 to (MID-1)
6.       If ‘X’ > A[MID], then scan for the sub-array (MID+1) to (N-1)
7.       Else return MID value.
8.       If no matching occurs, report that the data is not found.


Algorithm:
Binary(a,n,x)


1.       [initialize]
low←0
high←n-1
2.       [search the elements]
Repeat through step 4 while low≤high
3.       [find mid]
mid ← (low + high) / 2
4.       [compare]
If x<a[mid] then
high←mid-1
else If x>a[mid] then
low←mid+1
else
write ”data found”
return mid
5.       [unsuccessful search]
Write “data not found”

eg., search for x = 25:
a[0]       a[1]       a[2]       a[3]       a[4]       a[5]
10           14           16           20           25           30

low=0
high=5

0≤5 (condition true)
mid=(0+5)/2=2

25<16 (condition false)
25>16 (condition true)
low=3
high=5
3≤5 (condition true)
mid=(3+5)/2=4

25<25 (condition false)
25>25 (condition false)
25=25 (condition true)
“data found” at mid=4

FIBONACCI SEARCH

The Fibonacci search technique is a method of searching a sorted array using a divide and conquer algorithm that narrows down possible locations with the aid of Fibonacci numbers. Its complexity is O(log n); unlike binary search, it needs only addition and subtraction, not division, to compute the next probe position.

Algorithm
Let k be defined as an element in F, the array of Fibonacci numbers. n = Fm is the array size. If the array size is not a Fibonacci number, let Fm be the smallest number in F that is greater than n.
The array of Fibonacci numbers is defined where Fk+2 = Fk+1 + Fk, when k ≥ 0, F1 = 1, and F0 = 0.

To test whether an item is in the list of ordered numbers, follow these steps:
  1. Set k = m.
  2. If k = 0, stop. There is no match; the item is not in the array.
  3. Compare the item against the element at position Fk-1.
  4. If the item matches, stop.
  5. If the item is less than entry Fk-1, discard the elements from positions Fk-1 + 1 to n. Set k = k - 1 and return to step 2.
  6. If the item is greater than entry Fk-1, discard the elements from positions 1 to Fk-1. Renumber the remaining elements from 1 to Fk-2, set k = k - 2, and return to step 2.

//PROGRAM TO IMPLEMENT THE FIBONACCI SEARCH
#include <iostream>
using namespace std;
//This function will find the fibonacci number
int fib(int n)
{
int f1, f2, temp;
f1 = 0; f2 = 1;
for (int i = 0; i < n; i++)
{
temp = f2;
f2 = f1 + f2;
f1 = temp;
}
return f2;
}
//Function to search an item using fibonacci numbers
int fibonacci_search(int list[], int n, int item)
{
int f1, f2, t, mid, j;
for (j = 1; fib(j) < n; j++)
;                      //find the smallest j with fib(j) >= n
f1 = fib(j - 2);       //find lower fibonacci numbers
f2 = fib(j - 3);       //f1 = fib(j-2), f2 = fib(j-3)
mid = n - f1 + 1;
while (item != list[mid]) //if not found
{
if (mid < 0 || item > list[mid])
{ //look in the upper half
if (f1 == 1)
return -1;
mid = mid + f2;  //decrease fibonacci numbers
f1 = f1 - f2;
f2 = f2 - f1;
}
else
{ //look in the lower half
if (f2 == 0)     //if not found return -1
return -1;
mid = mid - f2;  //decrease fibonacci numbers
t = f1 - f2;     //this time, decrease more
f1 = f2;         //for the smaller list
f2 = t;
}
}
return mid;
}
int main()
{
int loc, n, item, list[50];
cout << "\n\nEnter the total number of elements : ";
cin >> n;
cout << "\nEnter the elements in the list:";
for (int i = 0; i < n; i++)
{
cout << "\nInput " << i << " th number : ";
cin >> list[i];
}
cout << "\nEnter the number to be searched :";
cin >> item;
loc = fibonacci_search(list, n, item);
if (loc != -1)
cout << "\nThe number is in the list";
else
cout << "\nThe number is not in the list";
return 0;
}
WORST CASE PERFORMANCE
Beginning with an array containing Fj − 1 elements, the length of the subset is bounded
by Fj-1 − 1 elements. To search through a range with a length of Fn − 1 at the beginning, we
have to make n comparisons in the worst case. Fn ≈ (1/√5)·((1+√5)/2)ⁿ, which is
approximately c·1.618ⁿ (with a constant c). For N + 1 = c·1.618ⁿ elements we need n comparisons,
i.e., the maximum number of comparisons is O(log₂ N).