Data Structures
Data structures are specialized formats used to organize and store data on computers, enabling efficient retrieval and processing. Understanding data structures is vital in the context of information systems, where the ability to quickly access vast amounts of data can significantly impact operational efficiency. Common types of data structures include arrays, records, lists, queues, stacks, and trees, each serving unique purposes based on the nature of the data and the requirements of the application.
The effectiveness of a data structure can greatly influence problem-solving capabilities, determining whether results are obtained in minutes instead of days. Additionally, the choice of data structure involves a careful consideration of associated costs and benefits, including storage space, processing time, and programming complexity. Each structure provides distinct advantages and limitations; for instance, stacks operate on a last-in, first-out basis, while queues function on a first-in, first-out principle.
Ultimately, selecting the appropriate data structure is essential for optimizing performance in various applications, from personal tasks to complex enterprise-level databases. This choice not only affects the immediate efficiency of operations but also has long-term implications for system management and resource utilization.
On this Page
- Business Information Systems > Data Structures
- Overview
- Data Retrieval
- Importance of Organization
- Types of Data Structures
- Applications
- Problem Solving & Processing Operations
- Data-Centered Approach to Problem Solving
- Cost & Benefit Analyses of Data Structures
- Example: The Banking Industry
- Example: Information Databases
- Conclusion
- Terms & Concepts
- Bibliography
- Suggested Reading
Subject Terms
Data Structures
Arguably, it is the storage and retrieval of information and the databases that house it that give information systems their power. Just knowing where the data are located, however, is typically insufficient for easy retrieval and processing. Data structures are specialized formats that are used to organize and store data on a computer. Data structures are designed to organize data to best suit a specific purpose so that data can be easily accessed and processed appropriately. Particularly for complex problem solving and processing operations, the choice of data structure may mean the difference between getting one's answer back in the matter of minutes or in the matter of days. Solutions need to be both efficient and cost effective in order to be practical. Each data structure has associated costs and benefits that must be considered during the selection process.
Keywords Array; Data Item; Data Structure; Database; Database Management System (DBMS); Enterprise; Entity; Information System; Information Technology; List; Processing; Query; Record; Set; Tree
Business Information Systems > Data Structures
Overview
Innovations in computer hardware in general and storage in particular seem to proliferate at an unbelievable rate. In the not too distant past, data were keypunched on computer cards or recorded on large magnetic tapes. These approaches to data storage gave way to 5.25-inch floppy disks, which in turn gave way to 3.5-inch disks. Many modern computers do not even have the option for inserting a disk as a portable memory device: Disks have been replaced by memory sticks and have become as antiquated as the punch cards before them. Memory inside the computer has changed, too. When 5.25-inch floppy disks were a high-tech solution, a 200 MB hard drive was considered an impressive amount of computing power for one's desktop computer. We now measure hard drive sizes in gigabytes and terabytes rather than in megabytes. This trend shows no sign of abating in the foreseeable future.
Data Retrieval
However, a large storage capacity is of no use if there is no way to retrieve the data that are stored on it. Particularly in twenty-first-century businesses with their increasing emphasis on information and knowledge, it is important that massive amounts of data not only be stored, but retrieved and processed quickly and accurately. Data structures are specialized formats that are used to organize and store data on a computer. Data structures are designed to organize data to best suit a specific purpose so that they can be easily accessed and processed appropriately. Common data structures include arrays and sets, records, lists, queues, stacks, and trees.
Arguably, however, it is the storage and retrieval of information and the databases that house it that give information systems their power. Computers can be used to store and retrieve data for a wide variety of situations in business ranging from personal productivity to enterprise-wide applications that are used by multiple users and maintained by information technology (IT) professionals. The data in databases are managed by software programs called database management systems. These programs are designed to increase the accessibility of data and the productivity of the user.
Importance of Organization
To understand the need for data structure within a database, it is first helpful to examine the need for organizing everyday information. Whether data and information are stored in a computer or on random slips of paper, unless data are accessible, they cannot be retrieved and used. For example, most people keep some sort of personal address book or collect business cards with contact information they may need in the future. This collection of data may include contact information for friends and family, business contacts, and various contractors and service providers used both at home and at work. In addition to common contact information (e.g., name, address, telephone number, e-mail address), the user may want to include other information such as birthdays for friends and family and ratings and references for various contractors. This information can be stored in a number of different ways ranging from random slips of paper and business cards shoved in a drawer to various types of address books that format the information.
However, just knowing where the data are located (e.g., in the kitchen drawer, in the address book) is typically insufficient for easy retrieval. For example, when the roof leaks, even if Harry Homeowner knows that the roofer's information is in the kitchen junk drawer, he still may have to go through many pieces of paper with various contractors' contact information on them before he finds the information he needs. In addition, Harry may find that the kitchen drawer has information on several roofers, so he needs to remember which one was recommended and which one was not. This task would be made much easier if the data were organized in some way, such as in an address book that had a separate section for handymen or household contractors.
This organization is analogous to the data structure of the database. In IT terms, the process of finding the data is called retrieval. To retrieve data from a database, the user needs to query the system. A query is a search question used by a database management system to specify which data are to be retrieved from the database. So, for example, if Harry had a database of contractor information, he could query the system to retrieve all the records on roofing contractors. Unless there is only a handful of cards in Harry's kitchen drawer, the data structure provided by an organized address book will allow Harry to find the name of the contractor much more quickly than by manually searching through the kitchen drawer. If the information were housed in a database, Harry could search the database using a number of different queries, including the contractor's name or specialty (e.g., roofing or roof repair).
Types of Data Structures
There are a many different data structures available for the design of programs and databases. Among these are arrays and sets, records, lists, queues, stacks, and trees.
- Arrays comprise a group of indexed objects that are all the same size and type. Arrays are seen in many places in life besides computer programming. For example, train schedules and tax tables are both examples of real-world arrays. Records are a complete set of information about an entity.
- Sets are named collections of data. The difference between arrays and sets is analogous to the difference between a deck of cards in a neat pile (array) versus the same deck of cards scattered across the floor (set).
- Records are composite data and are often a mixture of elements that are alphabetic, numeric, and logical. Because the data contained in records is heterogeneous, they cannot be stored in an array. Common examples of a record include a checkbook register or employment data in a personnel file.
Lists, stacks, and queues are examples of other ways to structure data.
As with the lists we use in everyday life — lists of chores to do, errands to run, or groceries to buy — lists as a data structure are ordered sets of data. If a program only needs to store a few things (e.g., job descriptions, payroll records), lists often offer the simplest and most effective approach. In such situations or situations where data do not require a search capability or where the information does not need to be ordered, lists may be productively and efficiently used.
Stacks and queues are restricted forms of lists. Although lists have the generality that is needed for many situations, they are complicated to use and, as a result, more costly than other structures both in terms of time and storage space required. Stacks and queues offer compromises between lists and other data structures. These data structures are more flexible than arrays but still allow relatively simple implementations.
- A stack is a data structure in which items are removed in the reverse order from which they were entered, i.e., last in, first out (LIFO). This is analogous to the stack of plates in the kitchen cupboard. Although the plate on the top of the stack was the last one placed there, it is the first one removed from the stack.
- A queue is a data structure in which items are removed in the same order as they were entered, i.e., first in, first out (FIFO). Queues are familiar in everyday life, including at the checkout counter at the grocery store or at the gas pump. Every new arrival (whether it be a new customer in line or a new item to be processed in a database) goes to the end of the queue and is not processed until all the other items before it are processed.
The tree is a data structure in which each data element is attached to one or more elements directly beneath it by connections called branches. Noncomputer examples of trees that are used in everyday life are telephone-calling trees that are used as emergency communication systems in which one person on the list calls two others and each of those call two others until the entire membership of the list has been contacted. Another example of a tree is an organizational chart. For example, in situations where one has difficulty with a customer service representative, it may be necessary to go up one level in the organizational chart and talk to a supervisor in order to get a problem resolved. This can continue all the way up the chart through managers and executives until the chief executive officer is contacted.
Applications
Problem Solving & Processing Operations
As discussed above, the ever-increasing capacity for computers to store data has led to a situation where it is not less important to structure data appropriately but more so. Greater storage capacity and more powerful computers encourage programmers and designers to attempt to solve more complex problems with computers. Particularly for complex problem solving and processing operations, the choice of data structure may mean the difference between getting one's answer back in the matter of minutes or in the matter of days. Solutions need to be both efficient and cost effective in order to be practical. Efficiency means that the problem can be solved within the resource constraints of the system. Some of the resource constraints on data processing include the space available to store the data and the time necessary to perform each subtask in the process. A solution is efficient if it requires fewer resources than its alternatives. The cost of a solution refers to the amount of resources that it consumes (e.g., time). Ideal solutions minimize their use of resources and efficiently solve problems.
Data-Centered Approach to Problem Solving
The purpose of computer programs in general is to solve problems—whether that be a way to teach a child the alphabet or to search for extraterrestrial life. Just as the problems to be solved using computers differ, so do the data structures that are most efficient for solving those problems. Although some programmers merely apply the data structure with which they are most familiar, this can result in an inappropriate data structure and resultant loss of program efficiency. Therefore, when selecting a data structure to solve a problem, it is best to take a data-centered approach.
- First, the problem to be solved using the computer program needs to be analyzed to determine what resource constraints must be met by the solution. For example, if a program requires 20 MB of space in order to perform its operations but the computer only has 15 MB available for processing, the solution is not appropriate. There are typically resource constraints on various key operations such as searching, inserting, or deleting records. These drive the selection of an appropriate data structure.
- Second, the basic operations to be supported and their concomitant resources must be determined and quantified. These operations include such activities as inserting or deleting a data item within a data structure or finding a specific item in response to a query or other input. Only once these parameters have been determined can one find a data structure that best meets the requirements of the program.
Cost & Benefit Analyses of Data Structures
Each data structure has associated costs and benefits that must be considered during the selection process. For example, each data structure requires a specific amount of space for storage of each data item, a given amount of time for processing or performing other operations, and a certain amount of effort for programming. All these aspects must be taken into account in order to perform trade-offs between alternative data structures.
Example: The Banking Industry
Banks must support a number of customer transactions, including opening and closing accounts and adding or withdrawing money from accounts. Typically, customers access the account (e.g., to make deposits or withdrawals, transfer funds, view recent transactions) more frequently than they open or close accounts. One of the implications of this observation is that customers are typically willing to spend more time to open or close an account than they are willing to take to perform actions on a current account. Therefore, it is more important that data be quickly accessed and processed for common transactions than it is for infrequent transactions. To help meet these expectations, banks typically offer multiple levels of service — human tellers, automated teller machines (ATMs), or websites and mobile applications are commonly available for quickly processing common transactions while special service representatives available only during certain hours take care of less frequent activities such as opening or closing accounts.
From the point of view of database design and data structures, activities performed by human or automated tellers do not significantly modify the database (e.g., simplistically do little more than modify the number stored in the database). Adding a new account, on the other hand, involves considerably more steps and concomitantly more time. Deleting an existing account from the customer's perspective involves little more than withdrawing the balance from the account. Although closing the account involves more steps from the bank's perspective, these tasks do not have to be done immediately but can wait until after hours, until the end of the month, or another convenient time.
The data structure that best solves this problem needs to consider these constraints and can be inefficient for deletion but must be very efficient for search and moderately efficient for insertion. Records need to be accessed by account number. One data structure that meets these requirements is the hash table, which allows for extremely fast exact-match searches. This data structure can be quickly modified when the modification does not affect the space requirements. Hash tables also are very efficient for inserting new records. This data structure also can efficiently support deletions, but too many of this type of transaction can lead to performance degradations. However, periodic reorganization of the hash table can restore system efficiency and can occur offline without affecting other operations.
Example: Information Databases
Another example of the use of data structures might be in the development of a database that contains information about the cities and towns in the United States. The database might be used to locate information about a location by name (e.g., a query on "Chicago") or to find all the places that match a particular value or range of values (e.g., "all towns with populations between 10,000 and 20,000"). As with the example of common banking transactions above, this database needs to be structured so that the answers to these questions can be answered relatively quickly. To answer these types of questions, it would be better for the database management system to process the data on all the cities within the range as a batch rather than performing iterative operations on individual cities. The hash table mentioned above, however, would be inefficient for such processing because it cannot handle batch queries. A B+ tree, on the other hand, can not only support large databases such as one containing information on all the cities and towns in the United States, but can also handle the insertion and deletion of data records and perform queries on range data. On the other hand, if the data are entered once and not changed (for example, as in an atlas database), a simple linear index might be even more appropriate and efficient.
Conclusion
Data structures are specialized formats that are used to organize and store data on a computer. These structures are designed to organize data to best suit a specific purpose so that they can be easily accessed and processed appropriately. There are a many different data structures available for the design of programs and databases. Among these are arrays and sets, records, lists, queues, stacks, and trees. Particularly for complex problem solving and processing operations, the choice of data structure may mean the difference between getting one's answer back in the matter of minutes or in the matter of days. Therefore, solutions need to be both efficient and cost effective in order to be practical. Each data structure has associated costs and benefits that must be considered during the selection process.
Terms & Concepts
Array: A group of indexed objects that are all the same size and type. An array is analogous to a deck of cards in a neat pile.
Data Item: A specific detail of an individual entity in a database. For example, the data element "year in school" may contain the items freshman, sophomore, junior, or senior.
Data Structure: Any one of a number of specialized formats that are used to organize and store data on a computer. Data structures are designed to organize data to best suit a specific purpose so that they can be easily accessed and processed appropriately. Common data structures include arrays and sets, records, lists, queues, stacks, and trees.
Database: A collection of data items used for multiple purposes; stored on a computer.
Database Management System (DBMS): A software program that allows users to manage the data in a database. Database management systems are designed to increase the accessibility of data and the productivity of the user.
Enterprise: An organization that uses computers. Although this term is often applied to large organizations, the term can be applied to both small and large organizations.
Entity: A person, place, thing, event, or condition about which a database houses information. Entities in the database are described by various attributes.
Information System: A system that facilitates the flow of information and data between people or departments.
Information Technology (IT): The use of computers, communications networks, and knowledge in the creation, storage, and dispersal of data and information. Information technology comprises a wide range of items and abilities for use in the creation, storage, and distribution of information.
List: An ordered set of data. Stacks and queues are restricted forms of lists. A stack is a data structure in which items are removed in the reverse order from which they were entered. A queue is a data structure in which items are removed in the same order as they were entered.
Processing: The activity of converting, analyzing, computing, and synthesizing data or information stored in a computer so that it is in a useful form.
Query: A search question used by a database management system to specify what data are to be retrieved from the database.
Record: A complete set of information about an entity. Records are composed of fields.
Set: A named collection of data. A set is analogous to a deck of cards scattered across the floor.
Tree: A data structure in which each data element is attached to one or more elements directly beneath it by connections called branches.
Bibliography
Helland, P. (2011). If you have too much data, then 'good enough' is good enough. Communications of the ACM, 54, 40–47. Retrieved November 27, 2013 from EBSCO online database Business Source Premier. http://search.ebscohost.com/login.aspx?direct=true&db=buh&AN=63231735
Mehta, M. B. (2012). The plan precedes the database. Industrial Engineer: IE, 44, 38–42. Retrieved November 27, 2013 from EBSCO online database Business Source Premier. http://search.ebscohost.com/login.aspx?direct=true&db=buh&AN=80373102
Shaffer, C. A. (2001). A practical introduction to data structures and algorithm analysis (2nd ed.). Upper Saddle River, NJ: Prentice Hall.
Smaragdakis, Y. (2012). High-level data structures. Communications of the ACM, 55, 90. Retrieved November 27, 2013 from EBSCO online database Business Source Premier. http://search.ebscohost.com/login.aspx?direct=true&db=buh&AN=84348474
Smith, H. F. (1987). Data structures: Form and function. San Diego, CA: Harcourt Brace Jovanovich.
Suggested Reading
Beichl, I., Bernal, J., Witzgall, C., & Sullivan, F. (2001). Computational geometry. In S. I. Gass & C. M. Harris (Eds.), Encyclopedia of Operations Research and Management Science (pp. 122–126). New York, NY: Wiley. Retrieved October 17, 2007, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=21891230&site=ehost-live
Bowen, P. L., O'Farrell, R. A., & Rohde, F. H. (2006). Analysis of competing data structures: Does ontological clarity produce better end user query performance. Journal of the Association for Information Systems, 7, 514–544. Retrieved October 20, 2007, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=22571347&site=ehost-live
Lalonde, J.-F., Vandapel, N., & Hebert, M. (2007). Data structures for efficient dynamic processing in 3-D. International Journal of Robotics Research, 26, 777–796. Retrieved October 20, 2007, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=26111110&site=ehost-live
Meyers, J. L., & Beretvas, S. N. (2006). The impact of inappropriate modeling of cross-classified data structures. Multivariate Behavioral Research, 41, 473–497. Retrieved October 20, 2007, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=23579714&site=ehost-live
Shavit, N. (2011). Data structures in the multicore age. Communications of the ACM, 54, 76-84. Retrieved November 27, 2013 from EBSCO online database Business Source Premier. http://search.ebscohost.com/login.aspx?direct=true&db=buh&AN=59423981
Standish, T. A. (1980). Data structure techniques. Reading, MA: Addison-Wesley Publishing Company.
Tupper, C. (2011). Data architecture: From Zen to reality. Amsterdam, Netherlands: Morgan Kaufmann. Retrieved November 27, 2013 from EBSCO online database eBook Collection (EBSCOhost). http://search.ebscohost.com/login.aspx?direct=true&db=nlebk&AN=363704&site=ehost-live