General
persistent
Retained after execution of the program which created it.
- When we build file structures, we are making it possible to make data persistent. That is, one program can store data from memory to a file, and terminate. Later, another program can retrieve the data from the file, and process it in memory.
- In this chapter, we look at file structures which can be used to organize the data within the file, and at the algorithms which can be used to store and retrieve the data sequentially.
Field and Record Organization
Data Representation in Memory
record
field
stream of bytes
- A file which is regarded as being without structure beyond separation
- into a sequential set of bytes.
-
- Within a program, data is temporarily stored in variables.
- Individual values can be aggregated into structures, which can be treated as a single variable with parts.
- In C++, classes are typically used as as an aggregate structure.
- C++ Person class (version 0.1):
class Person { public: char FirstName [11]; char LastName[11]; char Address [21]; char City [21]; char State [3]; char ZIP [5]; };
- With this class declaration, variables can be declared to be of type Person. The individual fields within a Person can be referred to as the name of the variable and the name of the field, separated by a period (.).
- C++ Program:
#include class Person { public: char FirstName [11]; char LastName[11]; char Address [31]; char City [21]; char State [3]; char ZIP [5]; }; void Display (Person); int main () { Person Clerk; Person Customer; strcpy (Clerk.FirstName, "Fred"); strcpy (Clerk.LastName, "Flintstone"); strcpy (Clerk.Address, "4444 Granite Place"); strcpy (Clerk.City, "Rockville"); strcpy (Clerk.State, "MD"); strcpy (Clerk.ZIP, "00001"); strcpy (Customer.FirstName, "Lily"); strcpy (Customer.LastName, "Munster"); strcpy (Customer.Address, "1313 Mockingbird Lane"); strcpy (Customer.City, "Hollywood"); strcpy (Customer.State, "CA"); strcpy (Customer.ZIP, "90210"); Display (Clerk); Display (Customer); } void Display (Person Someone) { cout << Someone.FirstName << Someone.LastName << Someone.Address << Someone.City << Someone.State << Someone.ZIP; }
- In memory, each Person will appear as an aggregate,
with the individual values being parts of the
aggregate:
Person Clerk FirstName LastName Address City State ZIP Fred Flintstone 4444 Granite Place Rockville MD 0001 - The output of this program will be:
FredFlintstone4444 Granite PlaceRockvilleMD00001LilyMunster1313 Mockingbird LaneHollywoodCA90210
- Obviously, this output could be improved. It is marginally readable by people, and it would be difficult to program a computer to read and correctly interpret this output.
Delineation of Records in a File
Fixed Length Records
A record which is predetermined to be the same length as the other records in the file.
Record 1 Record 2 Record 3 Record 4 Record 5 - The file is divided into records of equal size.
- All records within a file have the same size.
- Different files can have different length records.
- Programs which access the file must know the record length.
- Offset, or position, of the nth record of a file can be calculated.
- There is no external overhead for record separation.
- There may be internal fragmentation (unused space within records.)
- There will be no external fragmentation (unused space outside of records) except for deleted records.
- Individual records can always be updated in place.
- Algorithms for Accessing Fixed Length Records
- Code for Accessing Fixed Length Records
- Code for Accessing String Records
- Example (80 byte records):
0 66 69 72 73 74 20 6C 69 6E 65 0 0 1 0 0 0 first line...... 10 0 0 0 0 0 0 0 0 FF FF FF FF 0 0 0 0 ................ 20 68 FB 12 0 DC E0 40 0 3C BA 42 0 78 FB 12 0 h.....@.<.B.x... 30 CD E3 40 0 3C BA 42 0 8 BB 42 0 E4 FB 12 0 ..@.<.B...B..... 40 3C 18 41 0 C4 FB 12 0 2 0 0 0 FC 3A 7C 0 <.A..........:|. 50 73 65 63 6F 6E 64 20 6C 69 6E 65 0 1 0 0 0 second line..... 60 0 0 0 0 0 0 0 0 FF FF FF FF 0 0 0 0 ................ 70 68 FB 12 0 DC E0 40 0 3C BA 42 0 78 FB 12 0 h.....@.<.B.x... 80 CD E3 40 0 3C BA 42 0 8 BB 42 0 E4 FB 12 0 ..@.<.B...B..... 90 3C 18 41 0 C4 FB 12 0 2 0 0 0 FC 3A 7C 0 <.A..........:|.
- Advantage: the offset of each record can be calculated from its record number. This makes direct access possible.
- Advantage: there is no space overhead.
- Disadvantage: there will probably be internal fragmentation (unusable space within records.)
-
Delimited Variable Length Records
-
- variable length record
- delimited record
- delimiter
- A special character or group of characters stored after a field or record, which indicates the end of the preceding unit.
Record 1 # Record 2 # Record 3 # Record 4 # Record 5 # - The records within a file are followed by a delimiting byte or series of bytes.
- The delimiter cannot occur within the records.
- Records within a file can have different sizes.
- Different files can have different length records.
- Programs which access the file must know the delimiter.
- Offset, or position, of the nth record of a file cannot be calculated.
- There is external overhead for record separation equal to the size of the delimiter per record.
- There should be no internal fragmentation (unused space within records.)
- There may be no external fragmentation (unused space outside of records) after file updating.
- Individual records cannot always be updated in place.
- Algorithms for Accessing Delimited Variable Length Records
- Code for Accessing Delimited Variable Length Records
- Code for Accessing Variable Length Line Records
- Example (Delimiter = ASCII 30 (IE) = RS character:
0 66 69 72 73 74 20 6C 69 6E 65 1E 73 65 63 6F 6E first line.secon 10 64 20 6C 69 6E 65 1E d line.
- Example (Delimiter = '\n'):
0 46 69 72 73 74 20 28 31 73 74 29 20 4C 69 6E 65 First (1st) Line 10 D A 53 65 63 6F 6E 64 20 28 32 6E 64 29 20 6C ..Second (2nd) l 20 69 6E 65 D A ine..
- Disadvantage: the offset of each record cannot be calculated from its record number. This makes direct access impossible.
- Advantage: there is space overhead for the length prefix.
- Advantage: there will probably be no internal fragmentation (unusable space within records.)
Delineation of Fields in a Record
Fixed Length Fields
Field 1 Field 2 Field 3 Field 4 Field 5 - Each record is divided into fields of correspondingly equal size.
- Different fields within a record have different sizes.
- Different records can have different length fields.
- Programs which access the record must know the field lengths.
- There is no external overhead for field separation.
- There may be internal fragmentation (unused space within fields.)
Delimited Variable Length Fields
Field 1 ! Field 2 ! Field 3 ! Field 4 ! Field 5 ! - The fields within a record are followed by a delimiting byte or series of bytes.
- Fields within a record can have different sizes.
- Different records can have different length fields.
- Programs which access the record must know the delimiter.
- The delimiter cannot occur within the data.
- If used with delimited records, the field delimiter must be different from the record delimiter.
- There is external overhead for field separation equal to the size of the delimiter per field.
- There should be no internal fragmentation (unused
Length Prefixed Variable Length Fields
12 Field 1 4 Field 2 10 Field 3 8 Field 4 7 Field 5 - The fields within a record are prefixed by a length byte or bytes.
- Fields within a record can have different sizes.
- Different records can have different length fields.
- Programs which access the record must know the size and format of the length prefix.
- There is external overhead for field separation equal to the size of the length prefix per field.
- There should be no internal fragmentation (unused space within fields.)
No comments:
Post a Comment