General
Persistent: retained after execution of the program which created it.
- When we build file structures, we are making it
possible to make data persistent.
That is, one program can store data from memory
to a file, and terminate.
Later, another program can retrieve the data from
the file, and process it in memory.
- In this chapter, we look at file structures which
can be used to organize the data within the file, and
at the algorithms which can be used to store and
retrieve the data sequentially.
Field and Record Organization
Data Representation in Memory
- Record: a subdivision of a file, containing data related to a single entity.
- Field: a subdivision of a record, containing a single attribute of the entity which the record describes.
- Stream of bytes: a file which is regarded as being without structure beyond separation into a sequential set of bytes.
Delineation of Records in a File
Fixed Length Records
- Fixed length record: a record which is predetermined to be the same length as the other records in the file.
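Because every record has the same length, the position of record n can be computed as n times the record length. The following is a minimal sketch of that idea, not code from these notes: the 16-byte record size, the space padding, and the names write_record and read_record are all assumptions made for illustration, and an in-memory buffer stands in for a real file.

```python
import io

RECORD_SIZE = 16  # assumed record length in bytes, chosen for illustration


def write_record(f, n, data: bytes) -> None:
    """Store data as record n, padded with spaces to RECORD_SIZE bytes."""
    if len(data) > RECORD_SIZE:
        raise ValueError("record too long")
    f.seek(n * RECORD_SIZE)           # the offset is computed, not searched for
    f.write(data.ljust(RECORD_SIZE))  # the padding is unused space in the record


def read_record(f, n) -> bytes:
    """Fetch record n directly and strip the space padding."""
    f.seek(n * RECORD_SIZE)
    return f.read(RECORD_SIZE).rstrip(b" ")


f = io.BytesIO()  # stands in for a file opened in binary mode
for i, name in enumerate([b"Ames", b"Mason", b"Stevens"]):
    write_record(f, i, name)

third = read_record(f, 2)  # read the third record without touching the others
```

Stripping the padding on read also strips any trailing spaces that were part of the data, which is one reason a real format might pad with a different byte.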
Delimited Variable Length Records
- Variable length record: a record which can differ in length from the other records of the file.
- Delimited record: a variable length record which is terminated by a special character or sequence of characters.
- Delimiter: a special character or group of characters stored after a field or record, which indicates the end of the preceding unit.
Record 1 | # | Record 2 | # | Record 3 | # | Record 4 | # | Record 5 | #
- The records within a file are followed by a delimiting byte or series of bytes.
- The delimiter cannot occur within the records.
- Records within a file can have different sizes.
- Different files can have different length records.
- Programs which access the file must know the delimiter.
- The offset, or position, of the nth record of a file cannot be calculated from n alone, because the preceding records may have any length.
- There is external overhead for record separation equal to the size of the delimiter per record.
- There should be no internal fragmentation (unused space within records.)
- There may be external fragmentation (unused space outside of records) after file updating, for example when a record is deleted or replaced by a shorter one.
- Individual records cannot always be updated in place.
- Algorithms for Accessing Delimited Variable Length Records
- Code for Accessing Delimited Variable Length Records
- Code for Accessing Variable Length Line Records
- Example (Delimiter = ASCII 30 (hex 1E) = RS character):
0 66 69 72 73 74 20 6C 69 6E 65 1E 73 65 63 6F 6E first line.secon
10 64 20 6C 69 6E 65 1E d line.
- Example (Delimiter = '\n', stored here as the two bytes 0D 0A, carriage return and line feed):
0 46 69 72 73 74 20 28 31 73 74 29 20 4C 69 6E 65 First (1st) Line
10 0D 0A 53 65 63 6F 6E 64 20 28 32 6E 64 29 20 6C ..Second (2nd) l
20 69 6E 65 0D 0A ine..
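The delimited record layout can be sketched in a few lines. This is an illustrative sketch only, not the code these notes originally linked to: the function names write_records, read_records, and nth_record are assumptions, and an in-memory buffer stands in for a real file. It writes the same bytes as the RS example above and shows that reaching the nth record requires reading past the records before it.

```python
import io

RS = b"\x1e"  # ASCII 30, the record separator from the first example above


def write_records(f, records) -> None:
    """Append each record followed by the delimiter."""
    for rec in records:
        if RS in rec:
            raise ValueError("delimiter may not occur inside a record")
        f.write(rec + RS)


def read_records(f):
    """Yield records one at a time by scanning for the delimiter."""
    buf = bytearray()
    while (byte := f.read(1)):
        if byte == RS:
            yield bytes(buf)
            buf.clear()
        else:
            buf += byte


def nth_record(f, n) -> bytes:
    """Reaching record n means reading past records 0..n-1 first."""
    f.seek(0)
    for i, rec in enumerate(read_records(f)):
        if i == n:
            return rec
    raise IndexError(n)


f = io.BytesIO()  # stands in for a file opened in binary mode
write_records(f, [b"first line", b"second line"])
raw = f.getvalue()         # the 23 bytes shown in the RS hex dump
second = nth_record(f, 1)  # found by scanning, not by computing an offset
```

For line records, the same approach applies with the line terminator as the delimiter; in Python, iterating over a file opened in text mode already splits on line endings.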
- Disadvantage: the offset of each record cannot
be calculated from its record number.
This makes direct access impossible.
- Disadvantage: there is space overhead for the
length prefix.
- Advantage: there will probably be no
internal fragmentation (unusable space
within records.)
Delineation of Fields in a Record
Fixed Length Fields
Field 1 | Field 2 | Field 3 | Field 4 | Field 5
- Each record is divided into fields, and corresponding fields have the same size in every record.
- Different fields within a record can have different sizes.
- Records of different types (for example, in different files) can have different length fields.
- Programs which access the record must know the field lengths.
- There is no external overhead for field separation.
- There may be internal fragmentation (unused space within fields.)
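A fixed length field layout can be illustrated with Python's struct module. This is a sketch under assumed values: the three fields and their widths (10, 10, and 5 bytes) and the names pack_fields and unpack_fields are made up for the example. The program must know the widths to take the record apart, and short values leave unused padding inside their fields.

```python
import struct

# Assumed layout: last name (10 bytes), first name (10 bytes), zip code (5 bytes).
LAYOUT = struct.Struct("10s10s5s")


def pack_fields(last: str, first: str, zip_code: str) -> bytes:
    """struct pads each short value with zero bytes up to its field width."""
    return LAYOUT.pack(last.encode(), first.encode(), zip_code.encode())


def unpack_fields(record: bytes):
    """Split the record at the known widths and drop the zero padding."""
    return tuple(f.rstrip(b"\x00").decode() for f in LAYOUT.unpack(record))


record = pack_fields("Ames", "Mary", "74075")  # always LAYOUT.size bytes long
fields = unpack_fields(record)
```

Here "Ames" occupies 4 of its 10 bytes, so 6 bytes of that field are unused. Note that struct silently truncates a value longer than its field, so a real program would check lengths first.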
Delimited Variable Length Fields
Field 1 | ! | Field 2 | ! | Field 3 | ! | Field 4 | ! | Field 5 | !
- The fields within a record are followed by a delimiting byte or series of bytes.
- Fields within a record can have different sizes.
- Different records can have different length fields.
- Programs which access the record must know the delimiter.
- The delimiter cannot occur within the data.
- If used with delimited records, the field
delimiter must be different from the record
delimiter.
- There is external overhead for field separation equal to the size of the delimiter per field.
- There should be no internal fragmentation (unused space within fields.)
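Delimited fields combined with delimited records might look like the sketch below. The '!' and '#' delimiters follow the diagrams in these notes; the function names encode_record and decode_file and the sample data are assumptions for illustration. The two delimiters differ, and neither may appear in the data.

```python
FIELD_DELIM = "!"   # field delimiter, as in the diagram above
RECORD_DELIM = "#"  # record delimiter; must differ from FIELD_DELIM


def encode_record(fields) -> str:
    """Write every field followed by FIELD_DELIM, then close the record."""
    for field in fields:
        if FIELD_DELIM in field or RECORD_DELIM in field:
            raise ValueError("delimiter may not occur within the data")
    return "".join(f + FIELD_DELIM for f in fields) + RECORD_DELIM


def decode_file(text: str):
    """Split on the record delimiter first, then on the field delimiter."""
    records = text.split(RECORD_DELIM)[:-1]  # the piece after the last '#' is empty
    return [rec.split(FIELD_DELIM)[:-1] for rec in records]


data = encode_record(["Ames", "Mary", "Stillwater"]) + encode_record(["Mason", "Alan"])
decoded = decode_file(data)
```

The overhead is one delimiter character per field plus one per record, and records may hold different numbers of fields of any length.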
Length Prefixed Variable Length Fields
12 | Field 1 | 4 | Field 2 | 10 | Field 3 | 8 | Field 4 | 7 | Field 5
- The fields within a record are prefixed by a length byte or bytes.
- Fields within a record can have different sizes.
- Different records can have different length fields.
- Programs which access the record must know the size and format of the length prefix.
- There is external overhead for field separation equal to the size of the length prefix per field.
- There should be no internal fragmentation (unused space within fields.)
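Length prefixed fields can be sketched as follows, assuming a one-byte binary length prefix (so a field holds at most 255 bytes). That prefix format and the names encode_fields and decode_fields are assumptions for illustration; the notes only require that the reading program know the size and format of the prefix.

```python
def encode_fields(fields) -> bytes:
    """Prefix each field with a single byte holding its length."""
    out = bytearray()
    for field in fields:
        if len(field) > 255:
            raise ValueError("field too long for a one-byte prefix")
        out.append(len(field))
        out += field
    return bytes(out)


def decode_fields(record: bytes):
    """Walk the record: read a length, then that many bytes of data."""
    fields, pos = [], 0
    while pos < len(record):
        length = record[pos]
        fields.append(record[pos + 1 : pos + 1 + length])
        pos += 1 + length
    return fields


record = encode_fields([b"Ames", b"Mary", b"Stillwater"])
fields = decode_fields(record)
```

Unlike delimited fields, the data may contain any byte value, since no character is reserved as a delimiter. The cost is one prefix byte per field.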