Wednesday, 28 November 2012

FUNDAMENTAL FILE STRUCTURE CONCEPTS


General

persistent

Retained after execution of the program which created it.


  • When we build file structures, we are making it possible to make data persistent. That is, one program can store data from memory to a file, and terminate. Later, another program can retrieve the data from the file, and process it in memory.
  • In this chapter, we look at file structures which can be used to organize the data within the file, and at the algorithms which can be used to store and retrieve the data sequentially.                                


Field and Record Organization

 

Data Representation in Memory

record

A subdivision of a file, containing data related to a single entity.

field

A subdivision of a record containing a single attribute of the entity which the record describes.

stream of bytes

A file which is regarded as being without structure beyond separation 
into a sequential set of bytes. 
 
 
 
  • Within a program, data is temporarily stored in variables.
  • Individual values can be aggregated into structures, which can be treated as a single variable with parts.
  • In C++, classes are typically used as as an aggregate structure.
  • C++ Person class (version 0.1):
    class Person {
      public:
        char FirstName [11];
        char LastName[11];
        char Address [21];
        char City [21];
        char State [3];
        char ZIP [5];
    };
    
  • With this class declaration, variables can be declared to be of type Person.  The individual fields within a Person can be referred to as the name of the variable and the name of the field, separated by a period (.).
  • C++ Program:
    #include 
    
    class Person {
      public:
        char FirstName [11];
        char LastName[11];
        char Address [31];
        char City [21];
        char State [3];
        char ZIP [5];
    };
    
    void Display (Person);
    
    int main () {
      Person Clerk;
      Person Customer;
      
      strcpy (Clerk.FirstName, "Fred");
      strcpy (Clerk.LastName, "Flintstone");
      strcpy (Clerk.Address, "4444 Granite Place");
      strcpy (Clerk.City, "Rockville");
      strcpy (Clerk.State, "MD");
      strcpy (Clerk.ZIP, "00001");
      
      strcpy (Customer.FirstName, "Lily");
      strcpy (Customer.LastName, "Munster");
      strcpy (Customer.Address, "1313 Mockingbird Lane");
      strcpy (Customer.City, "Hollywood");
      strcpy (Customer.State, "CA");
      strcpy (Customer.ZIP, "90210");
      
      Display (Clerk);
      Display (Customer);
    }
    
    void Display (Person Someone) {
      cout << Someone.FirstName << Someone.LastName
           << Someone.Address << Someone.City
           << Someone.State << Someone.ZIP;
    }
    
  • In memory, each Person will appear as an aggregate, with the individual values being parts of the aggregate:
    Person
    Clerk
    FirstNameLastName AddressCity StateZIP
    FredFlintstone 4444 Granite PlaceRockville MD0001
  • The output of this program will be:
    FredFlintstone4444 Granite PlaceRockvilleMD00001LilyMunster1313 Mockingbird LaneHollywoodCA90210
  • Obviously, this output could be improved.  It is marginally readable by people, and it would be difficult to program a computer to read and correctly interpret this output. 


Delineation of Records in a File

 

Fixed Length Records

A record which is predetermined to be the same length as the other records in the file.

 
  • Record 1 Record 2 Record 3 Record 4 Record 5
  • The file is divided into records of equal size.
  • All records within a file have the same size.
  • Different files can have different length records.
  • Programs which access the file must know the record length.
  • Offset, or position, of the nth record of a file can be calculated.
  • There is no external overhead for record separation.
  • There may be internal fragmentation (unused space within records.)
  • There will be no external fragmentation (unused space outside of records) except for deleted records.
  • Individual records can always be updated in place.
  • Algorithms for Accessing Fixed Length Records
  • Code for Accessing Fixed Length Records
  • Code for Accessing String Records
  • Example (80 byte records):
       0  66 69 72 73 74 20 6C 69 6E 65  0  0  1  0  0  0  first line......
      10   0  0  0  0  0  0  0  0 FF FF FF FF  0  0  0  0  ................
      20  68 FB 12  0 DC E0 40  0 3C BA 42  0 78 FB 12  0  h.....@.<.B.x...
      30  CD E3 40  0 3C BA 42  0  8 BB 42  0 E4 FB 12  0  ..@.<.B...B.....
      40  3C 18 41  0 C4 FB 12  0  2  0  0  0 FC 3A 7C  0  <.A..........:|.
      50  73 65 63 6F 6E 64 20 6C 69 6E 65  0  1  0  0  0  second line.....
      60   0  0  0  0  0  0  0  0 FF FF FF FF  0  0  0  0  ................
      70  68 FB 12  0 DC E0 40  0 3C BA 42  0 78 FB 12  0  h.....@.<.B.x...
      80  CD E3 40  0 3C BA 42  0  8 BB 42  0 E4 FB 12  0  ..@.<.B...B.....
      90  3C 18 41  0 C4 FB 12  0  2  0  0  0 FC 3A 7C  0  <.A..........:|.
    
  • Advantage: the offset of each record can be calculated from its record number.  This makes direct access possible.
  • Advantage: there is no space overhead.
  • Disadvantage: there will probably be internal fragmentation (unusable space within records.)
 
 

Delimited Variable Length Records

 

variable length record

A record which can differ in length from the other records of the file.

delimited record

A variable length record which is terminated by a special character or sequence of characters.

delimiter

A special character or group of characters stored after a field or record, which indicates the end of the preceding unit.

     

  • Record 1 # Record 2 # Record 3 # Record 4 # Record 5 #
  • The records within a file are followed by a delimiting byte or series of bytes.
  • The delimiter cannot occur within the records.
  • Records within a file can have different sizes.
  • Different files can have different length records.
  • Programs which access the file must know the delimiter.
  • Offset, or position, of the nth record of a file cannot be calculated.
  • There is external overhead for record separation equal to the size of the delimiter per record.
  • There should be no internal fragmentation (unused space within records.)
  • There may be no external fragmentation (unused space outside of records) after file updating.
  • Individual records cannot always be updated in place.
  • Algorithms for Accessing Delimited Variable Length Records
  • Code for Accessing Delimited Variable Length Records
  • Code for Accessing Variable Length Line Records
  • Example (Delimiter = ASCII 30 (IE) = RS character:
       0  66 69 72 73 74 20 6C 69 6E 65 1E 73 65 63 6F 6E  first line.secon
      10  64 20 6C 69 6E 65 1E                             d line.         
    
  • Example (Delimiter = '\n'):
       0  46 69 72 73 74 20 28 31 73 74 29 20 4C 69 6E 65  First (1st) Line
      10   D  A 53 65 63 6F 6E 64 20 28 32 6E 64 29 20 6C  ..Second (2nd) l
      20  69 6E 65  D  A                                   ine..           
    
  • Disadvantage: the offset of each record cannot be calculated from its record number.  This makes direct access impossible.
  • Advantage: there is space overhead for the length prefix.
  • Advantage: there will probably be no internal fragmentation (unusable space within records.) 

   

    Delineation of Fields in a Record

 

  Fixed Length Fields

  • Field 1 Field 2 Field 3 Field 4 Field 5
  • Each record is divided into fields of correspondingly equal size.
  • Different fields within a record have different sizes.
  • Different records can have different length fields.
  • Programs which access the record must know the field lengths.
  • There is no external overhead for field separation.
  • There may be internal fragmentation (unused space within fields.)
 
   

  Delimited Variable Length Fields

  

  • Field 1 ! Field 2 ! Field 3 ! Field 4 ! Field 5 !
  • The fields within a record are followed by a delimiting byte or series of bytes.
  • Fields within a record can have different sizes.
  • Different records can have different length fields.
  • Programs which access the record must know the delimiter.
  • The delimiter cannot occur within the data.
  • If used with delimited records, the field delimiter must be different from the record delimiter.
  • There is external overhead for field separation equal to the size of the delimiter per field.
  • There should be no internal fragmentation (unused
 
  

   Length Prefixed Variable Length Fields

  

  • 12 Field 1 4 Field 2 10 Field 3 8 Field 4 7 Field 5
  • The fields within a record are prefixed by a length byte or bytes.
  • Fields within a record can have different sizes.
  • Different records can have different length fields.
  • Programs which access the record must know the size and format of the length prefix.
  • There is external overhead for field separation equal to the size of the length prefix per field.
  • There should be no internal fragmentation (unused space within fields.)
  

    

No comments:

Post a Comment