>>>NOTE: You may want to start with a much simpler article on hiding members in structs.
Data hiding and encapsulation in C is fairly easy using the notions of derived and incomplete types. A derived type is a type built from other types, such as a struct, array or pointer. An incomplete type is one whose declaration is visible in the current scope but whose definition (its members and size) is not. Incomplete types are extremely useful for data hiding when the type declaration is in the header file, and the definition and implementation of the type are shrouded in the C source file. For example, consider defining a type for handling paintings in an art collection:
struct painting {
    uint32_t inventory_control;
    uint32_t purchase_price;
    char painting_name[256];
    char artist_name[256];
};
This struct definition could be placed in a header file, say, painting.h, for inclusion into application code, then the members of the struct can be accessed as needed. But what happens when you need to change the members of the struct? Perhaps you would prefer to allocate the char buffers instead of declaring their sizes statically, and need to add a member for current owner:
struct _painting {
    uint32_t inventory_control;
    uint32_t purchase_price;
    char * painting_name;
    char * artist_name;
    char * owner_name;
};
Now all the code depending on the first definition is broken. One way out of this bind is using incomplete types with accessor methods, just as you would use them in Java or C++.
In the header file painting.h, declare the painting type with a typedef:
typedef struct _painting Painting;
How you choose to capitalize, underscore or otherwise name “classes” is your business. Personally, I loathe the so-called “CamelCase” convention, but will use capital letters to denote a user-defined type.
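To make the effect of the typedef concrete, here is a minimal single-file sketch. The comment banners stand in for the split between painting.h and painting.c, and the names handle_is_empty and painting_struct_size are hypothetical helpers added for illustration only. Code that sees just the typedef can pass a Painting pointer around, but it cannot take sizeof(Painting) or reach a member.

```c
#include <stddef.h>
#include <stdint.h>

/* ---- what painting.h would expose ---- */
typedef struct _painting Painting;      /* incomplete type: no members visible */
size_t painting_struct_size(void);      /* hypothetical helper, for illustration */

/* ---- application code: sees only the declarations above ---- */
int
handle_is_empty(const Painting * p) {   /* hypothetical application function */
    /* Pointers to an incomplete type are fine. Writing sizeof(Painting)
       or p->artist_name at this point would be a compile error. */
    return p == NULL;
}

/* ---- what painting.c would contain ---- */
struct _painting {
    uint32_t inventory_control;
    uint32_t purchase_price;
    char * painting_name;
    char * artist_name;
    char * owner_name;
};

size_t
painting_struct_size(void) {
    return sizeof(Painting);            /* legal here: the type is now complete */
}
```

Because the layout is only known inside the source file, changing the members later requires recompiling that one file, not every file that includes the header.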
Memory management
Managing memory is one of the most error-prone aspects of writing code in the C programming language. Using incomplete types, notice that you now have no way to directly allocate memory for your type in your application program, although you can deallocate using free() anywhere you have a Painting pointer declared. So allocation must be wrapped, and it makes really good sense to wrap the deallocation as well. For example, here is one way to do it:
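Painting *
painting_new(void) {
    Painting * p = malloc(sizeof(Painting));
    if (p == NULL) {
        return NULL;
    }
    memset(p, 0xda, sizeof(Painting));
    /* Pointer members start as NULL so painting_delete can free them safely. */
    p->artist_name = NULL;
    p->owner_name = NULL;
    p->painting_name = NULL;
    return p;
}
void
painting_delete(Painting * p) {
    if (p == NULL) {
        return;
    }
    free(p->artist_name);
    free(p->owner_name);
    free(p->painting_name);
    memset(p, 0xdd, sizeof(Painting));
    free(p);
}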
So what’s going on here? Why the call to memset? And why different values when allocating (0xda) versus freeing (0xdd)? This technique is called “shredding” and I was first exposed to it in an excellent book by Steve Maguire called “Writing Solid Code.”
The purpose of shredding is to set all the bytes in a struct (or other allocated set of bytes) to a value that has meaning to the programmer, but is otherwise nonexecutable nonsense. The value of this practice is suddenly realized when stepping through code in a debugger. A series of 0xdadadada indicates you are accessing an uninitialized field in the data structure. Similarly, a series of 0xdddddddd indicates you are accessing memory that has been freed.
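The sketch below shows the values a debugger would display after shredding. The sample struct and the helpers sample_shred and looks_uninitialized are illustrative names and not part of the painting code; the two patterns are the ones used above.

```c
#include <stdint.h>
#include <string.h>

#define SHRED_NEW  0xda   /* pattern written right after allocation */
#define SHRED_FREE 0xdd   /* pattern written just before free */

/* Illustrative struct, not the article's Painting. */
struct sample {
    uint32_t inventory_control;
    uint32_t purchase_price;
};

/* Fill every byte of *s with the given pattern. */
void
sample_shred(struct sample * s, int pattern) {
    memset(s, pattern, sizeof *s);
}

/* Return 1 if a 32-bit field still holds the "never initialized" pattern. */
int
looks_uninitialized(uint32_t field) {
    return field == 0xdadadadaU;
}
```

A check like looks_uninitialized can also be dropped into an assertion during development, so that a forgotten initialization fails loudly instead of waiting to be noticed in the debugger.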
Defining accessors
Now, in the C source file, define the _painting struct as above, and provide get/set methods for each member. For example, to get and set the values for artist_name, I write these methods as follows:
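char *
painting_get_artist_name(Painting * p) {
    return p->artist_name;
}
/* Assumes p->artist_name already points to a buffer of at least s + 1 bytes. */
void
painting_set_artist_name(Painting * p, char * artist_name, size_t s) {
    strncpy(p->artist_name, artist_name, s);
    p->artist_name[s] = '\0';   /* strncpy does not terminate a truncated copy */
}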
The appropriate prototypes for these methods are declared in the painting.h header file.
Again, how you choose to capitalize, underscore names and otherwise format your code is your business, but let’s take a closer look at my convention. First, I use the name of the struct (painting) to prefix all method calls that are publicly declared in the header file. I follow this with a verb (set or get) to indicate the action I want to perform, then the name of the member of the struct as object to the action, in this case “artist_name.” Note that I pass the length of the name in as a parameter for use in the strncpy function to guard against buffer overflow problems; the guard only holds if the destination buffer has room for that many characters plus a terminating null. (Where you get this length is your business as well.)
Now, you have a header file that functions as an interface to your source code. You can add or delete members of any struct at any time without breaking your application code. To handle members that have been removed, you can signal error conditions in the appropriate get/set methods. Handling error conditions can be done in several ways, but that topic is outside the scope of this document at the moment.
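As one illustration of signaling an error from an accessor, the sketch below assumes a later revision of the struct in which purchase_price has been removed. The status codes, the out-parameter style and the trimmed-down struct are all assumptions made for this example; they are not the convention used elsewhere in this article, and other schemes would work equally well.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct _painting Painting;

/* Hypothetical status codes for this sketch. */
enum {
    PAINTING_OK = 0,
    PAINTING_ERR_ARG,       /* a NULL pointer was passed in */
    PAINTING_ERR_REMOVED    /* the member no longer exists */
};

/* Assumed later revision of the struct: purchase_price is gone. */
struct _painting {
    uint32_t inventory_control;
    char * artist_name;
};

int
painting_get_inventory_control(const Painting * p, uint32_t * out) {
    if (p == NULL || out == NULL) {
        return PAINTING_ERR_ARG;
    }
    *out = p->inventory_control;
    return PAINTING_OK;
}

/* The accessor stays in the header so old callers still compile and link,
   but it now reports that the member is gone and leaves *out untouched. */
int
painting_get_purchase_price(const Painting * p, uint32_t * out) {
    (void)p;
    (void)out;
    return PAINTING_ERR_REMOVED;
}
```

Application code that checks the return value keeps building against the new library and finds out at run time that the member has been retired.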
Generate code automatically
Lastly, while all this code appears wordy, note that it’s pretty easy to write a code generator in any language that can handle regular expressions. I have written code generators in sh, perl and lua, which have variously taken key, value pairs or struct definitions as inputs. When developing an API with more than a dozen types of structs, each of which has 4 to 40 members, automatic code generation is time-effective.
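To give a feel for how small such a generator can be, here is a sketch written in C to stay in the language of this article. It is not one of the sh, perl or lua generators mentioned above: it uses a printf-style template instead of regular expressions, handles only scalar members that can be assigned directly, and the function name generate_accessors is made up for this example.

```c
#include <stdio.h>

/* Write a getter and a setter for one scalar member into buf.
   Returns what snprintf returns: the number of characters the full
   text needs, so the caller can detect truncation (result >= n). */
int
generate_accessors(char * buf, size_t n,
                   const char * type_name,    /* e.g. "Painting" */
                   const char * prefix,       /* e.g. "painting" */
                   const char * member_type,  /* e.g. "uint32_t" */
                   const char * member)       /* e.g. "purchase_price" */
{
    return snprintf(buf, n,
        "%s\n%s_get_%s(%s * p) {\n    return p->%s;\n}\n"
        "void\n%s_set_%s(%s * p, %s %s) {\n    p->%s = %s;\n}\n",
        member_type, prefix, member, type_name, member,
        prefix, member, type_name, member_type, member, member, member);
}
```

Calling it once per key, value pair and writing the buffer to a file produces the repetitive part of the source file; string members would need a separate template that copies rather than assigns.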
Pitfalls
The approach above has some traps for the unwary, the most important of which is where and how to allocate memory for pointer fields in the struct. Several different approaches can be taken; I will investigate a couple in a future update to this post.
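One of those approaches, sketched here under stated assumptions, is to let the module own every string it stores: the setter makes a private copy and releases the previous one. This assumes pointer members are either NULL or came from malloc, and the name painting_set_artist_name_copy is hypothetical, chosen so it does not clash with the setter shown earlier. The struct is trimmed to one member to keep the example short.

```c
#include <stdlib.h>
#include <string.h>

typedef struct _painting Painting;

struct _painting {
    char * artist_name;     /* other members omitted for brevity */
};

/* Replace the stored name with a private copy of artist_name.
   Returns 0 on success, or -1 if allocation fails, in which case
   the previously stored name is left in place. */
int
painting_set_artist_name_copy(Painting * p, const char * artist_name) {
    size_t len = strlen(artist_name);
    char * copy = malloc(len + 1);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, artist_name, len + 1);   /* len + 1 includes the terminator */
    free(p->artist_name);                 /* safe only if NULL or from malloc */
    p->artist_name = copy;
    return 0;
}
```

The trade-off is an extra allocation per call. In exchange, the caller may pass a string literal or a stack buffer, and the free calls in the delete function always act on memory the module allocated itself.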
Links and resources
- Very good article on incomplete types in general.
- Incomplete types as data abstractions.
- Steve deKorte’s overview on Object-oriented C programming. (This link is currently broken. It was a good essay, and Steve deKorte is a very good programmer.)
Moving to C++
In C++, this technique is often referred to as “D Pointers,” “Handles” or “Opaque Pointers.” Some good links include:
http://www.archive.org == “Internet Archive Wayback Machine”, i.e. a site that archives old versions of web pages.
Therefore the essay can be found in a few seconds, if you have the original URL (and you did):
http://web.archive.org/web/20070630120840/http://www.dekorte.com/docs/essays/ooc/
But what if the client defines a structure/union with the same name as the hidden struct/union?
@nyan – I think scope will take care of that, but please post an example of what you mean and I’ll update the article.
Thanks for another great article, David. I’m looking forward to the update on how to allocate memory for the pointer fields. Also, I wanted to ask why you passed the character length in a separate parameter instead of determining it directly in the function itself? And one last question: What do you think of using function pointers in the struct itself to access the fields?
Dennis, I’ll take these one at a time:
1. This is a great question. The answer partly depends on how you’re using the object. For the char * in this struct, I’m assuming that the using code will allocate somewhere along the way, and (this is important) these pointers aren’t referenced by any other code.
2. Not sure exactly what you mean. Which line of which snippet?
3. Not exactly sure what you mean, please post a little snippet. Also, read the articles on single inheritance in C, that might answer your question.
Thanks for your reply David.
1:
Ok, so you wouldn’t actually include it in the set function because then the user might not see the malloc and thus forget to free it?
2:
You said you guard against possible buffer overflows, would it be unsafe if I do it like this:
void painting_set_artist_name(Painting * p, char * artist_name) {
    size_t s = strlen(artist_name)+1; // check inside the function
    strncpy(p->artist_name, artist_name, s);
    p->artist_name[s] = '\0';
}
painting_set_artist_name(p, "Picasso");
3:
Basically, I wanted to know what you think about the technique to include a function pointer as is done in the struct below:
struct painting {
    uint32_t inventory_control;
    uint32_t purchase_price;
    char * painting_name;
    char * artist_name;
    char * owner_name;
    char * (*get_artist_name)(Painting *);
    void (*set_artist_name)(Painting *, char *);
};
// this could also be done in a separate setup function when creating the object
p->set_artist_name = &painting_set_artist_name;
// this would be our call in the code
p->set_artist_name(p, "Picasso");
1. Yes and no. If you can do your own name mangling, you can create a set of functions which handle the cases most useful for you. Warning: if you go down this road, and find yourself writing code generators, you might as well move to C++…
2. I’ll write some test code on setting names, but what you’re proposing is very close to one good way to do it.
3. I think this is overkill. These are public fields, so p->artist_name is just fine, but it really depends on whether you want your strings on the stack or on the heap. What I’ve shown here expects an allocated char buffer, hence the calls to free. Also, the code above may or may not complain about unallocated memory, I’m going to revise it slightly to mitigate that. If you wanted to make them private fields, use char * get_artist_name(Painting * p) to send back a copy. However, I’m going to think about this and write up some test code to see how it behaves, and figure out what a viable use case would be.
Thanks for your observations. I code very little C these days, but it’s still one of my favorite languages.