Don’t initialize your variables

Developers coming from C know that variables should always be initialized. Not initializing your variables means they contain junk, and this can result in undefined behavior. For example:

#include <stdio.h>
int main(void) {
    char buffer[256];
    char answer;
    char* name;

    printf("Do you want to enter a name? [yn] ");
    answer = getchar();

    while (getchar() != '\n') { } // discard the rest of the line: the newline typed after the answer is still in the input buffer

    if (answer == 'y') {
        printf("Please enter name: ");
        name = fgets(buffer, 256, stdin);
        if (name == 0) {
            name = "<no input>"; // fgets returns NULL on EOF or error, not on overly long input
        }
    } else if (answer == 'n') {
        name = "<user refused to enter name>";
    }

    printf("The name is %s\n", name);
    return 0;
}

If the user enters a character that is neither y nor n, none of the name = ...; statements is executed, and name still holds whatever value it had when main started. What is that value? In C it is indeterminate: in practice, it is typically whatever data happened to be in the piece of memory that was assigned to name. We then take that arbitrary value and pass it to printf, which treats it as if it were a string pointer!

If we are lucky, we’ll hit some illegal memory address and the OS will stop us. If we aren’t, printf will just go to some arbitrary place in memory and start printing whatever it encounters: passwords, credentials, application tokens…

And of course, this may not be reproducible. The value at that place in memory can differ every time you run the program, so you can get different results on each run.

To avoid these problems, C developers have conditioned themselves to always initialize their variables. If you don’t have something meaningful to put at the point of declaration, just put 0:

#include <stdio.h>
int main(void) {
    char buffer[256] = {0}; // "= {}" is only valid in C++ and C23; "{0}" works in every C standard
    char answer = '\0';
    char* name = 0;

    printf("Do you want to enter a name? [yn] ");
    answer = getchar();

    while (getchar() != '\n') { } // discard the rest of the line: the newline typed after the answer is still in the input buffer

    if (answer == 'y') {
        printf("Please enter name: ");
        name = fgets(buffer, 256, stdin);
        if (name == 0) {
            name = "<no input>"; // fgets returns NULL on EOF or error, not on overly long input
        }
    } else if (answer == 'n') {
        name = "<user refused to enter name>";
    }

    printf("The name is %s\n", name);
    return 0;
}

While dereferencing a null pointer is still formally undefined behavior, it is much better than dereferencing a random pointer, because your operating system will probably turn it into a SEGFAULT, which is better than a security leak.

OK, but that’s C. What about more modern languages?

There are two main reasons this was so needed in C:

  1. Uninitialized variables having junk data.
  2. Inability to declare variables in the middle of a block (in older versions of C, before C99).

More modern languages allow declaring variables in the middle of a block, so it is usually preferable to only declare the variable at the point where you have something meaningful to put in it.

This greatly reduces the cases where you have to initialize something with a default value, but it does not prevent all of them. In our case, for example, name gets its value inside the if branches. If we declared it there, we wouldn’t be able to use it after the if. Some languages (mostly the functional ones) have an easy syntactic solution, but in most mainstream languages you’d have to either extract the logic to a function or declare the variable outside the block.
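The first option, extracting the branching logic to a function, might look roughly like this in Java. This is only a sketch: the NameChooser class, the chooseName method and the use of Optional for the invalid case are assumptions made for illustration, not part of the original example.

```java
import java.util.Optional;

// Hypothetical helper class, introduced only for this illustration.
class NameChooser {
    // Every path through this method returns something, so no variable
    // is ever left waiting for a default value.
    static Optional<String> chooseName(String answer, String typedName) {
        if ("y".equals(answer)) {
            return Optional.of(typedName);
        } else if ("n".equals(answer)) {
            return Optional.of("<user refused to enter name>");
        }
        // Invalid answer: say so explicitly instead of returning a default.
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The variable is declared only at the point where its value is known.
        Optional<String> name = chooseName("n", "");
        System.out.println("The name is " + name.orElse("<invalid answer>"));
    }
}
```

The caller never sees a half-initialized variable: it gets either a name or an explicit "nothing", and has to decide what to do with the latter.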

When going with the latter solution, because C is such a common background, many developers will initialize the variable with a default value. So if we convert our code to Java:

import java.util.Scanner;

public class Main {
    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);

        System.out.print("Do you want to enter a name? [yn] ");
        String answer = scanner.nextLine();

        String name = null;
        if ("y".equals(answer)) {
            System.out.print("Please enter name: ");
            name = scanner.nextLine();
        } else if ("n".equals(answer)) {
            name = "<user refused to enter name>";
        }

        System.out.printf("The name is %s\n", name);
    }
}

Sure, this is Java, a language with managed memory that will never allow undefined behavior from uninitialized variables, so we don’t really need to initialize name to null, but better safe than sorry, right?

WRONG!

Java analyses code paths to make sure no local variable can be used without being assigned a value first. So if we remove the initialization:

import java.util.Scanner;

public class Main {
    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);

        System.out.print("Do you want to enter a name? [yn] ");
        String answer = scanner.nextLine();

        String name;
        if ("y".equals(answer)) {
            System.out.print("Please enter name: ");
            name = scanner.nextLine();
        } else if ("n".equals(answer)) {
            name = "<user refused to enter name>";
        }

        System.out.printf("The name is %s\n", name);
    }
}

We’ll get a compilation error:

$ javac Main.java 
Main.java:18: error: variable name might not have been initialized
        System.out.printf("The name is %s\n", name);
                                              ^
1 error

I just broke the compilation, but this is a good thing: the compiler found a bug! It is the same bug we had in the C version: what if the user enters something which isn’t y or n? The Java compiler sees that there are three possible code paths that reach the last line, but only two of them assign a value to name.

To be able to compile again, we must tell Java what to do in case the user gave an invalid answer. Failure is also an option, as long as we do it intentionally:

import java.util.Scanner;

public class Main {
    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);

        System.out.print("Do you want to enter a name? [yn] ");
        String answer = scanner.nextLine();

        String name;
        if ("y".equals(answer)) {
            System.out.print("Please enter name: ");
            name = scanner.nextLine();
        } else if ("n".equals(answer)) {
            name = "<user refused to enter name>";
        } else {
            System.err.printf("Illegal answer \"%s\". The only legal answers are \"y\" and \"n\".\n", answer);
            return;
        }

        System.out.printf("The name is %s\n", name);
    }
}

Now there are still three code paths, but in the third we return from the function early, before printing name. The Java compiler can determine that there are no code paths where name is used without being assigned a value first – and thus the compilation succeeds.
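Returning early is not the only way to fail intentionally. As a sketch, assuming a hypothetical NameFormatter class with a describe method (neither is part of the original program), the third path could throw an exception instead. A throw also ends the code path, so the compiler does not require name to be assigned there:

```java
// Hypothetical helper class; the names are illustrative only.
class NameFormatter {
    static String describe(String answer, String typedName) {
        String name; // deliberately no default value
        if ("y".equals(answer)) {
            name = typedName;
        } else if ("n".equals(answer)) {
            name = "<user refused to enter name>";
        } else {
            // This path never reaches the return statement below,
            // so name does not have to be assigned here.
            throw new IllegalArgumentException("Illegal answer \"" + answer + "\"");
        }
        // Every path that gets here has assigned name exactly once.
        return "The name is " + name;
    }

    public static void main(String[] args) {
        System.out.println(describe("n", ""));
    }
}
```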

This is still initialization

Despite the clickbaity title, we do actually initialize name. We don’t do it at the declaration, but we are initializing it nevertheless. This compiles:

import java.util.Scanner;

public class Main {
    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);

        System.out.print("Do you want to enter a name? [yn] ");
        String answer = scanner.nextLine();

        final String name;
        if ("y".equals(answer)) {
            System.out.print("Please enter name: ");
            name = scanner.nextLine();
        } else if ("n".equals(answer)) {
            name = "<user refused to enter name>";
        } else {
            System.err.printf("Illegal answer \"%s\". The only legal answers are \"y\" and \"n\".\n", answer);
            return;
        }

        System.out.printf("The name is %s\n", name);
    }
}

Wait – how? Didn’t they teach us that you can’t change the value of a final variable?

Well, yes, but we are not changing the value of a final variable here. We are just initializing it. Since name has never been assigned before on any of the paths that assign to it, these assignments are actually initializations, which are perfectly fine for final variables. It wouldn’t have worked with final String name = null, but without the initialization at the declaration it’s fine. Even without the final keyword, name is effectively final and could be used in lambdas (provided they appear at a point where name has definitely been assigned).
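The point about lambdas can be sketched as follows. The EffectivelyFinalDemo class and its nameSupplier method are hypothetical names made up for this illustration; the only thing being shown is that a variable declared without final and without a default value can still be captured, as long as each path assigns it exactly once before the lambda appears.

```java
import java.util.function.Supplier;

// Hypothetical demo class; the names are illustrative only.
class EffectivelyFinalDemo {
    static Supplier<String> nameSupplier(String answer, String typedName) {
        String name; // no final keyword and no default value
        if ("y".equals(answer)) {
            name = typedName;
        } else {
            name = "<user refused to enter name>";
        }
        // name was assigned exactly once on every path, so the compiler
        // treats it as effectively final and lets the lambda capture it.
        return () -> "The name is " + name;
    }

    public static void main(String[] args) {
        System.out.println(nameSupplier("y", "Carol").get());
    }
}
```

Had the declaration been String name = null, the later assignments would have been reassignments, and the lambda would no longer compile.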

Conclusion

Do initialize your variables – but don’t always force a default value when you can’t initialize them with a proper one. Know how your language behaves with uninitialized variables and pick the best strategy for uncovering bugs.
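As one example of knowing how your language behaves: in Java, the compiler check described above applies to local variables, while non-final fields silently receive a default value (null for references, 0 for numbers). The sketch below uses a hypothetical FieldDefaultsDemo class to show that difference; for such fields the compiler will not point out a forgotten assignment.

```java
// Hypothetical demo class; the names are illustrative only.
class FieldDefaultsDemo {
    // Fields are never "uninitialized" in Java: they start with defaults.
    static String name;   // implicitly null
    static int attempts;  // implicitly 0

    static String describe() {
        // Compiles without complaint even though nothing was assigned.
        return "name=" + name + ", attempts=" + attempts;
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints "name=null, attempts=0"
    }
}
```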

Original article: Don’t initialize your variables
