[dotTech Explains] What is the difference between “size” and “size on disk”?

Ever look at a file or folder and wonder why the heck the “size” and “size on disk” are different? Yeah, me too. Until recently I had no idea why these two are different and just brushed it off as some technical mumbo jumbo. Ignorance may, at times, be bliss; but for the purposes of bettering my inner geek I decided it is time to learn exactly what these two are and why they are different. So I did and now it is time to share the wealth.

What Is “Size”?

The value labeled as “size” is the actual, literal size of the file(s)/folder(s) you have selected. So if dotTech.jpg is listed as a size of 1.25 MB, it is literally 1.25 MB large. If I were to download dotTech.jpg, I would be downloading 1.25 MB worth of data.

What Is “Size On Disk”?

“Size on disk” is the amount of disk space that the file(s)/folder(s) you have selected actually occupy. So while dotTech.jpg may be 1.25 MB, it may take up, for example, 1.30 MB of space on the hard drive.
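
If you want to see both numbers for yourself, here is a rough Python sketch. It assumes a Unix-like system, where `os.stat` reports `st_blocks` as a count of 512-byte blocks; the helper name `logical_and_allocated` is made up for this example. On Windows `st_blocks` is not available, so the second value simply comes back as `None`.

```python
import os

def logical_and_allocated(path):
    """Return (size, size_on_disk) in bytes for one file.

    size_on_disk is None where the platform does not expose st_blocks
    (for example on Windows).
    """
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)  # 512-byte units where present
    allocated = blocks * 512 if blocks is not None else None
    return st.st_size, allocated
```

Calling `logical_and_allocated("dotTech.jpg")` on a file of your own should give a pair where the second number is usually a bit larger than the first. Some file systems store very small files specially, so do not be surprised by the odd exception.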

Oh But Why Must They Be Different?

The answer comes down to file systems. Most (all?) file systems store data in clusters, which are fixed-size blocks of space on the drive. A file is always given whole clusters: even a single byte of data in a cluster is enough to reserve that whole cluster for the file in question, and no other file can use the leftover room. Thus the difference between the total size of the last cluster used by a file and the amount of data actually stored in that cluster is what results in the difference between size and size on disk. Confused? Let’s look at an example.

Let’s say you have a 10.5 MB file. The “size” of the file is 10.5 MB. Now let’s say the file is stored on a disk that uses clusters of 1 MB. That means the 10.5 MB file needs 11 clusters of space to be fully stored (because 10.5 MB of data cannot be stored in 10 MB of clusters – it needs 11). 11 clusters equals 11 MB. Thus, since that last .5 MB of data occupies a 1 MB cluster, the “size on disk” for the file is 11 MB.
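
The arithmetic in that example is just "round up to a whole number of clusters". Here is a minimal Python sketch of it; the function name `size_on_disk` is my own, and it deliberately ignores real-world wrinkles such as compression, sparse files, and file system metadata.

```python
MB = 1024 * 1024  # bytes in one megabyte, as used in the example above

def size_on_disk(size_bytes, cluster_bytes):
    """Round a file's size up to a whole number of clusters."""
    if cluster_bytes <= 0:
        raise ValueError("cluster size must be positive")
    clusters = (size_bytes + cluster_bytes - 1) // cluster_bytes  # ceiling division
    return clusters * cluster_bytes

# The 10.5 MB file on a disk with 1 MB clusters needs 11 clusters.
print(size_on_disk(int(10.5 * MB), 1 * MB) // MB)  # 11
```

The same function works for any cluster size, so you can plug in 4096 bytes (a common default) instead of the made-up 1 MB.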

Is This A Trick To Make Us Buy Larger Hard Drives?

Yes. No. Maybe. I don’t know. Juicy gossip avoids my inbox. However, being the ultra rational logic wise man I am, I will go out on a limb and say, no. There is technical justification behind using data clusters. Trust me when I say very smart people have developed our digital world. They know what they are doing. And if they don’t know what they are doing, I don’t want to hear about it. Remember ignorance is bliss.

Do I Need To Lose Sleep Over This?

No. Heck, you don’t even need to know this difference exists. It really makes no difference in your life, assuming you are a normal person and not a programmer. (See what I did there?) Knowing the difference between “size” and “size on disk” is purely for self-knowledge purposes; and for being able to sound smart in front of your parents.

But What About When I Move Files From Drive To Drive?

What about it? The “size” of a file is the literal size of the file; that is the amount of data you move when moving files. (You are not moving clusters.) The available space on a drive is the amount of room, in clusters, that is free. As long as the size of a file, rounded up to a whole number of the destination drive’s clusters, is no larger than the available space, you are good to go: there will be enough clusters to accommodate [struck through: that pron you are hiding] the file(s) you are moving.
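
For the curious, here is a small Python sketch of that "will it fit?" check. The names `clusters_needed` and `fits_on_destination` are hypothetical, and the check assumes the only overhead is rounding each file up to whole clusters on the destination drive (real file systems also spend some space on bookkeeping).

```python
def clusters_needed(size_bytes, cluster_bytes):
    """Number of whole clusters a file of size_bytes occupies."""
    return (size_bytes + cluster_bytes - 1) // cluster_bytes

def fits_on_destination(file_sizes, free_bytes, cluster_bytes):
    """True if every file fits once rounded up to the destination's clusters."""
    needed = sum(clusters_needed(s, cluster_bytes) for s in file_sizes) * cluster_bytes
    return needed <= free_bytes
```

Note that it is the destination's cluster size that matters here, not the cluster size of the drive the files came from.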

So Why Post About This?

Because I can.

Feel free to discuss in the comments below. Flame on.

[via Google]

22 comments

  1. ebony

    Ashraf!!!

    Excellent info in words that I can understand and possibly repeat if need be. Love it love it. All I can say is more more more.

    Now I have an off topic question. What is this? Why is it used? I have seen it many places and wondered.
    “that pron you are hiding” oh darn, the line through it did not show up but I hope you know what I mean.

  2. Philippe

    So Why Post About This?

    Because I can.

    I like it.

    Just a personal question: We are from all around the world, following your very useful work. But since you sometimes refer to German magazines, I wonder about your background, because your English is, for me, “perfect”. Personally I’m a Swiss from the French part living in Florida. You do not have to answer if you don’t feel like it.
    Anyway, I read almost 90% of your writing. Thanks

  3. Seamus McSeamus

    See, that’s why I visit here multiple times a day… there is always the possibility of learning something that I didn’t know. Now the only question remaining is what is the actual file size of this information vs the amount of clusters it is using in my brain!

  4. leland

    When you format the disk you can set many options, like cluster size. The smaller the cluster size, the more you can fit on the disk, but it can also slow down file access. The default is what most people use and is fairly good for general use. In fact, unless you know why you are changing from the default I would not change it, as there can also be compatibility issues that arise from changing the defaults, especially in low-level utilities like disk recovery tools and defraggers.

    That said if you are a developer and code a lot you are likely to have lots of little files that get compiled into programs. Many of the pieces will be very small and will waste lots of space. In a case like this it is good to have a custom partition formatted with smaller clusters to store the code on so you waste less space. The key is to isolate it on a separate partition to limit the system speed degradation.

    As always being an informed user is your best defense against any issues that may arise. Thanks Ashraf for covering a topic that can be quite interesting for many and will provide good food for thought.

  5. radek

    The file size may be whatever, but the size on disk is always a multiple of the cluster size, which is a power of two (2^n). In my case 2^12 = 4096 bytes = 4 KB. So if I make a file in Notepad with, shall we say, some 2 sentences, the resulting file may have 150 bytes but the size on disk is 4 KB. And no one guarantees you that the rest of the cluster is overwritten. In fact, it is not overwritten at all. Yes, the first 150 bytes are overwritten, but that’s all. You (or the one behind you, sorry) may not be able to restore the old data with any tool, but I know there are some who can do it… So it is!
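
The points raised in the comments above (lots of tiny files, and a 150-byte file taking up 4 KB) can be put into numbers with a short Python sketch. The function name `slack_report` and the 1,000-file scenario are assumptions made up for illustration; only the 150-byte file and the 4096-byte cluster come from the comments.

```python
def slack_report(file_sizes, cluster_bytes):
    """Return (data_bytes, on_disk_bytes, wasted_bytes) for a set of files."""
    data = sum(file_sizes)
    # -(-s // c) is ceiling division: whole clusters needed for s bytes
    on_disk = sum(-(-s // cluster_bytes) * cluster_bytes for s in file_sizes)
    return data, on_disk, on_disk - data

# Hypothetical: 1,000 small files of 150 bytes each.
small_files = [150] * 1000
print(slack_report(small_files, 4096))  # (150000, 4096000, 3946000)
print(slack_report(small_files, 512))   # (150000, 512000, 362000)
```

Under these assumptions the smaller cluster size wastes far less space, which is the trade-off described in comment 4.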