VIM Issues on Editing Binary Files


File Encoding Issue

Suppose you need to use VIM to edit a JPEG file (say, test.jpg). How would you do it? I used to just type “vi test.jpg”, and then I would see a lot of question marks (‘?’), with the first line looking like this:

????^@^PJFIF^@^A^A^@^@^A^@^A^@^@??^@C^@

Is this the expected result? Absolutely not. If I open the file via “vi -b test.jpg” instead, I get (first line only):

<ff><d8><ff><e0>^@^PJFIF^@^A^A^@^@^A^@^@<ff><db>^@C^@

What’s the cause?

When VIM opens a file without the “-b” flag, it treats the file as a text file and tries to convert it using utf-8 or other encodings (follow the link [Errors when VIM displays hexadecimal] for some details). I have this in my .vimrc file:

set fileencodings=utf-8,ucs-bom,gb18030,gbk,gb2312,cp936    " Set encoding to guess, so we can see the right encoding for non-utf8 files

Since I’m Chinese, I often need to view or edit files encoded in GBK or GB18030. This setting lets VIM display GBK-encoded content correctly, so I can edit such files painlessly. However, with this setting, when VIM opens test.jpg (without the “-b” flag), it tries to decode the content as utf-8 first (because utf-8 is the first entry in the fileencodings list). But the file contains byte sequences that can’t be decoded as utf-8 (in the example above, 0xff, 0xd8, 0xe0, etc.), so VIM replaces the offending byte(s) with ‘?’ to indicate that it can’t recognize them.
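The decoding failure can be checked outside VIM. The sketch below writes the first four bytes from the “vi -b” output above (ff d8 ff e0) to a scratch file and asks iconv whether they form valid utf-8. The scratch file and the variable name utf8_ok are made up for illustration, and it assumes iconv is installed.

```shell
# Write the first four bytes of the JPEG header (ff d8 ff e0) to a scratch file.
# Octal escapes are used because they work in every POSIX printf.
scratch=$(mktemp)
printf '\377\330\377\340' > "$scratch"

# Show the bytes in hex.
od -An -tx1 "$scratch"

# iconv exits with a non-zero status when the input is not valid utf-8.
if iconv -f UTF-8 -t UTF-8 "$scratch" > /dev/null 2>&1; then
    utf8_ok=yes
else
    utf8_ok=no
fi
echo "valid utf-8: $utf8_ok"

rm -f "$scratch"
```

The byte 0xff can never appear in well-formed utf-8, so the very first byte of a JPEG file is already enough to make the utf-8 decoding fail.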

So, when you want to edit a binary file, remember to add the “-b” flag (or use the “:edit! ++bin” command to reload the current buffer in binary mode after opening the file).
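Forgetting the flag is easy, so a small shell wrapper could make the decision for you. This is only a sketch: the function names looks_binary and vib are hypothetical, and “contains a NUL byte” is a rough heuristic of my own for “binary”, not something VIM itself does.

```shell
# looks_binary FILE: succeed if FILE contains at least one NUL byte.
looks_binary() {
    # Strip NUL bytes and compare the result with the original file;
    # any difference means the file had NUL bytes in it.
    if LC_ALL=C tr -d '\000' < "$1" | cmp -s - "$1"; then
        return 1    # identical: no NUL bytes, probably text
    else
        return 0    # different: NUL bytes found, probably binary
    fi
}

# vib FILE: open FILE with "vi -b" if it looks binary, plain "vi" otherwise.
vib() {
    if looks_binary "$1"; then
        vi -b "$1"
    else
        vi "$1"
    fi
}

# Quick check on two scratch files (no editor is started here).
demo_dir=$(mktemp -d)
printf 'JFIF\000\001' > "$demo_dir/bin"
printf 'hello\n' > "$demo_dir/txt"
looks_binary "$demo_dir/bin" && echo "bin: looks binary"
looks_binary "$demo_dir/txt" || echo "txt: looks like text"
```

The heuristic has blind spots (UTF-16 text contains NUL bytes, for example), so it is a convenience, not a replacement for knowing what kind of file you are opening.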

The Annoying Automatically Added Trailing New Line(‘\n’)

It’s good practice to end a text file with a newline character (‘\n’, ASCII code 0x0a); see [Why should files end with a newline] for more info. If you don’t, some programs may fail to handle the file, as in the example below.

echo "a" > test # the file's content is "a\n"
echo -n "b" >> test # now the content is "a\nb"

while read line;
do
    echo $line;
done < test

# you might expect the output to be:
a
b
# however, surprisingly, you'll see only the "a" line
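The missing line comes from read itself: at end of file it still stores the unterminated text in the variable, but returns a non-zero status, so the loop body is skipped. A common workaround, sketched here with a scratch file of my own, is to also run the body whenever the variable is non-empty.

```shell
demo_dir=$(mktemp -d)
printf 'a\nb' > "$demo_dir/test"    # same content as above: "a\nb", no final newline

# Plain loop: the unterminated last line is lost.
plain=$(while read -r line; do echo "$line"; done < "$demo_dir/test")

# Robust loop: "|| [ -n "$line" ]" keeps the loop alive for the final partial line.
robust=$(while IFS= read -r line || [ -n "$line" ]; do echo "$line"; done < "$demo_dir/test")

printf 'plain loop:\n%s\n' "$plain"
printf 'robust loop:\n%s\n' "$robust"

rm -r "$demo_dir"
```

The plain loop prints only “a”, while the robust loop prints both “a” and “b”.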

VIM treats the trailing newline rule as mandatory: when it saves a text file that does not end with ‘\n’, it appends one automatically. Let me show you.

$ ls -l test
-rw-r--r-- 1 rex staff 3 Sep 12 08:52 test

# the file's size is 3 bytes: an 'a', a '\n' and a 'b'
$ vi test
a
b
# inside the vi editor, type the command ":%!xxd" to see the binary representation of the file content
0000000: 610a 620a                                a.b.
# as you can see, there is a 0x0a byte after 0x62('b'); VIM supplied it automatically, it is not in the 3-byte file on disk
# now convert back to the normal text view by typing the command ":%!xxd -r"
# save the file and quit: ":wq"

$ ls -l test
-rw-r--r-- 1 rex staff 4 Sep 12 10:05 test
# the file size changed! a '\n' was appended to the end of the file.
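The same before/after check can be done without opening an editor by looking at the last byte of the file. This is a minimal sketch; the function name ends_with_newline and the two scratch files, which stand in for test before and after the “:wq”, are made up for illustration.

```shell
# ends_with_newline FILE: succeed if the last byte of FILE is 0x0a.
ends_with_newline() {
    last_byte=$(tail -c 1 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$last_byte" = "0a" ]
}

demo_dir=$(mktemp -d)
printf 'a\nb' > "$demo_dir/before"      # 3 bytes, like the file before ":wq"
printf 'a\nb\n' > "$demo_dir/after"     # 4 bytes, like the file after ":wq"

for f in "$demo_dir/before" "$demo_dir/after"; do
    size=$(wc -c < "$f" | tr -d ' ')
    if ends_with_newline "$f"; then
        echo "$f: $size bytes, ends with a newline"
    else
        echo "$f: $size bytes, no trailing newline"
    fi
done
```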

So far so good. It’s very nice of VIM to do this for us automatically, right?

Yes, in most circumstances, but not all. This feature can become a nightmare when you do not want the extra trailing ‘\n’, for example when editing binary files.

One day I was handling a binary file: a gzip file with a custom 512-byte header. To unpack it, I had to remove the custom header first. I opened the file in VIM, removed the first 512 bytes, and saved it back to disk. It’s quite an easy job, so I didn’t expect any problems. But I was wrong. Later, when I tried to decompress the resulting file, I got an error message: “unzip: initrd.img: trailing garbage ignored”.

What happened?!

To make things clear, let’s construct such a binary file first.

echo "demo" > test
gzip -k -S .gz test
# now we have a test.gz file
ls -l test.gz
# the output is: -rw-r--r-- 1 rex staff 30 Sep 12 10:53 test.gz
# the file size of test.gz is 30

echo -n "abcd" > test.img # "abcd" is the header
cat test.gz >> test.img # append the test.gz to test.img

# now we get a file similar to what I got above
file test.img
# the output is: test.img: data
ls -l test.img
# the output is: -rw-r--r-- 1 rex staff 34 Sep 12 11:21 test.img

# view the content
vi test.img
# inside VIM, use ":q!" to quit without any modification

Then, use VIM to delete the header.

$ vi -b test.img
# inside VIM, select the header part ("abcd"), then hit 'x' to delete it
# enter ":wq!" to save

Now take a look at the resulting file.

$ file test.img
test.img: gzip compressed data, was "test", from Unix, last modified: Fri Sep 12 10:54:17 2014

$ ls -l test.img
-rw-r--r-- 1 rex staff 31 Sep 12 11:25 test.img
# note that the file size changed to 31, while the file size of test.gz is 30

$ gunzip -c test.img # unzip to stdout
demo
gunzip: test.img: trailing garbage ignored

As you can see, after deleting the header (“abcd”), the size of test.img changed from 34 to 31 rather than 30. Why?

I believe you already have the answer: the trailing ‘\n’. VIM detects that test.img does not end with ‘\n’, so it adds one automatically, even though I set the “-b” flag to indicate that the file is binary.
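The effect of that single byte can be reproduced without VIM. The sketch below builds the same kind of archive in a scratch directory, appends one ‘\n’ by hand, and decompresses the result. The file names padded.gz and warning.txt are made up for illustration, and the exact wording of the complaint depends on which gzip implementation is installed.

```shell
demo_dir=$(mktemp -d)
echo "demo" > "$demo_dir/test"
gzip -c "$demo_dir/test" > "$demo_dir/test.gz"

# Copy the archive and append a single '\n', as the text-mode save did.
cp "$demo_dir/test.gz" "$demo_dir/padded.gz"
printf '\n' >> "$demo_dir/padded.gz"

clean_size=$(wc -c < "$demo_dir/test.gz" | tr -d ' ')
padded_size=$(wc -c < "$demo_dir/padded.gz" | tr -d ' ')
echo "clean: $clean_size bytes, padded: $padded_size bytes"

# The payload still decompresses; the complaint about the extra byte goes to
# stderr and the exit status may be non-zero, hence the "|| true".
payload=$(gunzip -c "$demo_dir/padded.gz" 2> "$demo_dir/warning.txt") || true
echo "payload: $payload"
cat "$demo_dir/warning.txt"
```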

It is weird and annoying that VIM would discard the “-b” flag. Is it a VIM bug, or a settings issue?

I spent a lot of time googling, but without luck. Everyone says that once you set the “-b” flag, VIM won’t add the trailing ‘\n’. If this is not a VIM bug, then what is the cause?

After some trial and error, I finally found the root cause, hooray! The cause is the restore_view plugin.

  1. The first time I opened test.img, I forgot to set the “-b” flag.
  2. When I quit VIM, restore_view automatically created a view file (in the .vimviews folder), even though I hadn’t changed the content of test.img. Since I had opened test.img without the “-b” flag, VIM treated it as a text file, so the saved view file contains this setting (nobinary).
  3. When I opened it again with the “-b” flag, the restore_view plugin restored the settings saved in the .vimviews folder, which set the file to “nobinary”, overriding the “-b” flag.
  4. Since the buffer was no longer binary, VIM detected that the file did not end with ‘\n’ and added one automatically.
  5. When I saved the file, there was a warning (the read-only buffer warning), but I ignored it and used “:wq!” to force the save.
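To check whether a saved view is what flips a buffer back to “nobinary”, the view folder can be searched from the shell. The sketch below rests on two assumptions: that a view file’s name contains the base name of the edited file, and that the saved setting appears in it as the text “nobinary”. The function name list_stale_views and the fake view files are made up for illustration.

```shell
# list_stale_views VIEWDIR FILE
# Print the view files in VIEWDIR whose name mentions FILE's base name and
# whose contents mention "nobinary".
list_stale_views() {
    viewdir=$1
    name=$(basename "$2")
    for view in "$viewdir"/*"$name"*; do
        [ -f "$view" ] || continue          # the glob matched nothing
        if grep -q 'nobinary' "$view"; then
            printf '%s\n' "$view"
        fi
    done
}

# Demo on a scratch folder with two fake view files (assumed naming scheme).
demo_dir=$(mktemp -d)
printf 'setlocal nobinary\n' > "$demo_dir/~=+work=+test.img="
printf 'setlocal nobinary\n' > "$demo_dir/~=+work=+notes.txt="
list_stale_views "$demo_dir" test.img
```

With a real setup the call would be something like list_stale_views ~/.vimviews test.img, after which the listed files can be inspected and removed by hand.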

P.S.: the restore_view plugin might cause another problem:

  1. Open a file whose settings saved in .vimviews mark it as a “nobinary” file
  2. Use “:edit! ++bin” to reload it in binary mode
  3. Use “:%!xxd” to convert to the hex view
  4. (with or without modification) execute “:%!xxd -r” to convert back to the binary view; you will notice that the buffer changes to “nobinary” mode (there are a lot of ‘?’ if utf-8 is the first encoding in fileencodings)

Now everything is clear, and I’ve learned a good lesson: always add the “-b” flag when you want to edit a binary file!

Conclusion

  1. Always add the “-b” flag when opening binary files
  2. If you forgot to add the “-b” flag, execute “:edit! ++bin” to switch the current buffer to binary mode
  3. If you use the restore_view plugin and run into weird issues, clear the file’s saved settings (stored in ~/.vimviews or whatever folder you configured) and try again
  4. Check for other plugins or settings that might override the “-b” flag
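For a job as mechanical as cutting off a fixed-size header, an editor is not strictly needed. As an alternative sketch (not the workflow described above), the header can be skipped with tail, which never adds or converts bytes. The scratch directory and the file name stripped.gz are made up for illustration.

```shell
demo_dir=$(mktemp -d)
echo "demo" > "$demo_dir/test"
gzip -c "$demo_dir/test" > "$demo_dir/test.gz"

# Build the image: a 4-byte header ("abcd") followed by the gzip data.
printf 'abcd' > "$demo_dir/test.img"
cat "$demo_dir/test.gz" >> "$demo_dir/test.img"

# "tail -c +N" copies the file starting at byte N, so +5 skips the 4-byte
# header. For a 512-byte header this would be "tail -c +513".
tail -c +5 "$demo_dir/test.img" > "$demo_dir/stripped.gz"

if cmp -s "$demo_dir/test.gz" "$demo_dir/stripped.gz"; then
    echo "stripped.gz is byte-for-byte identical to test.gz"
fi
gunzip -c "$demo_dir/stripped.gz"
```

Comparing the result against a known-good copy with cmp, as done here, is also a quick way to verify a file that was edited in VIM.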

Author: Rex Shen

Created: 2014-09-12 Fri 18:42

Emacs 24.3.1 (Org mode 8.2.7c)
