The problem of codification

Sharing files between different operating systems (mainly between Windows and Linux-based ones) is always a pain because the file names usually use different character encodings. This post shows a shell utility for dealing with encoding problems in file names.

Since my complete migration from Windows to Linux-based systems about sixteen years ago (in order: Red Hat Linux, Mandriva, Ubuntu, Elementary, Ubuntu and Mac OS X), I have had encoding problems both in file names and in the content of old files edited in Windows® or restored from backup copies.

In the case of file contents, most of them TeX files, there was no major problem, since \usepackage[latin1]{inputenc} and the editor deal with them transparently. The file names, however, remained a problem until today.
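When relying on inputenc is not an option, the content itself can be re-encoded instead. The following is only a minimal sketch, assuming iconv is installed; the file names old.tex and new.tex are hypothetical, and everything happens in a temporary directory so no real file is touched.

```shell
# Scratch directory for the demonstration (hypothetical file names).
workdir=$(mktemp -d)

# Create a sample Latin-1 file: \351 is the octal byte for "é" in ISO-8859-1.
printf 'caf\351\n' > "$workdir/old.tex"

# Re-encode the content; iconv writes to stdout, so redirect to a new file.
iconv -f ISO-8859-1 -t UTF-8 "$workdir/old.tex" > "$workdir/new.tex"

# Inspect the bytes: "é" should now be the two-byte sequence c3 a9.
od -An -tx1 "$workdir/new.tex"
```

After such a conversion, the preamble would need \usepackage[utf8]{inputenc} instead of the latin1 option.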

Through Slackblog I have found a shell utility named convmv that converts file names from one encoding to another. It works similarly to iconv (which I already knew). For example:

$ convmv -r -f ISO-8859-1 -t UTF-8 * --notest 

will convert the names of all the files in the current directory and its subdirectories from ISO-8859-1 (which closely matches Windows-1252, the default legacy code page on Western European Windows) to UTF-8. Notice the --notest option. If this option is removed, the command only prints the file name changes on screen without performing them.
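The dry-run-then-apply workflow described above can be tried safely on a scratch directory. This is a rough sketch, not a definitive recipe: the file name and the variable demo are made up for illustration, and it assumes a file system that accepts arbitrary bytes in file names (as typical Linux file systems do).

```shell
# Scratch directory so no real files are renamed.
demo=$(mktemp -d)

# Create a file whose name contains the Latin-1 byte for "é" (octal 351).
touch "$demo/$(printf 'caf\351.txt')"

if command -v convmv >/dev/null 2>&1; then
    # Dry run: without --notest, convmv only reports what it would rename.
    convmv -r -f ISO-8859-1 -t UTF-8 "$demo"
    # Real run: --notest actually performs the renames.
    convmv -r -f ISO-8859-1 -t UTF-8 --notest "$demo"
else
    echo "convmv is not installed; skipping the demo" >&2
fi

# List the directory to see the resulting file name.
ls "$demo"
```

Running the dry run first and reading its output before adding --notest is the cautious habit here, since a wrong source encoding would produce garbled names.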

Cite as: Francisco Torralbo, The problem of codification [Blog post]. March 30, 2020. Retrieved from: //www.ugr.es/~ftorralbo/blog/programming/codification/.

The problem of codification by Francisco Torralbo is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.