First of all, what would we like to achieve? The task is to convert a string, in most cases a single word, by capitalizing its first letter. In my case I have the names of the world’s countries all in lower case, while I need them with the first letter of each word capitalized. For example, “united states” must become “United States”, not “United states” or “UNITED STATES”. So here begins the journey into PHP string functions, especially those for capitalization!
$str1 = 'foo bar';
$str2 = 'Foo bar';
$str3 = 'FOO BAR';
$str4 = 'фуу бар';

echo ucwords($str1); // Foo Bar
echo ucwords($str2); // Foo Bar
echo ucwords($str3); // FOO BAR
echo ucwords($str4); // фуу бар
Here we have four strings: a lower cased one, a mixed cased one, an upper cased one and … a lower cased Cyrillic string. The main reason why ucwords doesn’t fit here is the Cyrillic string, which comes back unchanged. Whatever non-Latin string you have, you can forget about capitalization. The other conversions are also interesting, however. Take a look at the third string! It remains “FOO BAR” instead of becoming “Foo Bar”, which simply means that this function only looks at, and possibly changes, the first letter of each word.
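One way to see why the Cyrillic string slips through is to compare byte and character counts. The sketch below assumes the source file is saved as UTF-8 and that the mbstring extension is loaded; the variable names are only for illustration. Functions such as ucwords work on single bytes, while each Cyrillic letter here occupies two.

```php
<?php
// Assumes this file is saved as UTF-8 and the mbstring extension is available.
$word = 'фуу бар';

// strlen() counts bytes: six 2-byte Cyrillic letters plus one space.
$bytes = strlen($word);
// mb_strlen() counts characters in the given encoding.
$chars = mb_strlen($word, 'UTF-8');

echo $bytes, PHP_EOL; // 13
echo $chars, PHP_EOL; // 7
```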
So here we have two questions. How can we overcome the Cyrillic problem, and how can we “normalize” the UPPER CASE string?
ucfirst is another useful function in PHP. As you may guess from its name, it converts a string by changing only its first letter. So “Foo bar” will remain “Foo bar”, whereas ucwords turned it into “Foo Bar”. Let’s see what this function does:
$str1 = 'foo bar';
$str2 = 'Foo bar';
$str3 = 'FOO BAR';
$str4 = 'фуу бар';

echo ucfirst($str1); // Foo bar
echo ucfirst($str2); // Foo bar
echo ucfirst($str3); // FOO BAR
echo ucfirst($str4); // фуу бар
Here even the first string ends up with only one capital letter: “foo bar” became “Foo bar”. And yet again the Cyrillic string is unchanged. It simply doesn’t help us here!
As a PHP developer you know what the “mb_” prefix means: multibyte. This is quite useful, because mb_convert_case can convert the string whatever the encoding is, so perhaps we can overcome the Cyrillic problem. But before proceeding to tests, let’s take a look at the parameters of this function.
The first thing to note is that mb_convert_case doesn’t contain the case, upper or lower, in its name. Instead, the second parameter, after the string itself, sets the mode. Note that its value is not a camel-cased or capitalized word but the constant MB_CASE_TITLE (in English-style titles every word is capitalized):
echo mb_convert_case($str, MB_CASE_TITLE, ...
And a third one which specifies the encoding:
echo mb_convert_case($str, MB_CASE_TITLE, 'utf-8');
Now let’s see what we can achieve with it:
$str1 = 'foo bar';
$str2 = 'Foo bar';
$str3 = 'FOO BAR';
$str4 = 'фуу бар';

echo mb_convert_case($str1, MB_CASE_TITLE, 'utf-8'); // Foo Bar
echo mb_convert_case($str2, MB_CASE_TITLE, 'utf-8'); // Foo Bar
echo mb_convert_case($str3, MB_CASE_TITLE, 'utf-8'); // Foo Bar
echo mb_convert_case($str4, MB_CASE_TITLE, 'utf-8'); // Фуу Бар
As you can see, the Cyrillic problem no longer exists, and mb_convert_case is intelligent enough to change “FOO BAR” into “Foo Bar”; as I said, this is English-style titling. This is by all means the solution when you deal with capitalization in different encodings.
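Applied to the original task of country names, the call can be wrapped in a small helper. This is only a sketch: title_case is a hypothetical name, not a PHP built-in, the sample list is made up for illustration, and the mbstring extension is assumed to be loaded.

```php
<?php
// Hypothetical helper (not part of PHP): title-cases a string
// using mb_convert_case() with an explicit encoding.
function title_case(string $name, string $encoding = 'UTF-8'): string
{
    return mb_convert_case($name, MB_CASE_TITLE, $encoding);
}

// Sample input for illustration: lower case, upper case and Cyrillic.
$countries = ['united states', 'UNITED KINGDOM', 'болгария'];

foreach ($countries as $country) {
    echo title_case($country), PHP_EOL;
}
// United States
// United Kingdom
// Болгария
```

Defaulting the encoding to UTF-8 keeps the calls short, while still allowing another encoding to be passed when the data requires it.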
However, there is another approach to the all UPPER CASE problem: convert the string to lower case first.
strtolower is a very useful PHP string function, and probably every PHP developer has used it at least once. But yet again it does not do the job, again because of the encoding problem.
echo strtolower('ФУУ БАР'); // #*&$(#*%#
As you can see, the Cyrillic string cannot be lower cased! Let’s search again in the “mb_” universe.
mb_strtolower is the function we need. Again you have to specify the encoding:
echo mb_strtolower('ФУУ БАР', 'utf-8'); // фуу бар
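Lowercasing first also makes it possible to build a multibyte counterpart to ucfirst. The following is a minimal sketch under a few assumptions: mb_ucfirst_sketch is a hypothetical name chosen for this example, the input is UTF-8 by default, and the mbstring extension is loaded.

```php
<?php
// Hypothetical helper (not part of PHP): lower-cases the whole string,
// then upper-cases only its first character, respecting the encoding.
function mb_ucfirst_sketch(string $str, string $encoding = 'UTF-8'): string
{
    $lower = mb_strtolower($str, $encoding);
    $first = mb_substr($lower, 0, 1, $encoding);    // first character
    $rest  = mb_substr($lower, 1, null, $encoding); // everything after it

    return mb_strtoupper($first, $encoding) . $rest;
}

echo mb_ucfirst_sketch('ФУУ БАР'), PHP_EOL; // Фуу бар
echo mb_ucfirst_sketch('FOO BAR'), PHP_EOL; // Foo bar
```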
It doesn’t matter whether you’re a native English speaker or not. Most web sites are multilingual, and you cannot be sure what happens when you convert strings in alphabets other than Latin. So be careful even when everything seems to be OK in tests with Latin strings.