Yii and PHP i18n & L10n

Alexander Makarov

Yii core team, Stay.com

http://slides.rmcreative.ru/2015/webconf-i18n-l10n/

Goals

  • Learn what's i18n, L10n
  • Learn about how it's done
  • Avoid common mistakes
  • Explain what PHP intl really is
  • Data translation

What's i18n?

Internationalization == globalization

Ability of an app to be adapted to various languages and regions without engineering changes.

i.e. regardless of locale chosen. Developer's tasks.

What's L10n?

Localization

The process of adapting software for a specific region or language by adding locale-specific components and translating text.

Translator tasks.

Locale?

A data set that includes info about language, country, formats, currencies etc. Usually varies per country. Sometimes there are sub-variations.

i18n problems (developers)

  • Support various characters (utf8, right collation)
  • LTR / RTL
  • String translation
  • Locale switching
  • Language/locale aware formatting (dates, currencies, geographical names)
  • Language/locale based substitutions
  • Timezones
  • ...

Approaches

  • ICU (PHP intl, Yii 2.0, Aura, Java)
  • gettext (Django, Yii 2.0, Yii 1.1)
  • Custom stuff based on CLDR. Not as powerful (Yii 1.1, RoR, Symfony, Laravel)
  • Totally custom or just string replacement. Meh

intl

  • PHP extension.
  • A wrapper around icu4c.
  • Uses CLDR.
  • Awful docs.
  • Has issues.

CLDR

Unicode Common Locale Data Repository

cldr.unicode.org

Huge data set of i18n data per locale. Used by companies such as Apple, Google, IBM, Microsoft, Adobe, and lots of Linux distributions, and by Java, Perl, Python, jQuery, PHP intl and more.

ICU

icu-project.org

Libraries for C and Java that use CLDR data for message formatting, locale selection etc.

What intl can do

  • Collator - locale-aware string collation
  • Locale - various locale utils
    • Locale::acceptFromHttp
    • Locale::canonicalize
  • IntlCalendar - locale-aware calendar
  • ResourceBundle - loads resources (later)
  • Transliterator - yum!
  • IntlBreakIterator - locating boundaries in text
  • UConverter - encoding converter
  • NumberFormatter - locale-aware number formatting
  • MessageFormatter - powerful message formatter
  • and more
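
A few of these classes in action, as a minimal sketch. It assumes the intl extension is loaded. The sample words, the header value, and the locales are illustrative only, and exact output can vary with the ICU version.

```php
<?php
// Collator: sort by locale rules instead of byte values.
$words = ['zebra', 'Äpfel', 'apple'];
$collator = new \Collator('de_DE');
$collator->sort($words);
echo implode(', ', $words), "\n";
// Äpfel, apple, zebra (plain sort() would put "Äpfel" last)

// Locale: pick a locale from an Accept-Language header (sample value).
$accepted = \Locale::acceptFromHttp('de-DE,de;q=0.9,en;q=0.8');
echo $accepted, "\n";

// NumberFormatter: locale-aware grouping and decimal separators.
$number = (new \NumberFormatter('de_DE', \NumberFormatter::DECIMAL))
    ->format(1234567.891);
echo $number, "\n";
// 1.234.567,891

// Transliterator: convert any script to Latin, then strip to ASCII.
$latin = \Transliterator::create('Any-Latin; Latin-ASCII')
    ->transliterate('Привет');
echo $latin, "\n";
// Privet
```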

Message formatting

Numbers


echo \MessageFormatter::formatMessage(
    'en_US', 'Value: {value, number}', ['value' => 123456.789]
);
// Value: 123,456.789

echo \MessageFormatter::formatMessage(
    'en_US', 'Price: {price, number, currency}', ['price' => 100]
);
// Price: $100.00

echo \MessageFormatter::formatMessage(
    'en_US', 'Value: {value, number, percent}', ['value' => 1.23] // percent multiplies by 100
);
// Value: 123%
                    

Date and time


$d = strtotime('2015-04-18'); // any Unix timestamp works

echo \MessageFormatter::formatMessage(
    'en_US',
    'Date: {d, date, short} | {d, date, medium} | {d, date, long}'
        . ' | {d, date, full}',
    ['d' => $d]
);
// Date: 4/18/15 | Apr 18, 2015 | April 18, 2015
//  | Saturday, April 18, 2015

echo \MessageFormatter::formatMessage(
    'ru_UA',
    'Date: {d, date, short} | {d, date, medium} | {d, date, long}'
        . ' | {d, date, full}',
    ['d' => $d]
);
// Date: 18.04.15 | 18 апр. 2015 г. | 18 апреля 2015 г.
//  | суббота, 18 апреля 2015 г.
                    

Spellout


echo \MessageFormatter::formatMessage(
    'en_US', '{n,number} is spelled as {n, spellout}', ['n' => 42]
);
// 42 is spelled as forty-two

echo \MessageFormatter::formatMessage(
    'en_US', 'I am {n, spellout,%spellout-ordinal} agent', ['n' => 47]
);
// I am forty-seventh agent

Plurals


$message = 'Здесь {n, plural, =0{котов нет} =1{есть один кот} one{# кот}
            few{# кота} many{# котов} other{# кота}}!';
echo \MessageFormatter::formatMessage('ru_UA', $message, ['n' => 1]);
// Здесь есть один кот!

$message = 'There {n, plural, =0{are no cats}
            =1{is one cat} other{are # cats}}!';
echo \MessageFormatter::formatMessage('en_US', $message, ['n' => 1]);
// There is one cat!
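
To see why Russian needs more plural categories than English, the same pattern can be run over several numbers. This is only a sketch: it assumes the intl extension is loaded, uses ru_RU as the locale, and the sample numbers are arbitrary.

```php
<?php
// Same pattern as above, joined on one line to keep the string tidy.
$pattern = 'Здесь {n, plural, =0{котов нет} =1{есть один кот} one{# кот} '
    . 'few{# кота} many{# котов} other{# кота}}!';

$lines = [];
foreach ([0, 1, 2, 5, 21] as $n) {
    // ICU picks the branch: exact matches (=0, =1) win over categories.
    $lines[] = \MessageFormatter::formatMessage('ru_RU', $pattern, ['n' => $n]);
}
echo implode("\n", $lines), "\n";
// Здесь котов нет!
// Здесь есть один кот!
// Здесь 2 кота!
// Здесь 5 котов!
// Здесь 21 кот!
```

Note that 21 falls into the "one" category while 5 falls into "many", which is why a simple singular/plural switch is not enough.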

Ordinals


echo \Yii::t('app', 'You are the {n, ordinal} visitor here!', ['n' => 42]);
// You are the 42nd visitor here!
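
Outside Yii, a similar result can be sketched with plain intl and the selectordinal argument type. This assumes the intl extension is loaded and an ICU build recent enough to know selectordinal (older builds may reject the pattern). The sample numbers are arbitrary.

```php
<?php
// The suffix is chosen by the English ordinal categories one/two/few/other.
$pattern = 'You are the {n, selectordinal, one{#st} two{#nd} few{#rd} '
    . 'other{#th}} visitor here!';

$results = [];
foreach ([1, 3, 11, 42] as $n) {
    $results[$n] = \MessageFormatter::formatMessage('en_US', $pattern, ['n' => $n]);
}
echo implode("\n", $results), "\n";
// You are the 1st visitor here!
// You are the 3rd visitor here!
// You are the 11th visitor here!
// You are the 42nd visitor here!
```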

Selection


echo \Yii::t('app', '{name} is a {gender} and {gender, select, female{she}
    male{he} other{it}} loves Yii!', [
    'name' => 'Snoopy',
    'gender' => 'dog',
]);
// Snoopy is a dog and it loves Yii!

There are more but... enough

Intl is powerful!

intl problems

  • Docs
  • Named parameter issues
  • Sometimes it's not installed
  • Sometimes ICU is outdated

Named parameter issues

Positional ones could be used:


$message = 'There {0, plural, =0{are no cats}
            =1{is one cat} other{are # cats}} except {1}!';
echo \MessageFormatter::formatMessage('en_US', $message, [1, 'Simon']);
                    
But that sucks!
Solved in Yii, should be OK in Aura.

intl not installed

If you're making a product, provide a fallback, at least for English. If you're making a service, install it.
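
One possible shape for such a fallback, as a minimal sketch. The helper name formatWithFallback is hypothetical, and the fallback only handles plain {name} placeholders, so it is good enough for simple English strings and nothing more.

```php
<?php
/**
 * Hypothetical helper: format with intl when it is available,
 * otherwise fall back to plain placeholder substitution.
 */
function formatWithFallback(string $locale, string $pattern, array $params): string
{
    if (extension_loaded('intl')) {
        $result = \MessageFormatter::formatMessage($locale, $pattern, $params);
        if ($result !== false) {
            return $result;
        }
    }

    // Fallback: replaces only simple {name} placeholders. Plural, select
    // and number/date styles are left untouched.
    $replacements = [];
    foreach ($params as $name => $value) {
        $replacements['{' . $name . '}'] = (string) $value;
    }
    return strtr($pattern, $replacements);
}

echo formatWithFallback('en_US', 'Hello, {name}!', ['name' => 'Alexander']), "\n";
// Hello, Alexander!
```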

Outdated intl or ICU data

Update it:

Hidden gems

intl relies on CLDR data, and there's more of it than the extension API exposes. Can we access the raw data?

Yes!

intl manual lies to us

new \ResourceBundle($resourceFileName, $resourceDirName);
$resourceDirName = null means root dir of intl internal resources.
Resources are compiled, but you can find a list of files, decompile icudt49.dll if you're on Windows, or try building resources from source.

// Recursively dumps a ResourceBundle (or nested array) as indented text.
function dumpResourceBundle($bundle, $depth = 0)
{
    if ($bundle === null) {
        return 'NULL';
    } elseif (is_scalar($bundle)) {
        return $bundle;
    }

    $out = '';
    $indent = str_repeat(' ', $depth);
    foreach ($bundle as $k => $v) {
        $out .= $indent . $k . ' = ';
        if ($v instanceof \ResourceBundle || is_array($v)) {
            // Nested bundle: dump its children one level deeper.
            $out .= "[\n";
            $out .= dumpResourceBundle($v, $depth + 1);
            $out .= $indent . "]\n";
        } else {
            $out .= $v . "\n";
        }
    }
    return $out;
}
                    
new \ResourceBundle('en_GB', null);

Instead of en_GB, try other locales. If there's no data for one, try en.

  • month names
  • weekday names
  • quarter names
  • various formats for various calendars
  • units
  • measurement system names

new \ResourceBundle('en_US', ...);

  • ICUDATA-curr = currencies
  • ICUDATA-region = regions
  • ICUDATA-zone = time zone names
  • ICUDATA-lang = languages
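
Reading one of these bundles might look like the sketch below. It assumes the intl extension is loaded and that the ICU build ships its currency data. The key name Currencies and the symbol/name layout reflect how ICU's currency resources are usually organized and should be treated as an assumption, not a documented API.

```php
<?php
// 'ICUDATA-curr' selects the currency tree of ICU's built-in data.
$bundle = \ResourceBundle::create('en', 'ICUDATA-curr');

// Walk down step by step, since any level may be missing in a given build.
$currencies = $bundle !== null ? $bundle->get('Currencies') : null;
$usd = $currencies instanceof \ResourceBundle ? $currencies->get('USD') : null;

if ($usd instanceof \ResourceBundle) {
    // Assumed layout: index 0 is the symbol, index 1 is the display name.
    echo $usd->get(0), ' = ', $usd->get(1), "\n";
} else {
    echo "Currency data is not available in this build\n";
}
```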

Less useful:

  • ICUDATA-rbnf = rule-based number format (spellout)
  • ICUDATA-translit = transliteration rules and tables (huge!)

new \ResourceBundle(..., null);

  • metadata = language aliases, region codes, script aliases, territory aliases, variant aliases
  • plurals = rules for plural and selectordinal
  • zoneinfo64 = Olson DB?!

Research results

http://intl.rmcreative.ru/

Give it a locale and it will show you plurals format, selectordinal format etc.

L10n problems (translators)

As little work as possible

  • Translation itself
  • Template adjustments (Colors, legal requirements)

Source strings

English or keys.in.english.

Message storage format

Doesn't really matter.

PHP arrays are OK for non-tech people.

Don't try Google Translate ;)

Message scanner

A tool that scans code and updates translation files automatically, adding new strings and removing old ones.

We have it in Yii ;)
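
The idea behind such a scanner can be sketched in a few lines. This is not how Yii's own tool is implemented: the function names extractMessages and updateTranslations are hypothetical, only single-quoted literals are recognized, and a real extractor should tokenize the source instead of relying on a regular expression.

```php
<?php
/** Hypothetical: collect messages passed to Yii::t() for one category. */
function extractMessages(string $code, string $category): array
{
    $pattern = <<<'REGEX'
/Yii::t\(\s*'((?:[^'\\]|\\.)*)'\s*,\s*'((?:[^'\\]|\\.)*)'/
REGEX;
    preg_match_all($pattern, $code, $matches, PREG_SET_ORDER);

    $messages = [];
    foreach ($matches as $match) {
        if ($match[1] === $category) {
            $messages[] = $match[2];
        }
    }
    return array_values(array_unique($messages));
}

/** Hypothetical: keep known translations, add new strings, drop old ones. */
function updateTranslations(array $existing, array $found): array
{
    $updated = [];
    foreach ($found as $message) {
        // New strings start out untranslated (empty string).
        $updated[$message] = $existing[$message] ?? '';
    }
    return $updated; // messages no longer found in the code are gone
}

$code = "echo \\Yii::t('app', 'Hello'); echo \\Yii::t('app', 'Bye');";
$existing = ['Hello' => 'Привет', 'Old string' => 'Старая строка'];
$updated = updateTranslations($existing, extractMessages($code, 'app'));
print_r($updated);
```

Here "Hello" keeps its translation, "Bye" is added as untranslated, and "Old string" is removed because the code no longer uses it.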

Data translation

How to store posts in X languages?

  • UTF-8
  • Filter by language

Many columns


  • - Complex query
  • - Hard to add languages

Same table, multiple records


  • ! Each language record is a separate independent one
  • + Simple
  • + Easy to add translations

Single record + translation in another table


  • + Simple maintenance
  • + Easy to add translations
  • + Only one join
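
A minimal sketch of this layout, using an in-memory SQLite database through PDO. It assumes the pdo_sqlite extension is available. The table and column names (post, post_translation, language, title) are hypothetical.

```php
<?php
$db = new \PDO('sqlite::memory:');
$db->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);

// Language-independent data lives in the main table.
$db->exec('CREATE TABLE post (id INTEGER PRIMARY KEY, created_at TEXT)');
// One row per post and language holds the translated fields.
$db->exec('CREATE TABLE post_translation (
    post_id INTEGER NOT NULL REFERENCES post(id),
    language TEXT NOT NULL,
    title TEXT NOT NULL,
    PRIMARY KEY (post_id, language)
)');

$db->exec("INSERT INTO post (id, created_at) VALUES (1, '2015-04-18')");
$db->exec("INSERT INTO post_translation (post_id, language, title) VALUES (1, 'en', 'Hello')");
$db->exec("INSERT INTO post_translation (post_id, language, title) VALUES (1, 'ru', 'Привет')");

// A single join fetches the post together with the requested language.
$query = $db->prepare(
    'SELECT p.id, p.created_at, t.title
     FROM post p
     JOIN post_translation t ON t.post_id = p.id
     WHERE t.language = :language'
);
$query->execute([':language' => 'ru']);
$row = $query->fetch(\PDO::FETCH_ASSOC);
echo $row['title'], "\n";
// Привет
```

Adding a language is just another row in post_translation, with no schema change.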

Always normalize language!

Use ISO 639-1 codes (es, en, fr) or RFC 5646 language tags (zh-CN, ru-UA), which ICU writes as zh_CN, ru_UA.

Locale::canonicalize($language);
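
For example, differently written inputs end up in one canonical form. A small sketch, assuming the intl extension is loaded, with arbitrary sample inputs:

```php
<?php
$normalized = [];
foreach (['en-us', 'ZH_cn', 'ru-UA'] as $input) {
    // Fixes case and separators so equal locales compare equal.
    $normalized[$input] = \Locale::canonicalize($input);
    echo $input, ' => ', $normalized[$input], "\n";
}
// en-us => en_US
// ZH_cn => zh_CN
// ru-UA => ru_UA
```

Normalizing before storing or comparing keeps "en-us" and "en_US" from being treated as two different languages.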

More reading

Questions time!