Commit ab2c5af3, authored by dpsutton, committed by GitHub

Create po template file (pot) from clojure (#23181)

* Create po template file (pot) from clojure

Rather than use xgettext we can use grasp to look for translation
sites. The reason to do this is twofold:

Multiline Translated Strings
----------------------------

We can use multiline strings in source to aid readability. This
[PR](https://github.com/metabase/metabase/pull/22901)
was abandoned because xgettext cannot be expected to combine string
literals into a single string for translation purposes. But we can
certainly do that with Clojure code.

```clojure
(defn- form->string-for-translation
  "Function that turns a form into the translation string. At the moment
  it is just the second arg of the form. Afterwards it will need to
  concat string literals in a `(str \"foo\" \"bar\")` situation. "
  [form]
  (second form))

(defn- analyze-translations
  [roots]
  (map (fn [result]
         (let [{:keys [line _col uri]} (meta result)]
           {:file (strip-roots uri)
            :line line
            :message (form->string-for-translation result)}))
       (g/grasp roots ::translate)))
```

`form` is the literal form. So we can easily grab all of the string
literals out of it and join them here in our script. The seam is already
written. Then reviving the PR linked earlier would upgrade the macros to
understand that string literals OR `(str <literal>+)` are acceptable
clauses.
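
As a minimal sketch of what that joining could look like (the name
`form->joined-string` is hypothetical, and it assumes the `str` call
appears unqualified in source and contains only string literals):

```clojure
(defn form->joined-string
  "Sketch only: returns the translation string for a form such as
  (trs \"foo\" x) or (trs (str \"foo \" \"bar\") x). Returns nil when the
  first argument is neither a string literal nor a `str` call over
  string literals."
  [form]
  (let [arg (second form)]
    (cond
      ;; plain string literal: use it as-is
      (string? arg) arg

      ;; (str "a" "b" ...): join the literals into one message
      (and (seq? arg)
           (= 'str (first arg))
           (seq (rest arg))
           (every? string? (rest arg)))
      (apply str (rest arg))

      ;; anything else (symbols, other calls) is not translatable here
      :else nil)))
```

Returning nil for anything else keeps the existing "not a string"
warning path usable for forms this sketch does not understand.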

Translation context
-------------------

The second reason is that we could allow for context on our strings. The
po file format supports a context entry (`msgctxt`) for each message.

```
msgctxt "The update is about changing a record, not a timestamp"
msgid "Failed to notify {0} Database {1} updated"
msgstr ""
```

See [this
issue](https://github.com/metabase/metabase/issues/22871#issuecomment-1146947441)
for an example situation. This change wouldn't help in that particular
instance, though, because the string is on the frontend.

But we could have a format like
```clojure
(trs "We" (comment "This is an abbreviation for Wednesday, not the
possessive 'We'"))
```
The macro would strip out the `comment` form, and we could use it when
building our pot file.
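
A rough sketch of how the pot script might pull that context out of a
form. The `(comment ...)` convention is only the proposal above, and the
names `comment-form?` and `form->context` are hypothetical:

```clojure
(defn comment-form?
  "True for a list whose head is the symbol `comment`."
  [x]
  (and (seq? x) (= 'comment (first x))))

(defn form->context
  "Sketch only: returns the joined string literals of the first
  (comment ...) argument in a translation form, or nil if there is none."
  [form]
  (some->> (rest form)          ; skip the trs/tru symbol
           (filter comment-form?)
           first                ; nil here short-circuits to nil
           rest                 ; drop the `comment` symbol
           (filter string?)
           seq                  ; empty comment -> nil
           (apply str)))
```

The result could then be written out as the `msgctxt` for that message.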

Note there is a difficulty with this, though: each source string must be
unique in the file, yet there can be multiple locations for each
translated string.

```
,#: /metabase/models/field_values.clj:89
,#: /metabase/models/params/chain_filter.clj:588
msgid "Field {0} does not exist."
msgstr ""
```
The leading commas are present to keep these `#` lines from being treated
as commit message comments. But if one location has a context and the
other doesn't, or even worse, if they have two different contexts, we
have a quandary: we can only express one context. Probably easy to solve
or warn on, but a consideration.
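
One way to warn on this, sketched under the assumption that each analyzed
usage is a map carrying a hypothetical `:context` key next to `:message`
(the script does not produce `:context` today):

```clojure
(defn conflicting-contexts
  "Sketch only: given usages like
  {:message \"We\" :context \"...\" :file \"a.clj\"}, returns a map of
  message -> set of distinct contexts for every message that appears with
  more than one. A nil in the set means a usage without context."
  [usages]
  (into {}
        (comp (map (fn [[message us]]
                     [message (into #{} (map :context) us)]))
              (filter (fn [[_ contexts]]
                        (> (count contexts) 1))))
        (group-by :message usages)))
```

An empty result means every message can be written with a single
context; anything else could be printed as a warning during generation.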

Caught Errors
-------------

The script blew up on the following form:

```clojure
(-> "Cycle detected resolving dependent visible-if properties for driver {0}: {1}"
    (trs driver cyclic-props))
```

No tooling could (easily) handle this properly. Our macro assertions
don't see the thread: `->` has already placed the string in the first
argument position by the time `trs` expands. And xgettext never found
this translation literal. I warn in the pot generation so we can fix
this. We could also have a test running in CI checking that all
translations are strings and not symbols.
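
A sketch of what such a check could look like. The helper name
`non-literal-usages` and the sample forms are hypothetical, and a real
test would feed in every form found by grasp rather than a hand-written
list:

```clojure
(require '[clojure.test :refer [deftest is]])

;; Inside the threaded form, the list that starts with `trs` has the
;; symbol `driver` in second position, not the message string.
(def threaded-usage '(trs driver cyclic-props))
(def plain-usage '(trs "Field {0} does not exist." field-id))

(defn non-literal-usages
  "Returns the forms whose first argument is not a string literal."
  [forms]
  (remove (comp string? second) forms))

(deftest translations-are-string-literals
  ;; Stand-in input; replace with the forms discovered in the source tree.
  (is (empty? (non-literal-usages [plain-usage]))))
```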

Fundamental Tool
----------------

The sky is the limit because of this fundamental grasp tool:

```clojure
enumerate=> (first (g/grasp single-file ::translate))
(trs "Failed to notify {0} Database {1} updated" driver id)
enumerate=> (meta *1)
{:line 35,
 :column 22,
 :uri "file:/Users/dan/projects/work/metabase/src/metabase/driver.clj"}
```

We can find all usages of tru/trs and friends and get their entire form
and location. We can easily do whatever we want after that.

Verifying Translation scripts still work
----------------------------------------

You can check that a single file is valid with
`msgcat <input.pot> -o combined.pot`. This will fail if the file is
invalid.

The general script still works:

```
❯ ./update-translation-template
[BABEL] Note: The code generator has deoptimised the styling of /Users/dan/projects/work/metabase/frontend/src/cljs/cljs.pprint.js as it exceeds the max of 500KB.
[BABEL] Note: The code generator has deoptimised the styling of /Users/dan/projects/work/metabase/frontend/src/cljs/cljs.core.js as it exceeds the max of 500KB.
[BABEL] Note: The code generator has deoptimised the styling of /Users/dan/projects/work/metabase/frontend/src/cljs/cljs-runtime/cljs.core.js as it exceeds the max of 500KB.
Warning: environ value /Users/dan/.sdkman/candidates/java/current for key :java-home has been overwritten with /Users/dan/.sdkman/candidates/java/17.0.1-zulu/zulu-17.jdk/Contents/Home
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Created pot file at  ../../locales/metabase-backend.pot  #<----- new line here
Warning: environ value /Users/dan/.sdkman/candidates/java/current for key :java-home has been overwritten with /Users/dan/.sdkman/candidates/java/17.0.1-zulu/zulu-17.jdk/Contents/Home
2022-06-06 15:05:57,626 INFO metabase.util :: Maximum memory available to JVM: 8.0 GB
Warning: protocol #'java-time.core/Amount is overwriting function abs
WARNING: abs already refers to: #'clojure.core/abs in namespace: java-time.core, being replaced by: #'java-time.core/abs
WARNING: abs already refers to: #'clojure.core/abs in namespace: java-time, being replaced by: #'java-time/abs
2022-06-06 15:06:01,368 WARN db.env :: WARNING: Using Metabase with an H2 application database is not recommended for production deployments. For production deployments, we highly recommend using Postgres, MySQL, or MariaDB instead. If you decide to continue to use H2, please be sure to back up the database file regularly. For more information, see https://metabase.com/docs/latest/operations-guide/migrating-from-h2.html
2022-06-06 15:06:03,594 INFO util.encryption :: Saved credentials encryption is DISABLED for this Metabase instance. :unlock:
 For more information, see https://metabase.com/docs/latest/operations-guide/encrypting-database-details-at-rest.html
WARNING: abs already refers to: #'clojure.core/abs in namespace: taoensso.encore, being replaced by: #'taoensso.encore/abs
WARNING: abs already refers to: #'clojure.core/abs in namespace: kixi.stats.math, being replaced by: #'kixi.stats.math/abs
WARNING: abs already refers to: #'clojure.core/abs in namespace: kixi.stats.test, being replaced by: #'kixi.stats.math/abs
WARNING: abs already refers to: #'clojure.core/abs in namespace: kixi.stats.distribution, being replaced by: #'kixi.stats.math/abs
msgcat: msgid '{0} metric' is used without plural and with plural.
msgcat: msgid '{0} table' is used without plural and with plural.
```

I'm not sure what the last two lines are about, but I suspect they are
preexisting conditions. Here is their form in the final combined pot
file (although again with leading commas on the filenames to prevent
them from being omitted from the commit message):

```
,#: frontend/src/metabase/admin/permissions/components/PermissionsConfirm.jsx:32
,#: src/metabase/automagic_dashboards/core.clj
,#: target/classes/metabase/automagic_dashboards/core.clj
,#, fuzzy, javascript-format
msgid "{0} table"
msgid_plural "{0} tables"
msgstr[0] ""
"#-#-#-#-#  metabase-frontend.pot  #-#-#-#-#\n"
"#-#-#-#-#  metabase-backend.pot (metabase)  #-#-#-#-#\n"
msgstr[1] "#-#-#-#-#  metabase-frontend.pot  #-#-#-#-#\n"

...

,#: frontend/src/metabase/query_builder/components/view/QuestionDescription.jsx:24
,#: src/metabase/automagic_dashboards/core.clj
,#: target/classes/metabase/automagic_dashboards/core.clj
,#, fuzzy, javascript-format
msgid "{0} metric"
msgid_plural "{0} metrics"
msgstr[0] ""
"#-#-#-#-#  metabase-frontend.pot  #-#-#-#-#\n"
"#-#-#-#-#  metabase-backend.pot (metabase)  #-#-#-#-#\n"
msgstr[1] "#-#-#-#-#  metabase-frontend.pot  #-#-#-#-#\n"
```

* Add drivers, one override, remove unused import

The import wasn't necessary.
I forgot to check the driver sources for i18n.
For some reason grasp doesn't descend into:

```clojure
(defmacro ^:private deffingerprinter
  [field-type transducer]
  {:pre [(keyword? field-type)]}
  (let [field-type [field-type :Semantic/* :Relation/*]]
    `(defmethod fingerprinter ~field-type
       [field#]
       (with-error-handling
         (with-global-fingerprinter
           (redux/post-complete
            ~transducer
            (fn [fingerprint#]
              {:type {~(first field-type) fingerprint#}})))
         (trs "Error generating fingerprint for {0}" (sync-util/name-for-logging field#))))))
```

I've opened an issue on [grasp](https://github.com/borkdude/grasp/issues/28)

* Use vars rather than name based matching
parent 2b973c79
@@ -4,6 +4,7 @@
 {common/common {:local/root "../common"}
 cheshire/cheshire {:mvn/version "5.8.1"}
 clj-http/clj-http {:mvn/version "3.9.1"}
+io.github.borkdude/grasp {:mvn/version "0.0.3"}
 org.fedorahosted.tennera/jgettext {:mvn/version "0.15.1"}}
 :aliases
......
(ns i18n.enumerate
  "Enumerate and create pot file from the backend worktree of metabase."
  (:require
   [clojure.java.io :as io]
   [clojure.spec.alpha :as s]
   [clojure.string :as str]
   [grasp.api :as g]
   [metabuild-common.core :as u])
  (:import [org.fedorahosted.tennera.jgettext
            Catalog HeaderFields HeaderFields Message PoWriter]))

(set! *warn-on-reflection* true)

(def ^:private roots (into [] (map (partial str u/project-root-directory))
                           ["/src" "/shared/src" "/enterprise/backend/src"
                            "/modules/drivers/bigquery-cloud-sdk/src"
                            "/modules/drivers/druid/src"
                            "/modules/drivers/google/src"
                            "/modules/drivers/googleanalytics/src"
                            "/modules/drivers/mongo/src"
                            "/modules/drivers/oracle/src"
                            "/modules/drivers/presto/src"
                            "/modules/drivers/presto-common/src"
                            "/modules/drivers/presto-jdbc/src"
                            "/modules/drivers/redshift/src"
                            "/modules/drivers/snowflake/src"
                            "/modules/drivers/sparksql/src"
                            "/modules/drivers/sqlite/src"
                            "/modules/drivers/sqlserver/src"
                            "/modules/drivers/vertica/src"]))

(def overrides
  (into []
        (map (fn [override]
               (update override :file (partial str u/project-root-directory))))
        ;; doesn't find the usage in fingerprinters, which is a macro emitting a defmethod
        [{:file "/src/metabase/sync/analyze/fingerprint/fingerprinters.clj"
          :message "Error generating fingerprint for {0}"}]))

(defn- strip-roots
  [path]
  (str/replace path
               (re-pattern (str/join "|" (map #(str "file:" % "/") roots)))
               ""))

(def translation-vars
  "Vars that are looked for for translations strings"
  #{'metabase.util.i18n/trs
    'metabase.util.i18n/tru
    'metabase.util.i18n/deferred-trs
    'metabase.util.i18n/deferred-tru
    'metabase.shared.util.i18n/tru})

(s/def ::translate (s/and
                    (complement vector?)
                    (s/cat :translate-symbol (fn [x]
                                               (and (symbol? x)
                                                    (translation-vars (g/resolve-symbol x))))
                           :args (s/+ any?))))

(defn- form->string-for-translation
  "Function that turns a form into the translation string. At the moment
  it is just the second arg of the form. Afterwards it will need to
  concat string literals in a `(str \"foo\" \"bar\")` situation. "
  [form]
  (second form))

(defn- analyze-translations
  [roots]
  (map (fn [result]
         (let [{:keys [line _col uri]} (meta result)]
           {:file (strip-roots uri)
            :line line
            :message (form->string-for-translation result)}))
       (g/grasp roots ::translate)))

(defn- group-results-by-filename
  "Want all filenames collapsed into a list for each form"
  [results]
  (->> results
       (concat overrides)
       (sort-by :file)
       (group-by :message)
       (sort-by (comp :file first val))
       (map (fn [[string originals]]
              {:message string :files (map #(select-keys % [:file :line]) originals)}))))

(defn- usage->Message
  ^Message [{:keys [message files]}]
  (let [msg (Message.)]
    (.setMsgid msg message)
    (doseq [file files]
      (if-let [line (:line file)]
        (.addSourceReference msg (:file file) line)
        (.addSourceReference msg (:file file))))
    msg))

(defn- header
  "Headers are just another message. Create one with our info."
  ^Message []
  (let [hv (HeaderFields.)
        now (.format (java.text.SimpleDateFormat. "yyyy-MM-dd HH:mmZ")
                     (java.util.Date.))]
    (doseq [[prop value] [[HeaderFields/KEY_ProjectIdVersion "1.0"]
                          [HeaderFields/KEY_ReportMsgidBugsTo "docs@metabase.com"]
                          [HeaderFields/KEY_PotCreationDate now]
                          [HeaderFields/KEY_MimeVersion "1.0"]
                          [HeaderFields/KEY_ContentType "text/plain; charset=UTF-8"]
                          [HeaderFields/KEY_ContentTransferEncoding "8bit"]]]
      (.setValue hv prop value))
    (let [message (.unwrap hv)]
      (doseq [comment ["Copyright (C) 2022 Metabase <docs@metabase.com>"
                       "This file is distributed under the same license as the Metabase package"]]
        (.addComment message comment))
      message)))

(defn processed->catalog
  "Takes the grouped usages and returns a catalog."
  ^Catalog [usages]
  (let [header (header)
        catalog (Catalog. true #_is-pot)]
    (.addMessage catalog header)
    (doseq [usage usages]
      (.addMessage catalog (usage->Message usage)))
    catalog))

(defn- create-pot-file!
  [sources filename]
  (let [analyzed-usages (group-results-by-filename (analyze-translations sources))]
    (when-let [not-strings (seq (remove (comp string? :message) analyzed-usages))]
      (println "Bad analysis: ")
      (run! (comp println pr-str) not-strings))
    (with-open [writer (io/writer filename)]
      (let [po-writer (PoWriter.)
            catalog (processed->catalog (filter (comp string? :message) analyzed-usages))]
        (.write po-writer catalog writer)))
    (println "Created pot file at " filename)))

(defn -main
  "Entrypoint for creating a backend pot file."
  [& [filename]]
  (when (str/blank? filename)
    (println "Please provide a filename argument. Eg: ")
    (println " clj -M -m i18n.enumerate \"$POT_BACKEND_NAME\"")
    (println " clj -M -m i18n.enumerate metabase.pot")
    (System/exit 1))
  (create-pot-file! roots filename))

(comment
  (take 4 (analyze-translations roots))
  (def single-file (str u/project-root-directory "/src/metabase/driver.clj"))
  (preprocess-results (analyze-translations single-file))
  (create-pot-file! single-file "pot.pot")
  (map (juxt meta identity)
       (g/grasp single-file ::translate))
  )
@@ -45,31 +45,10 @@ rm "$POT_FRONTEND_NAME.bak"
 # update backend pot #
 ######################
-# xgettext before 0.19 does not understand --add-location=file. Even CentOS
-# 7 ships with an older gettext. We will therefore generate full location
-# info on those systems, and only file names where xgettext supports it
-LOC_OPT=$(xgettext --add-location=file -f - </dev/null >/dev/null 2>&1 && echo --add-location=file || echo --add-location)
-find . -name "*.clj" | xgettext \
-  --from-code=UTF-8 \
-  --language=lisp \
-  --copyright-holder='Metabase <docs@metabase.com>' \
-  --package-name="metabase" \
-  --msgid-bugs-address="docs@metabase.com" \
-  -k \
-  -kmark:1 -ki18n/mark:1 \
-  -ktrs:1 -ki18n/trs:1 \
-  -ktru:1 -ki18n/tru:1 \
-  -kdeferred-trs:1 -ki18n/deferred-trs:1 \
-  -kdeferred-tru:1 -ki18n/deferred-tru:1 \
-  -ktrun:1,2 -ki18n/trun:1,2 \
-  -ktrsn:1,2 -ki18n/trsn:1,2 \
-  $LOC_OPT \
-  --add-comments --sort-by-file \
-  -o $POT_BACKEND_NAME -f -
-sed -i".bak" 's/charset=CHARSET/charset=UTF-8/' "$POT_BACKEND_NAME"
-rm "$POT_BACKEND_NAME.bak"
+pushd bin/i18n
+clojure -M -m i18n.enumerate "../../$POT_BACKEND_NAME"
+# switch back to project root
+popd
 ########################
 # update auto dash pot #
......
@@ -350,8 +350,8 @@
                        (into #{} (keys acc)))]
   (if (empty? cyclic-props)
     (recur transitive-props next-acc)
-    (-> "Cycle detected resolving dependent visible-if properties for driver {0}: {1}"
-        (trs driver cyclic-props)
+    (-> (trs "Cycle detected resolving dependent visible-if properties for driver {0}: {1}"
+             driver cyclic-props)
         (ex-info {:type qp.error-type/driver
                   :driver driver
                   :cyclic-visible-ifs cyclic-props})
......