Commit f52f048f authored by Cam Saul

Probably don't need to consider more than 10,000 rows for field-percent-urls and field-avg-length

parent f8f65f93
@@ -271,7 +271,9 @@
(field-percent-urls [this field]
(assert (extends? ISyncDriverFieldValues (class this))
"A sync driver implementation that doesn't implement ISyncDriverFieldPercentURLs must implement ISyncDriverFieldValues.")
- (let [field-values (field-values-lazy-seq this field)]
+ (let [field-values (->> (field-values-lazy-seq this field)
+                         (filter identity)
+                         (take 10000))] ; Considering the first 10,000 rows is probably fine; don't want to have to do a full scan over millions
(percent-valid-urls field-values))))
(defn mark-url-field!
@@ -318,10 +320,13 @@
(field-avg-length [this field]
(assert (extends? ISyncDriverFieldValues (class this))
"A sync driver implementation that doesn't implement ISyncDriverFieldAvgLength must implement ISyncDriverFieldValues.")
- (let [field-values (field-values-lazy-seq this field)
+ (let [field-values (->> (field-values-lazy-seq this field)
+                         (filter identity)
+                         (take 10000)) ; as with field-percent-urls it's probably fine to consider the first 10,000 values rather than potentially millions
field-values-count (count field-values)]
(if (= field-values-count 0) 0
(int (math/round (/ (->> field-values
(map str)
(map count)
(reduce +))
field-values-count)))))))
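
The change relies on `take` over a lazy sequence realizing only as many elements as it needs, so the rest of the field's values are never fetched. The following is a minimal sketch of that behavior, not the driver code itself: `fake-field-values` and the `realized` counter are hypothetical stand-ins for a driver's `field-values-lazy-seq`, and every third value is made `nil` so that `(filter identity)` has something to drop.

```clojure
;; Counter used only to observe how many elements the pipeline realizes.
(def realized (atom 0))

(defn fake-field-values
  "Hypothetical stand-in for a driver's lazy field-value seq.
   Returns an infinite lazy seq in which every third value is nil."
  []
  (map (fn [i]
         (swap! realized inc)
         (when (pos? (mod i 3))
           (str "value-" i)))
       (range)))

;; Same pipeline shape as in the commit: drop nil/false values,
;; then keep at most the first 10,000 of what remains.
(def sample
  (->> (fake-field-values)
       (filter identity)
       (take 10000)
       doall))

(println "sampled values:" (count sample))
(println "elements realized:" @realized)
```

In this sketch the sample holds 10,000 non-nil values, and the number of realized elements stays bounded (about 15,000, give or take a chunk) even though the underlying sequence is infinite. Note that `(filter identity)` removes `false` as well as `nil`.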