Effects of Prenatal Mercury Exposure on Childhood IQ: A Synthesis of Three Studies

Louise M. Ryan
Professor of Biostatistics, Harvard School of Public Health

Report to the U.S. Environmental Protection Agency
March 2005
We combine data from three studies in order to estimate the effects of prenatal methylmercury exposure on IQ. The analysis uses a hierarchical model similar to the one described by the National Academy of Sciences Committee on the Toxicological Effects of Methylmercury (NRC 2000). A more technical description of these same methods has been provided by Coull et al. (2004). One important difference between the NAS analysis and the one proposed here is that the former emphasized Benchmark Dose calculations, while our analysis focuses on dose response. Another difference is that our analysis allows for some endpoints to be common to two or three of the studies.

The Studies

We considered data from the three studies identified by the NAS as being of adequate quality to be seriously considered as a basis for a risk assessment of the health effects of methylmercury exposure (see NRC 2000, Table 5-10). While Bellinger (2005) describes the studies in detail, we provide brief descriptions below.

The Seychelles Child Development Study was a large cohort study involving 779 children enrolled prenatally. Methylmercury levels, measured in maternal hair samples gathered at the end of pregnancy, had a median of 5.9 parts per million (ppm), a consequence of the high rate of fish consumption in the typical Seychelles diet. Results of a variety of neurodevelopmental assessments of the enrolled children have been reported at 6, 19, 29, and 66 months of age (see, for example, Myers et al. 1995). Some of these outcomes were the ones reported by the NRC in their 2000 analysis. In the present analysis, we use the outcomes measured when the children were around 9 years of age (Myers et al. 2003). The Seychelles study had a careful epidemiological design that allowed adjustment for a variety of potentially important confounders, including maternal age, medical history, socioeconomic status, and smoking and drinking behaviors. Data available for our analysis include regression coefficients associated with methylmercury exposure for the various outcomes of interest, after adjustment for the possible confounding effects of these other variables.

Like the Seychelles study, the Faroe Islands study was a well designed cohort study conducted in a setting with high fish consumption. Maternal levels of methylmercury in the study population had a geometric mean of 4.3 ppm in hair. The original Faroe Islands cohort involved 1,022 mother-infant pairs, and results on a variety of neurodevelopmental outcomes have been reported (see Grandjean et al. 1997). We will focus on outcomes measured when the children were aged 7 years. Available data include covariate-adjusted regression coefficients associated with methylmercury exposure.

The New Zealand study involved a sample of 237 mother/infant pairs, identified as high fish consuming women from a total cohort of 11,000. We will analyze data collected on the children when they were 6 years old, as reported by Crump et al. (1998).

Endpoints and data to be analyzed

Although our primary focus is full scale IQ, we will present a number of different analyses that also include other endpoints. Our logic is that by including other endpoints, our analysis can help to reduce some of the noise associated with an analysis that uses the IQ data alone. For example, to the extent that IQ is correlated with other endpoints (e.g., outcomes from the California Verbal Learning Test), including these other endpoints in the analysis can serve to bolster estimates of the dose response effect associated with IQ itself. We used the following process to select the endpoints to be included in our analysis.

Step 1: We identified the primary publications reporting on the three studies. These were Myers et al. (2003) for the Seychelles study, Grandjean et al. (1997) for the Faroes study, and Crump et al. (1998) together with a report from the Swedish Environmental Protection Agency (Kjellstrom et al. 1989) for the New Zealand study. All endpoints reported in these papers and measured when the children were between 6 and 9 years of age were considered for potential inclusion in our analysis. In the case of the Seychelles and New Zealand studies, all necessary information could be extracted from the published papers (Tables 2, 3 and 4 of Myers et al. 2003, and Table III of Crump et al. 1998). The Faroes studies, however, had only reported their results with cord blood mercury transformed to the log scale (see Table 4 of Grandjean et al. 1997). Hence, we asked the Faroes investigators to provide the additional details needed for our analysis (see Budtz-Jorgensen et al. 2005). Table 2 of Budtz-Jorgensen et al. (2005) shows the estimated dose response coefficients for the various endpoints reported in Grandjean et al. (1997), but on the untransformed rather than the log scale of mercury. In order to place the estimated regression coefficients on a comparable scale, Table 2 of Budtz-Jorgensen et al. (2005) uses the response standard deviations (not shown in the table) to standardize the coefficients and standard errors, expressing them in terms of a percent of the response standard deviation.

Consider finger tapping, for example. The estimated regression coefficient of -.193 shown in column 2 is divided by the observed standard deviation (not shown) of 6.12 and multiplied by 100 to obtain the standardized value of -3.15. Note that the standardization is not done with respect to the population standard deviation of Finger Tapping, which is 5. While standardization could have been done using population standard deviations instead, we felt that using the study standard deviations allowed more appropriately for study-to-study variation. Population standard deviations are based on a much more diverse group of children and hence, arguably, are less appropriate for standardizing our estimated dose response coefficients. Furthermore, population and study standard deviations were generally fairly similar, hence it is unlikely that a different approach to standardization would result in appreciably different results.
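The standardization just described is simple arithmetic; as a minimal sketch using the finger tapping numbers quoted above:

```python
def standardize(coef, study_sd):
    """Express an estimated dose response coefficient as a percent
    of the observed (study) standard deviation of the response."""
    return 100.0 * coef / study_sd

# Finger tapping: coefficient -.193, observed study SD 6.12
print(round(standardize(-0.193, 6.12), 2))  # -3.15
```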

Step 2: From the set of potential endpoints identified in Step 1, we selected those for which population means and standard deviations were either provided or readily available. This led to several endpoints measured in the Seychelles study being dropped from our analysis. For some, only ranges of normal performance were provided (the Woodcock Johnson Test of Achievement, the Haptic Discrimination Test and the CTRS Hyperactivity index; Tables 2, 3 and 4, respectively, of Myers et al. 2003). For others, only portions of the test were administered (the Bruninks-Oseretsky Test of Motor Proficiency). We also dropped the McCarthy Motor Scales endpoint from the New Zealand analysis, since the regression coefficient was reported only for the log-transformed outcome, for which there are no population standards. Similarly, we dropped Grooved Pegboard from the Seychelles study because the analysis was reported for a transformed value of this outcome (see Table 3 of Myers et al. 2003).

Step 3: For some tests and endpoints, the study means and/or standard deviations were substantially different from the population norms. For example, while the population mean and standard deviation are 74 and 15, respectively, for the Grooved Pegboard test reported in Table 3 of Myers et al. (2003), the values observed in the study were 91.8 and 20.5. We concluded that this test was not well replicated in the study population and excluded it from our analysis. Similarly, one of the trail-making tests in the Seychelles study had a study mean and standard deviation of 81.5 and 49.6, compared to expected population values of 55 and 19.

Step 4: For several tests and endpoints, results for multiple variations were reported. For example, the Seychelles study presents regression coefficients describing the effect of mercury on the Boston Naming Test, with and without cues. The Faroes investigators report several of the scores generated on the California Verbal Learning Test (Learning, Short-term reproduction, Long-term reproduction and Recognition), while the Seychelles investigators report two (Short- and Long-term). In order to avoid over-representing any particular endpoint, and also to avoid adding complexity to our modeling (associated with having to account for correlation between closely related scores from the same test), we chose only one score per test in such cases. When a test was shared between two or more studies, we chose a common score, if possible. In other cases, we chose the score that is generally considered to be the primary measure for the test. For example, we used the dominant hand results for Finger Tapping. Applying these exclusion criteria also led to dropping CPT attentiveness and Risk-taking from the Seychelles study (CPT Reaction Time was common to both Seychelles and Faroes).

Step 5: Once our final set of endpoints had been selected, we grouped them into three broad domains: Achievement/Cognition, Attention/Behavior and Motor Skills. For the most part, we followed the categories suggested by the Seychelles authors, specifically Myers et al. (2003): Table 2 for Achievement/Cognition, Table 3 for Attention/Behavior and Table 4 for Motor Skills. Two exceptions were the WRAML and VMI tests, which, though listed in Table 3 of Myers et al., were included in our Achievement/Cognition domain, since they relate to visual memory and visual-motor integration, respectively. Four of the endpoints measured in the Faroes study fell naturally into the Achievement/Cognition domain (BNT and CVLT both being listed in Table 2 of Myers et al. 2003, the Bender Visual Motor Gestalt test being an established measure of cognitive achievement, and, of course, IQ). All four included New Zealand endpoints fell under the Achievement/Cognition domain.

Table 1 summarizes the various endpoints to be included in our analysis, along with their grouping into the three domains described above. We will present three different types of analysis: one (Analysis A) based on only the IQ endpoints from the three studies, one (Analysis B) that uses all the Achievement/Cognition related endpoints, and finally one (Analysis C) based on all available endpoints.

Full scale IQ was measured in both the Seychelles and New Zealand studies, although slightly different versions of the Wechsler Intelligence Scale for Children were used (WISC-R for New Zealand and WISC-III for Seychelles). The Faroes study does not report full scale IQ, however, but rather three subscales of the WISC-R: Digit Spans, Similarities and Block Designs. Table 2 shows the estimated dose response coefficients and their standard errors for the various endpoints, extracted from Table 2 of Budtz-Jorgensen et al. (2005). Estimated coefficients are reported on their original scale, as well as on a standardized scale reflecting the impact of exposure in terms of a percentage of the outcome standard deviation. For example, the estimated dose response coefficient for Digit Spans (-.025) is converted to the standardized coefficient of -1.72 by dividing by the standard deviation of 1.45 and then multiplying by 100.
The Faroes researchers report three different analyses for Similarities, since there were strong examiner effects. The standardized regression coefficient for Similarities was -1.01 when the analysis was based on all the data, but -5.54 and 0.07, respectively, when stratified by examiner. In Table 2, we use the value (-1.01) that combines results from both examiners, making use of the entire data set. Similarly, there were two reports for Block Designs, one based on the standard outcome, the other based on a square root transformation, the authors arguing that a square root transformation was needed in order to make this endpoint more normally distributed. Fortunately, the standardized regression coefficients for the standard and the square-root transformed Block Designs analyses were essentially identical (-1.94 and -1.96). It is interesting to see that although the regression coefficient estimates on the original scale look very different, they are quite comparable when re-expressed on the standardized scale.

We asked the Faroes investigators to fit a Structural Equations Model (SEM) in order to combine these three subscales, so as to provide a more formal estimate of a standardized coefficient for full-scale IQ. Structural equation modeling is a powerful analysis tool that allows the combination of multiple exposures and responses via the use of latent variables. Budtz-Jorgensen et al. (2002) have published an analysis of the Faroes data using such an approach, where three different exposure measures (hair, blood and whale consumption) were viewed as surrogates for methylmercury exposure, and the various outcomes from the study were viewed as manifestations of two latent outcomes, one corresponding to motor ability and the other to verbal ability. The model for that analysis is summarized in Figure 1. Figure 2 shows the much simpler SEM used to combine the three IQ subscales for our purposes.

When fitting an SEM, it is necessary to make some arbitrary decision about the scaling of any latent variables involved in the model. Common practice is to put the latent variable on the same scale as one of the input variables. This decision has no impact on the overall fit of the model, but can be very helpful in terms of model interpretability. Budtz-Jorgensen et al. (2005) assumed that the latent variable was on the same scale as Digit Spans. The analysis led to an estimated dose response coefficient of -.024 and a standard error of .011 for the effect of cord blood mercury on latent full scale IQ, with a corresponding p-value of 0.031. There are two possible ways to rescale this estimated coefficient so that it is comparable to full scale IQ. One approach is to use the same scaling factor as used for Digit Spans (since the latent variable is assumed to be on the same scale as Digit Spans). This led to a standardized value of -1.65. By using the same scaling factor as Digit Spans, we are implicitly assuming that the latent IQ variable has the same standard deviation as Digit Spans. Another approach is to use the estimated standard deviation of the latent variable itself, obtained as part of the SEM fitting procedure. Taking this approach led to a rescaled regression coefficient of -4.09. We will consider both approaches, since each has some good arguments to support it.
A justification for choosing the value of -1.65 is that this value is quite close to the average of the standardized coefficients corresponding to the three separate subscales, which would be a reasonable choice in and of itself. A justification for choosing the value of -4.09 is that it is consistent with the idea that an analysis based on full-scale IQ should have a higher signal to noise ratio (in other words, a lower coefficient of variation) than analyses based on single subscales of the full scale IQ test. Furthermore, the estimated correlations between the subscales and the latent IQ variable reported in Table 6 of Budtz-Jorgensen et al. (2005) (.40, .49 and .67, respectively, for Digit Spans, Similarities and Block Designs) are very close to the values that have been estimated in other population-based studies of the relationship between the WISC-III full-scale IQ and the various subscales for 7 year old children: .43, .66 and .57, respectively, for Digit Spans, Similarities and Block Designs (Wechsler 1991). In our subsequent analysis, we will present results using both choices for rescaling the estimated dose response coefficient on full-scale IQ. We focus first on results based on the value of -1.65, since this leads to dose-response estimates that are closer to zero. Results based on the value of -4.09 are presented later. The relative advantages and disadvantages of these two choices are expanded upon in the Discussion section of the report.

Analysis B will include all outcomes related to achievement (see Table 2 in Myers et al. 2003 and Table I in Crump et al. 1998 for discussion of this domain grouping). This analysis will include verbal learning, visual memory and visual-motor integration, the various IQ subscales, and the McCarthy perceptual performance scale (New Zealand study). Our initial plan had been to conduct a separate analysis including only those tests related to Attention/Behavior (Table 4 of Myers et al. 2003). However, after applying the various exclusion criteria presented in the previous section, only three tests fell into this category: CPT Reaction Time from both the Faroes and Seychelles studies, as well as the CBCL from the Seychelles study. Although the Everts Behavioural Rating Scale was administered in the New Zealand study, the investigators did not include it in their regression analysis because "no standardized combined score [for it] has been developed" (Kjellstrom et al. 1989, p. 61). With such limited data, it will not be possible to conduct a formal analysis. Similarly, it was not possible to perform an analysis specific to Motor Skills since, after applying the various exclusion criteria described earlier, only three tests remained in this domain: Finger Tapping and Hand-eye Coordination for the Faroes study and Finger Tapping for the Seychelles study.

We do, however, present some analysis that combines all the available endpoints from all three domains. Because our main focus is on the Achievement/Cognition domain, we view this only as a secondary analysis, useful as an adjunct to help us explore study-to-study and outcome-specific sources of variation.

As discussed presently, our analysis will be based on a combination of hierarchical modeling techniques and graphical representations of the data. Our approach allows us to borrow strength across studies (in terms of characterizing exposure/response relationships) while appropriately allowing for study-to-study variability and for clustering of effects within domain.

Tables 3, 4 and 5 show the estimated regression coefficients and corresponding standard errors for the endpoints reflecting the three domains of interest (Achievement/Cognition, Attention/Behavior and Motor Skills, respectively) that we will be considering from the three studies. For the Faroes and New Zealand studies, two sets of estimates are presented, one based on the full dataset available for each study, the other on a subset with some observations excluded. In the case of the Seychelles, the only available estimates were those based on excluding observations considered to be outliers (see Myers et al. 2003), so for this reason we report the Seychelles results under the column "subset". For the Faroes study, regression coefficients were estimated both for the full cohort and for the subset of 776 individuals with hair methylmercury below 10 ppm (or micrograms per gram, ug/g). We include these data for completeness, but do not consider them in any detail in our subsequent analysis.

For the New Zealand study, results were available for the full cohort as well as with one very extreme observation excluded (see Crump et al. 1998, Table III). Our primary analysis will be conducted using the datasets recommended by the various study investigators in their published studies, where possible, or using the recommendations from the National Academy. In the case of the New Zealand study, for example, this primary dataset corresponds to the one that excludes the one very highly exposed child. The primary analysis reported by the Faroes investigators was based on all available data. The primary Seychelles analysis excluded "outliers", that is, those observations with model residuals exceeding 3 standard deviation units.

We will later report a sensitivity analysis wherein our primary analyses will be repeated using variations on the datasets used. For example, we will repeat the analysis adding back the New Zealand child with very high methylmercury levels (see Crump et al. 1998 for discussion).

Before proceeding to a description of the analysis methods and results, we discuss here the dose response models used by the various investigators. All three studies report regression coefficients based on linear models. These linear models were arrived at as a special limiting case of the K-power model, which, ignoring covariates for now, corresponds to fitting

Y = β0 + β1X^K + e,

where Y is the outcome of interest, X is exposure, e is an error term and K is an additional parameter to be estimated. The K-power model has been popular among risk assessors because of its flexibility in accommodating a wide range of shapes of dose response.

For example, Figure 3 shows three very differently shaped models obtained by varying the value of K from 0 (the limiting value corresponding to a log transformation of exposure), to K = 1 (linear model), to K = 2 (quadratic function of dose). For binary outcomes, it has been traditional for K to be restricted to values of 1 or greater so as to ensure a sub-linear or linear dose response. However, this argument does not apply for continuous outcomes of the kind encountered in our analysis.
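The three shapes shown in Figure 3 follow directly from the K-power form; a minimal sketch (the intercept and slope values below are arbitrary, chosen only to trace the curves, not fitted values):

```python
import numpy as np

def k_power(x, b0, b1, k):
    """K-power dose response: b0 + b1 * x**K. The limit K -> 0 is
    conventionally replaced by a log transformation of exposure."""
    if k == 0:
        return b0 + b1 * np.log(x)
    return b0 + b1 * x**k

x = np.linspace(0.1, 10, 50)          # exposure grid (arbitrary units)
log_shape = k_power(x, 100.0, -1.0, 0)  # K -> 0: log-transformed exposure
linear = k_power(x, 100.0, -1.0, 1)     # K = 1: linear model
quadratic = k_power(x, 100.0, -1.0, 2)  # K = 2: quadratic in dose
```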

Indeed, Budtz-Jorgensen et al. (2000) found that slightly better fits (according to standard statistical tools for goodness of fit) were obtained using a log-transformation of exposure. A subset of endpoints had been re-analyzed using the K-power model as part of preparing the NAS methylmercury report. For our analysis, we have chosen to follow the NAS committee's decision to use results based on a linear model. The NAS gave several reasons for doing so.

First of all, when K was restricted to values of 1 or greater, the best fitting model was obtained by setting K = 1. While there was some evidence that a log transformation might provide a slightly better fit, examination of the fitted model suggested that the model choice was being dominated by a few relatively high exposure levels and that there were relatively few observations at the very low end of exposure.
Figure 4 illustrates this idea with the fitted model for the Boston Naming Test from the Faroe Islands analysis. The figure was taken from the NAS report, though the x-axis was rescaled to reflect hair mercury rather than cord blood mercury (as was done in the original figure). Note that the x-axis is also drawn on the log scale. The relatively low density of points below, say, .7 ppm in hair means that the predicted mean curve is quite sensitive to the influence of extreme observations at the high end of the scale.
For example, the somewhat better fit for the log model reported by the Faroes group may well be due to a small number of relatively highly exposed individuals. However, as indicated in the figure, while the log model provides a similar fit to that obtained with the unlogged data for the central bulk of datapoints, it leads to quite a different shape in the low-dose region where there is little data to inform the model. It is this sensitivity that explains, for example, why Benchmark Dose estimates based on the log-transformed exposure variable were quite low relative to those based on linear and even square root models (see Table 1 of Budtz-Jorgensen et al. 2001). The NAS report suggested considerable caution regarding the use of the log-transformed model.

Statistical Methods

Rescaling

The first step in our analysis was to rescale all the estimated regression coefficients and standard errors so that they corresponded to the same scale as full scale IQ (that is, a population mean of 100 and standard deviation of 15). Ideally, we would like to have our results expressed in terms of the decrement in IQ associated with a 1 unit increase in methylmercury. Since the analysis was done using linear models, this is easily achieved with a simple linear rescaling of the estimated regression coefficients and standard deviations.
To see how this works, consider a simple linear model, Y = β0 + β1X + e, where Y is the outcome of interest, X is the covariate of interest and e is an error term. The regression coefficient β1 corresponds to the expected change in the outcome Y associated with a 1 unit change in the covariate X (in our case, ug/g of methylmercury in hair). Suppose that Y has a standard deviation of σ, and consider the rescaled model obtained by multiplying the regression equation by 15/σ:

Y* = β0* + β1*X + e*,

where Y* = 15Y/σ, β0* = 15β0/σ, β1* = 15β1/σ and e* is the rescaled error term. Since Y* has been rescaled to have standard deviation 15, the regression coefficient β1* can be interpreted as having been rescaled to apply to an endpoint on the same scale as IQ.
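This rescaling is a one-line operation; a minimal sketch (the coefficient, standard error, and outcome SD in the example are hypothetical placeholders, not values from the studies):

```python
def rescale_to_iq(coef, se, outcome_sd, iq_sd=15.0):
    """Multiply a regression coefficient (and its standard error) by
    15/sigma so it applies to an outcome with SD 15, like IQ."""
    factor = iq_sd / outcome_sd
    return coef * factor, se * factor

# Hypothetical values: coefficient -0.2 (SE 0.08) for an outcome with SD 6
print(rescale_to_iq(-0.2, 0.08, 6.0))  # (-0.5, 0.2)
```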
Although it might be argued that no rescaling is needed for any analysis based directly on full scale IQ, some additional rescaling was needed to reflect the fact that the observed standard deviations for the IQ endpoints were different from the expected standard deviation for IQ of 15. In New Zealand, for example, observed standard deviations tended to be a little larger than the established population standard deviations for the tests being used (see Kjellstrom et al. 1989, Tables 10 and 11), reflecting the very heterogeneous population in that study. In the Seychelles, the observed standard deviations tended to be smaller than the established population standard deviations.

There was one additional complication to be addressed in terms of rescaling the results to be on a comparable scale. Two of the studies (Seychelles and New Zealand) report their results in terms of methylmercury measured in ug/g (ppm) hair, while the third study (Faroes) reports in terms of ug/L (ppb) cord blood. In order to combine results across studies, we convert the Faroes results to their equivalents in terms of ppm hair methylmercury. We considered four different sources of information to inform our assumptions related to the blood/hair ratio.
The NRC report cited a ratio of 250 to 1 (page 306) for the relationship between mercury in hair (ng/g, or ppb) and that in cord blood (ug/L, also in ppb). Analysis of data from NHANES for 1999-2000 found that hair and blood mercury measurements had a Pearson correlation of .79 for women of childbearing age (p < .0001). Hair-to-blood mercury ratios were computed using total hair mercury (ng/g hair) and blood mercury (ug/L) data. The total sample mean (standard error) hair-to-blood mercury ratio was 234 (15) for females 16-49 years (McDowell et al. 2004).
The estimate based on the NHANES data, however, reflects the ratio of hair to blood mercury in the same individual, rather than of maternal hair to cord blood mercury. The Faroes study investigators did collect the data needed to estimate this latter ratio, and report a median ratio of approximately 200, with a 5-95% interval of 75-440 (Budtz-Jorgensen et al. 2004). This value is consistent with the reported geometric averages of 22.9 ug/L mercury in cord blood and 4.27 ug/g mercury in maternal hair (Grandjean et al. 1997, Table 2). We report our analyses using both 200 and 250, finding the results relatively insensitive to the assumed ratio.
The value reported by Budtz-Jorgensen et al. (2004) is most suitable for this analysis, since it is specific to the population of interest and is the only one of these values based on mercury in cord blood rather than venous blood. Thus we use a ratio of 200 in our primary analysis. While we consider this value to have relatively low uncertainty, we also examine the impact of an alternative ratio of 250 in the sensitivity analyses.

These arguments can be put together to obtain a scaling factor that converts the reported regression coefficients from the Faroes study to equivalent values in ppm hair. Note that the conversion also needs to account for the fact that the regression coefficients reported in Table 2 of Budtz-Jorgensen et al. (2005) correspond to the expected change in the outcome associated with a 10 unit change in cord blood methylmercury. We derive the conversion factor here for the case where the hair to cord blood ratio is 200 when both are measured in ppb, or .2 when hair is reported in ppm. Thus 1 ppb of mercury in cord blood (ug/L) corresponds to .2 ppm mercury in hair (ug/g), and a 10 unit increase in ug/L cord blood corresponds to a 10 x .2 = 2 unit increase in ug/g of hair. If a 10 unit increase in ug/L cord blood results in a B unit change in the outcome, then a 2 unit increase in ug/g hair results in a B unit change in the outcome, and hence a 1 unit increase in ug/g hair mercury results in a B/2 change in the outcome.

The final scale factor to convert the reported regression coefficients to their equivalent effect on the IQ scale is obtained by dividing 15 (the population standard deviation for IQ) by 2 times the observed standard deviation of each outcome. Applying the same argument with a hair to cord blood ratio of 250 involves using 2.5 instead of 2 in the last calculation.
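Putting the unit conversion and the IQ rescaling together, the Faroes conversion can be sketched as follows (the coefficient and outcome SD in the example are hypothetical placeholders, not entries from Table 2):

```python
def faroes_to_iq_scale(coef_per_10ugL, outcome_sd, hair_blood_ratio=200):
    """Convert a Faroes coefficient (change per 10 ug/L cord blood) to the
    IQ scale (change per 1 ug/g hair, for an outcome with SD 15).
    A ratio of 200 (ppb hair : ppb blood) means 10 ug/L blood = 2 ug/g hair."""
    hair_units = 10 * hair_blood_ratio / 1000.0  # 2.0 ug/g when ratio is 200
    return coef_per_10ugL * 15.0 / (hair_units * outcome_sd)

# Hypothetical: B = -0.4 per 10 ug/L, outcome SD 6 -> -0.4 * 15 / (2 * 6) = -0.5
print(faroes_to_iq_scale(-0.4, 6.0))
```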

Modeling

To frame our approach, we begin with a brief description of the NAS hierarchical analysis, which considered each of the three studies to be measuring distinct outcomes. Coull et al. (2003) provides an expanded technical discussion of the NAS modeling and analysis. As discussed in the previous section, our analysis will use regression coefficients that have been appropriately rescaled so as to be comparable with the coefficient associated with full scale IQ.

Let bij be the standardized estimated regression coefficient for the jth outcome within study i, i = 1, ..., I and j = 1, ..., Ji, where Ji is the number of outcomes measured in the ith study. Similarly, let the corresponding standard error be denoted by sij. The hierarchical model assumes

bij ~ N(θij, sij²)
θij ~ N(θi, σ²endpoint)
θi ~ N(θ, σ²study)

so that θij corresponds to the true standardized dose response coefficient associated with the jth outcome in the ith study, θi is the average or mean dose response coefficient for the ith study, and θ is the overall true average dose response coefficient. The variance parameter σ²study reflects study-to-study variability, while the parameter σ²endpoint reflects the variability from outcome to outcome within the same study. While this model looks complicated, it actually has a relatively simple structure and implies that outcomes measured within the same study are more closely correlated with each other than outcomes measured in different studies. NAS used a Bayesian approach, fitting the model using the statistical package BUGS (Bayesian inference Using Gibbs Sampling), which easily facilitates such hierarchical modeling (Spiegelhalter et al. 2000).
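The nested structure of this model can be illustrated by simulation. In the sketch below, the overall mean, variance components, and study/endpoint counts are made-up values for illustration only, not estimates from the NAS fit:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sd_study, sd_endpoint = -1.5, 0.5, 1.0  # hypothetical values
I, J = 3, 4                                    # 3 studies, 4 endpoints each

# Study-level mean coefficients, then endpoint-level true coefficients:
theta_i = rng.normal(theta, sd_study, size=I)
theta_ij = rng.normal(theta_i[:, None], sd_endpoint, size=(I, J))

# Observed estimates b_ij scatter around the truths with the reported SEs:
s_ij = np.full((I, J), 0.8)
b_ij = rng.normal(theta_ij, s_ij)

print(b_ij.shape)  # (3, 4)
```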

For the present analysis, we will need to consider a slightly different model formulation, since we will be considering IQ and other outcomes that are common to all three studies. In order to motivate this different approach, it is useful to re-express the hierarchical model presented above as a random effects model. Specifically, we can write the model as

bij = θ + δj(i) + γi + eij,

where θ is the overall mean, eij ~ N(0, sij²), γi ~ N(0, σ²study) and δj(i) ~ N(0, σ²endpoint), and where the notation j(i) indicates that the index j is nested within i.

For Analysis A, described above, we need to consider a model that includes only the full-scale IQ outcomes from the three studies. To fit this model, we need only a relatively simple version of the NAS model, since there is only one outcome per study. The fitted model will be

bi = θ + γi + ei,

where θ is the overall mean, ei ~ N(0, si²) and γi ~ N(0, σ²study) (that is, no outcome-to-outcome variance component is needed). While this model can be easily fit in WINBUGS, it is simple enough that maximum likelihood estimates (MLEs) are easily obtained.
The
MLE
of
the
overall
mean
 
can
be
expressed
as
a
weighted
average
of
the
study­
specific
estimates:

 
wi
bi
/
 
wi
where
wi=
1/(
 
2
study
+
s2
i)
and
the
sum
is
over
study.
The
MLE
of
 
2
study
does
not
have
a
closed
form
solution,
but
can
be
expressed
as
the
solution
of
a
relatively
simple
equation,
namely
 
wi
2
(
bi
 
 )
2/
 
wi
=
1.
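
These two estimating equations are straightforward to implement directly. The sketch below uses hypothetical coefficients and standard errors (chosen only to illustrate the arithmetic, not the report's data); it profiles the variance equation over a grid and falls back to the boundary value σ²study = 0 when the equation has no root, the boundary case discussed in the Results.

```python
import numpy as np

# Analysis A MLE sketch: mu is a weighted average of study-specific
# estimates with weights w_i = 1 / (sigma2_study + s_i^2), and
# sigma2_study solves  sum(w_i^2 (b_i - mu)^2) / sum(w_i) = 1.
b = np.array([-0.18, -0.10, -0.16])    # hypothetical rescaled coefficients
s2 = np.array([0.05, 0.06, 0.08])**2   # hypothetical squared standard errors

def profile(sigma2_study):
    w = 1.0 / (sigma2_study + s2)
    mu = np.sum(w * b) / np.sum(w)
    return mu, np.sum(w**2 * (b - mu)**2) / np.sum(w)

# Scan a grid for a root of the estimating equation; if it stays
# below 1 everywhere, the MLE sits on the boundary sigma2_study = 0.
grid = np.linspace(0.0, 0.05, 5001)
vals = np.array([profile(v)[1] for v in grid])
sigma2_hat = grid[int(np.argmin(np.abs(vals - 1.0)))] if vals.max() >= 1.0 else 0.0
mu_hat, _ = profile(sigma2_hat)
print(round(mu_hat, 3), round(sigma2_hat, 4))
```

With these hypothetical inputs the equation never reaches 1, so the variance estimate collapses to 0 and μ̂ reduces to the inverse-variance weighted mean.
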

Analysis B will involve fitting models that combine various Cognition and Achievement endpoints from the three different studies. To motivate our proposed approach, suppose for now that each of the three studies measures the same set of endpoints. Then a natural model would be a so-called "crossed random effects model," which takes the form:

bij = μ + δj + γi + eij,

where, as above, μ is the overall mean, eij ~ N(0, s²ij), and γi reflects study-to-study variability and is distributed as N(0, σ²study). However, since we are assuming for the moment that there are no shared outcomes between studies, δj is simply another random effect, distributed as N(0, σ²endpoint) and reflecting outcome-to-outcome variability. Crossed random effects models have been discussed by a number of authors (see for example McCulloch 1997). Note that the formulation is very similar to that used in the NAS report (NRC 2000), except that there is no nesting of the subscript j within i. Of course, we need to address the fact that some of the endpoints (IQ, the Boston Naming Test and the California Verbal Learning Test) are common to two or perhaps three of the studies, while others are study-specific. To do this, we will re-index the set of estimated regression coefficients as b1, b2, ..., bL, where L represents the total number of regression coefficients measured across all three studies. Similarly, we index the associated standard errors as s1, s2, ..., sL. For example, our Analysis B will include 4 outcomes from the Faroes study, 5 from the Seychelles study and 4 from New Zealand, in which case L will equal 13. Along with each coefficient, we will assign a study indicator, studyi, which takes the value 1, 2 or 3 depending on whether the outcome was measured in the Faroes, Seychelles, or New Zealand study. We will also assign an outcome indicator, outcomei, which classifies the nature of the outcome. If all the outcomes were distinct, then outcomei would also take on L different values and the setting would be the same as the NAS one described above. In our case, there are only 9 distinct outcomes related to Cognition and Achievement. We will then fit the model

bi = μ + δoutcomei + γstudyi + ei,

where, as above, μ is the overall mean, ei ~ N(0, s²i), γstudyi ~ N(0, σ²study) and δoutcomei ~ N(0, σ²endpoint).
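
The indicator coding just described also determines the marginal covariance structure of the coefficients: two coefficients covary through γ when they come from the same study, and through δ when they measure the same outcome, even across studies. The sketch below builds one hypothetical layout of the L = 13 coefficients (the outcome codes and variance values are assumptions for illustration, not the report's actual assignments) and evaluates that covariance.

```python
import numpy as np

# Hypothetical indicator layout: 4 Faroes, 5 Seychelles, 4 New Zealand
# coefficients; outcome codes 0 (IQ), 1 (Boston Naming) and
# 2 (California Verbal Learning) are assumed shared between studies.
study   = np.array([1, 1, 1, 1,  2, 2, 2, 2, 2,  3, 3, 3, 3])
outcome = np.array([0, 1, 2, 3,  0, 1, 2, 4, 5,  0, 6, 7, 8])

def marginal_cov(sigma2_study, sigma2_endpoint, s2):
    """Covariance of b implied by b_i = mu + delta_outcome(i) + gamma_study(i) + e_i."""
    same_study = (study[:, None] == study[None, :]).astype(float)
    same_outcome = (outcome[:, None] == outcome[None, :]).astype(float)
    return sigma2_study * same_study + sigma2_endpoint * same_outcome + np.diag(s2)

V = marginal_cov(0.01, 0.0064, np.full(13, 0.0025))
# A shared outcome (IQ in Faroes and Seychelles, indices 0 and 4)
# covaries through delta even across studies:
print(V[0, 4])   # sigma2_endpoint
# Two different outcomes within the same study covary through gamma:
print(V[0, 1])   # sigma2_study
# Different outcomes in different studies are uncorrelated:
print(V[3, 9])   # zero
```
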
With the exception of Analysis A, where a simple MLE approach can be used, all analyses will be conducted using a Bayesian formulation via the statistical package BUGS. In addition to specifying the distribution of the observed data, a Bayesian approach requires the specification of "prior" distributions on model parameters such as μ, σ²study and σ²endpoint. When these priors are very broad (corresponding to so-called "flat priors"), the resulting model fit will be essentially equivalent to one based on a maximum likelihood analysis. However, the Bayesian approach has a considerable computational advantage, in the sense that methodological developments over the past 10 to 20 years have made it possible to fit complex hierarchical models that would be extremely difficult to fit using a more conventional MLE approach. As discussed by Gelfand et al. (1995), however, the convergence properties of Bayesian model fitting can be sensitive to model parameterization. We found that when we specified flat priors on both of the variance components (σ²study and σ²endpoint), we invariably encountered convergence problems.

Explorations of the likelihood surface suggested that the basic problem here is that our relatively small dataset does not provide enough information to reliably divide the overall variability into these two different sources of variance. A more stable model fit was achieved by reparametrizing to σstudy = R·σendpoint, where R represents the ratio of study-to-study relative to endpoint-to-endpoint variability.

We considered two approaches to fitting our models. In the first approach, the ratio R was treated as a random variable with a broad prior distribution ranging, for example, from 0 to 100. In general, however, we found that models allowing R full freedom to vary were highly sensitive to the assumed prior, and that the posterior distributions were very wide, suggesting that the data contain relatively little information with which to estimate R. Our second and preferred approach was to fit our models under a feasible range of fixed values of R. This approach yielded computationally stable results and allowed us to explicitly evaluate the sensitivity of our results to the values of the variance components. All fitted models were checked for convergence and refit with multiple different starting values to ensure that reliable estimates had been obtained. Sample code is provided in the Appendix.
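
The fixed-R strategy has a simple likelihood analogue. The sketch below is a likelihood-based stand-in for the report's BUGS fits, with simulated coefficients and assumed standard errors: setting σstudy = R·σendpoint leaves one free variance parameter in the marginal covariance, which is then profiled together with μ on a crude grid for each fixed R.

```python
import numpy as np

# Hypothetical layout of 13 coefficients (assumed, not the report's data).
rng = np.random.default_rng(1)
study   = np.array([1, 1, 1, 1,  2, 2, 2, 2, 2,  3, 3, 3, 3])
outcome = np.array([0, 1, 2, 3,  0, 1, 2, 4, 5,  0, 6, 7, 8])
s2 = np.full(13, 0.0025)
b = -0.12 + rng.normal(0, 0.07, 13)  # simulated standardized coefficients

same_study = (study[:, None] == study[None, :]).astype(float)
same_outcome = (outcome[:, None] == outcome[None, :]).astype(float)

def loglik(mu, sigma_e, R):
    # Marginal covariance under sigma_study = R * sigma_endpoint.
    V = (R * sigma_e)**2 * same_study + sigma_e**2 * same_outcome + np.diag(s2)
    r = b - mu
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * (logdet + r @ np.linalg.solve(V, r))

def fit(R):
    # Crude grid profile over (mu, sigma_endpoint) for a fixed ratio R.
    return max((loglik(mu, se, R), mu, se)
               for mu in np.linspace(-0.3, 0.1, 81)
               for se in np.linspace(0.005, 0.2, 40))

for R in (0.25, 1.0, 1.5, 3.0):
    ll, mu_hat, se_hat = fit(R)
    print(R, round(mu_hat, 3), round(se_hat, 3), round(ll, 2))
```

Comparing the maximized log-likelihoods across fixed R values plays the same role here as comparing DIC across model fits in the report.
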

Results

Table 3 shows the estimated dose-response coefficients and corresponding standard errors for the endpoints related to Cognition/Achievement. Coefficients from two different analyses are reported for both the Faroes and New Zealand studies. From here on, we will mostly focus on the coefficients labeled "all available data" for the Faroes study and "subset analysis" for the New Zealand study. Tables 4 and 5 show analogous coefficients for the Attention/Behavior and Motor Skills domains.

Table 6 shows the scaling factors used to transform the results for each test, as reported by the study investigators, to the same scale as full scale IQ. Table 7 shows the rescaled results for the Cognition/Achievement domain. It is interesting to see that the rescaled results appear generally more comparable than the unscaled values.
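
The rescaling itself is a one-line calculation: multiply each reported coefficient by 15 divided by the standard deviation of the test, with an additional factor for the Faroes coefficients to convert from the cord blood to the hair scale (see Table 6). The values below are hypothetical and serve only to illustrate the arithmetic.

```python
def rescale(coef, sd_test, iq_sd=15.0, hair_blood_factor=1.0):
    """Express a dose-response coefficient on the full scale IQ metric
    (IQ has standard deviation 15); hair_blood_factor handles the
    cord blood to hair conversion where needed."""
    return coef * (iq_sd / sd_test) * hair_blood_factor

# Hypothetical Boston Naming Test coefficient (SD = 5):
print(round(rescale(-0.05, 5.0), 3))
# Hypothetical Faroes coefficient on the Digit Spans scale (SD = 1.45),
# with the extra factor of 2 for cord blood to hair:
print(round(rescale(-0.012, 1.45, hair_blood_factor=2.0), 3))
```
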
Figure 5 shows the point estimates and confidence intervals for full scale IQ estimated from the three studies (estimated dose-response coefficient, plus or minus 2*se). Note the substantial overlap among the three confidence intervals. We applied the simple maximum likelihood analysis to the rescaled data and found an overall mean dose-response coefficient of -.15, with corresponding 95% confidence interval (-.259, -.047). The estimated study-to-study variance component was 0, however, reflecting the relatively high degree of overlap among the confidence intervals displayed in Figure 5. However, with only 3 data points available for analysis, it is likely that there is insufficient information to reliably estimate this parameter. By adding in the additional endpoints related to achievement and cognition, we expect to obtain a more accurate estimate.

Figure 6 shows the estimated dose-response coefficients for all the endpoints associated with achievement and cognition. The three IQ estimates are indicated with the letter Q. While confidence limits are omitted from the figure so as not to make it too cluttered, the general impression is of a modest degree of both study-to-study and endpoint-to-endpoint variability.

While IQ is the only endpoint common to all three studies, the Faroes and Seychelles studies also share the Boston Naming Test and the California Verbal Learning Test. Plotting symbols Q, B and C are used to indicate these three endpoints, and other endpoints are simply indicated with the symbol X. The three common endpoints provide valuable additional information regarding study-to-study variability.

As discussed above, the full hierarchical model exhibited poor convergence properties when two separate variance components were estimated. We concluded that not enough information is available in our relatively limited dataset to separately estimate the two variance components corresponding to study-to-study and endpoint-to-endpoint variability. In an effort to reduce the dimensionality of the model, we reran the model assuming a fixed ratio R between these two sources. The results are shown in Table 8 for values of R ranging between 0.25 and 3. The Bayesian goodness-of-fit criterion DIC was used to compare model fits. Since small values indicate a better-fitting model, the results suggest that values of R greater than one (indicating greater study-to-study variability compared with endpoint-to-endpoint) are appropriate. The optimal value of R appears to be in the range of 1 to 1.5, which yields the lowest (most negative) DIC values. We will use the results for R = 1.5 for later calculations.

Interestingly, however, the estimated dose-response coefficient was remarkably robust, taking a value between -.12 and -.13 regardless of the value of R chosen. The variability of the estimated coefficients was also fairly stable (standard errors ranging between .05 and .07). The 95% confidence intervals all excluded zero, suggesting a significant effect of prenatal mercury on full scale IQ. Note that the estimated coefficients are slightly closer to zero (-.12 to -.13 range) under this hierarchical analysis than under the MLE analysis based only on the IQ variables (-.15). This is because the hierarchical model is influenced by the estimated coefficients for tests other than IQ. In addition, the hierarchical model weights the various data points differently, depending on the values of the estimated variance components.

To address concerns about the robustness and reliability of our results, we performed a number of different sensitivity analyses. Of particular interest was the impact of whether or not the one very highly exposed New Zealand child was included in the analysis. Figure 7 shows the same IQ data (estimates and 95% confidence intervals) as Figure 5, except that it also includes the confidence interval from New Zealand based on an analysis that includes all children. The results illustrate the dramatic impact of this one child on the estimated dose-response coefficient for IQ in the New Zealand study. Figure 8 is a revised version of Figure 6 that includes this highly exposed child and compares the dose-response coefficients from the cognition/achievement-related endpoints. Figure 8 suggests a higher study-to-study concordance than does Figure 6. Table 9 shows the results of refitting our model with and without this one extremely exposed New Zealand child, as well as under different assumptions about the hair-to-blood ratio, and also using the alternative scaling of the SEM-based IQ estimate. All the results in Table 9 are based on an assumed R value of 1.5. Consistent with the patterns seen in Figures 7 and 8, including the one highly exposed child in the New Zealand study led to a smaller estimate of the study-to-study variance component. The estimated dose-response coefficient associated with IQ was attenuated, but the precision associated with the estimate increased, with a 95% confidence interval of (-.204, -.025), assuming a hair/blood ratio of 200. As expected, assuming a hair/blood ratio of 250 (instead of the value of 200 assumed in Table 8) attenuates the estimated effects of methylmercury. For example, when applied to the data excluding the highly exposed New Zealand child, the estimated dose-response coefficient changes from -.131 (95% confidence interval -.281 to -.028) to -.115 (95% confidence interval -.266 to -.018).

Figure 9 shows a revised version of Figure 6, but with the dose-response coefficient associated with IQ for the Faroes study standardized using the estimated standard deviation of the latent variable (.586), rather than the standard deviation of the Digit Spans subscale (1.45). The resulting rescaled dose-response coefficient increases in magnitude to -0.306 instead of -0.124. As a result, the estimated study-to-study variability changes quite substantially, to around .1 or .12, depending on the assumed hair/blood ratio. The resulting estimated dose-response coefficient for IQ increases in magnitude to around -0.2 (with the precise value depending on the hair/blood ratio), with a 95% confidence interval of (-.451, -.030) when the hair/blood ratio equals 250 and (-.512, -.038) when it equals 200.
We also performed sensitivity analyses to evaluate the robustness of the SEM-based estimate of the effect of mercury on IQ for the Faroe Islands study. Of particular concern was possible imprecision and underestimation of the IQ effect, due to possible problems with the Similarities endpoint. There were several issues with this endpoint, including the fact that very different results were obtained depending on which of two different examiners administered the test (see Table 2 in Budtz-Jorgensen et al. 2005). We asked the Faroes investigators to repeat their analysis using only the Examiner A results, using only the Examiner B results, and then excluding Similarities altogether. Table 10 shows the results. The columns labeled "b" and "(se)" show the estimated IQ dose-response coefficient and its standard error, based on the SEM model, expressed in the same scale as Digit Spans. The columns headed "rescaled values1" give the estimated dose-response coefficient and standard error rescaled to full scale IQ. This rescaling is achieved just as in Table 7, namely by multiplying the unscaled estimates by 15 divided by the standard deviation of Digit Spans for the Faroes study. An extra factor of 2 (see Table 6) is included to account for the conversion from cord blood to hair. The columns headed "rescaled values2" are similar, except that the rescaling is done using the standard deviation of the latent variable instead of the standard deviation of Digit Spans. The first row of the table repeats information presented earlier in the report (see Table 7), namely the estimated effect of methylmercury based on an SEM analysis of all three subscales, using results for Similarities from both examiners. Looking at the unscaled values, as well as the values rescaled using the standard deviation of Digit Spans, we see that, as expected, the estimated dose-response effect is much stronger when the results are based on Examiner A (the neuropsychologist) than on Examiner B (the assistant). A similar pattern can be seen in the last two columns of the table, corresponding to the case where the results are rescaled using the estimated standard deviation of the latent variable. The last row of the table shows the results obtained when Similarities is omitted from the SEM analysis. The estimated dose effect is weaker (-.098 versus -.124) when standardized using the standard deviation of Digit Spans, but very similar (-.282 versus -.307) when standardized using the standard deviation of the latent variable.

As a final secondary analysis, we refit our hierarchical models including all available endpoints from all three domains, that is, including the endpoints listed in Tables 11 and 12, related to attention/behavior and motor skills, as well as the cognition/achievement endpoints. Ideally, we would include an additional domain-specific effect. However, as discussed earlier, there is not enough data to reliably fit models with multiple variance components. Because of the heterogeneity of the endpoints being included, we would not expect this analysis to be a reliable predictor of the effect on IQ. However, it is helpful to compare the estimates with those already considered as part of Analysis B. Figure 10 shows the estimated dose-response parameters for all endpoints from the three studies. The figure shows a pattern similar to that seen in Figure 6, namely a marked degree of both study-to-study and endpoint-to-endpoint variability. Table 13 shows the results of fitting a hierarchical model to all endpoints, for fixed values of R ranging from .5 to 3. As in Table 8, the results shown here assume a hair-to-blood ratio of 200, exclude the highly exposed New Zealand child, and use the standard deviation of Digit Spans to rescale the estimated IQ dose response based on the SEM analysis of Budtz-Jorgensen et al. (2005). In contrast to Table 8, the results here are fairly sensitive (in terms of the estimated variance components and the dose-response parameter) to the assumed value of R. In general, the estimated study-to-study variance component is larger than that estimated using only the cognition/achievement-related endpoints. As a consequence, the estimated dose-response coefficient for IQ is less reliably estimated, and most of the 95% confidence intervals just include zero, suggesting only a marginally significant exposure effect. Table 14 shows the sensitivity of the results based on all endpoints to whether the highly exposed New Zealand child was included or excluded, to the assumed value of the hair/blood ratio, and also to the alternative scaling for the SEM-based estimate of the effect of mercury on full scale IQ in the Faroe Islands. As seen in Table 9 for the analysis of cognitive endpoints, the results are somewhat sensitive to the New Zealand outlier.

Conclusions

We have used hierarchical models to combine data from three studies designed to assess the effects of prenatal methylmercury exposure on child development. Although the main objective of our analysis concerns full scale IQ, we have described several broader analyses that bring in additional endpoints as well, in an effort to obtain more accurate estimates of study-to-study and endpoint-to-endpoint variability.

While a standard maximum likelihood analysis was possible when we were combining only the IQ results from the three studies, we used a Bayesian approach for the more complex analyses that used additional endpoints. The Bayesian approach has several advantages in our context. First of all, the models are very easy to implement using the statistical package WinBUGS. Also, the Bayesian approach is more appropriate in small-sample settings such as ours, since it avoids the need to rely on the standard asymptotic basis for statistical inference.

With a few fairly minor exceptions, results from the Bayesian analysis can be interpreted in the same way that one would interpret results from a maximum likelihood analysis. Although a Bayesian analysis does not yield confidence intervals, but rather highest posterior density intervals, we have referred to confidence intervals throughout our report and tables to aid in the interpretation of our results. Unlike standard MLE-based confidence intervals, which generally correspond to the estimate plus or minus 1.96 times the standard error (assuming a 95% level of confidence), Bayesian confidence intervals may be non-symmetric. This asymmetry reflects the fact that the Bayesian approach applies in the small-sample setting and does not base its inference on asymptotic normality assumptions.

Although data are available from only three studies, we have found that the IQ and related endpoints show a fairly consistent pattern across the three studies. While the estimates are somewhat sensitive to the assumed values of the variance components, to how the dose-response coefficient from the SEM analysis is scaled, and to whether or not one very highly exposed child from New Zealand is included in the analysis, the overall conclusion is that prenatal exposure to mercury results in a significant decrease in full scale IQ, with a central estimate in the approximate range of -0.1 to -0.25 IQ points for every 1 μg/g increase in maternal hair mercury. The estimated IQ dose-response coefficient based on the MLE analysis using only the IQ data from the three studies had a larger magnitude (-.15, with corresponding 95% confidence interval of -.259 to -.047) than that from the Bayesian hierarchical approach. This occurred because the MLE approach yielded an estimated study-to-study variability of 0. In reality, some study-to-study variability is inevitable. Hence we place more faith in the Bayesian analysis, concluding that the MLE results are somewhat unreliable due to the small sample size involved.

There are several potential limitations to this analysis. Our analysis is based on published data, not original data. While this is often cited as a problem for cross-study analyses, its impact is lessened in our context for several reasons. First, all three studies had careful epidemiological designs that measured a variety of important potential confounders such as maternal age and education. All estimated dose-response coefficients were derived from well-documented regression models that adjusted for age, maternal education and other important factors. Also, the National Academy of Sciences had asked the individual study investigators for very specific details about the way in which their analyses had been done and had also asked for additional analyses where necessary. We were also able to obtain some additional analyses. There is precedent in the literature (see Dominici et al. 2000) for hierarchical modeling of estimated dose-response coefficients extracted from separate studies.

Our analyses were based on results reported by the individual study investigators, using standard linear regression with methylmercury exposure entered as a linear term. On this point, we followed the conclusions of the NAS Committee on the Toxicological Effects of Methylmercury, which concluded that linear models are most appropriate for dose-response modeling of mercury's neurodevelopmental effects. It would be useful to compare our results with those based on regression models using an exposure transformation, such as log, or even under the more general K-power model (see earlier discussion). It would also be of interest to explore semi-parametric approaches, for example, allowing the mercury dose response to be captured by a flexible spline model. However, performing such analyses is not possible without access to the original data or without obtaining details from the individual study investigators regarding models fitted under different assumptions.

Another possible limitation of our analysis is its reliance on normal modeling assumptions. This is not a problem at all for the estimated regression coefficients themselves, since the central limit theorem ensures their approximate normality. However, the required normality assumption on the study-specific means is more problematic, and it is not testable given the small number of different studies available for our analysis. The choice of endpoints for inclusion in our analysis was another potential limitation. Because we were working from published data, we were limited to the endpoints and associated statistics that had been reported by the three sets of investigators. Because of this, there is always a concern that results might have been different if a more exhaustive set of endpoints had been included.

However, given that the various investigators came to different conclusions regarding the effects of prenatal methylmercury, and that the reported endpoints were relatively comparable across all three studies, it is unlikely that our results were influenced by any systematic exclusions. We did make some additional exclusions, for reasons described in the report, mostly having to do with the absence of a reliable population standard that would allow us to comfortably rescale the results to the IQ scale. Table 15 reports the standardized regression coefficients for the various excluded endpoints. Figure 11 adds these additional endpoints to those already shown in Figure 10. Although there was also one excluded endpoint (McCarthy Motor) for the New Zealand study, we could not compute a rescaled regression coefficient for it, since the reason for its exclusion was that a log transformation had been taken and we did not have access to the standard deviation on the log scale. While some endpoints were also excluded from the Faroes study, this happened only in settings where there were multiple versions of the same endpoint (e.g. the Boston Naming Test, with and without cues). Thus, Table 15 and Figure 11 only add additional endpoints for the Seychelles study. The figure and table both show that these additional endpoints have a fairly random scatter about zero. When we reran the Bayesian hierarchical model including these endpoints, we found that they had a fairly strong influence on the estimated study-to-study and endpoint-to-endpoint variability, reducing both, and they also pulled all the estimated dose-response coefficients closer to zero. As might be expected, it appears that the overall signal is dampened if we add in additional endpoints that are unrelated to the cognition/achievement domain and that appear to have little or no relation to mercury exposure.

As indicated above, we also excluded endpoints from our analysis in cases where they represented variations of other endpoints already included. For example, finger tapping results are available for the dominant hand, for the non-preferred hand, or for a combination of both hands. Similarly, the Boston Naming Test results are reported with and without cues.

Technically, it would be possible to extend our Bayesian hierarchical model to include all such related endpoints, by including another level in the model reflecting version-to-version variation within a test (in addition to study-to-study and endpoint-to-endpoint variation). This would correspond to a type of measurement error model. However, in light of the challenges already experienced with including two levels in our model, due to our relatively small sample sizes, such an approach was not practically feasible in our setting. We believe that our chosen approach, namely choosing dominant-hand results where available and focusing on versions of tests that were common to two or more studies, is well justified.

Our analysis was based on a number of different endpoints, many of which had different scales (that is, standard deviations). For example, while full scale IQ has a standard deviation of 15, the Boston Naming Test has a standard deviation of 5. Our solution was to rescale all the estimated regression coefficients so that they could be interpreted as applying to an endpoint with a standard deviation of 15 (that is, the same scale as IQ). Our use of rescaled estimates is analogous to the use of "effect sizes". This refers to an approach that involves reporting results in terms of standard deviation units, and it is a well-established way to think about the comparability of dose-response coefficients across settings where endpoints have different scales.

Given our choice of full scale IQ as our primary outcome measure, it is unfortunate that IQ was not measured directly in the Faroes study. We have relied on an estimated dose-response coefficient derived from an SEM analysis of the three IQ subscales that were reported for the Faroes study. The SEM analysis also relies on the underlying normality of the data. Some confidence in the approach can be derived from the observed similarity between the correlations of the subscales with the estimated latent IQ variable and those observed in population-based studies where full scale IQ has been measured directly.

Another source of uncertainty in our analysis is the hair-to-blood ratio needed for converting the estimated regression coefficients from the Faroes study. Our primary analyses use a ratio of 200, which is the ratio of mercury measured in maternal hair versus cord blood, as measured by the Faroes investigators (Budtz-Jorgensen et al. 2004). However, because values ranging up to 250 have either been observed in other studies or are thought to be feasible, we also ran sensitivity analyses using the value of 250. While we believe our results based on the observed Faroes value of 200 are the best justified, it would also be of interest to explore additional analyses that treat the hair/blood ratio as a random variable, in order to assess the impact of possible misspecification of this ratio on both the estimated dose-response coefficient for IQ and its associated variability. However, given the relatively small size of our available datasets, we found that models incorporating this extra layer of variability did not run well.

While our use of a Bayesian approach could be viewed by some as a limitation, we view it more as a computational tool for fitting hierarchical models, and we did not use it to impose any strong prior assumptions. We did find that convergence was a problem for our model, but we addressed this through sensitivity analyses that fixed various model parameters at different values in order to evaluate their impact on the estimates of interest. We also found that model convergence was fairly sensitive to the prior specification. In theory, it is appropriate to place so-called "flat" priors on the variance components and regression coefficients in the model. Doing so allows the model to handle such parameters in a Bayesian framework without imposing any strong prior judgment that would overly influence the results. However, we found that our models exhibited poor mixing and convergence properties when very flat priors were used, especially on the variance components. Better performance was achieved when the prior distributions were partially constrained so as to ensure that the estimated parameters were in the right ballpark. Such an approach is common practice among Bayesian statisticians and by no means implies that informative priors were used.

An advantage of the Bayesian hierarchical modeling approach is that it naturally allows for building in unmeasured sources of uncertainty and variability. Clearly, for example, there is substantial study-to-study variability, consistent with the fact that, despite the efforts of numerous individuals, committees and panels, no one has been able to identify factors that explain the observed discrepancies between the results of the three studies. By viewing the study-specific dose-response parameters as being drawn from some larger super-population of parameters, our hierarchical model naturally allows for this study-to-study variability. An analysis that incorporates data from multiple studies has an advantage over one based on a single study, in that the inevitable study-specific errors, omissions and design flaws that might bias the results in one direction or the other tend to cancel out. This is the argument in favor of our chosen approach, namely taking the estimated population mean dose response as our final estimate of the effects of methylmercury on IQ. An important caveat is that our analysis is based on only three studies. Ideally, we would be able to draw from many more. The availability of data from multiple endpoints alleviates this concern to some extent.

Credence
for
the
reliability
of
our
modeling
approach
is
provided
by
the
stability
of
the
results
themselves.
Figures
7
to
10
and
Tables
8,
9,
13
and
14
cover
analyses
over
a
broad
range
of
varying
assumptions.
In
general,
we
found
very
consistent
results
based
on
the
endpoints
related
to
cognition
and
achievement.
While
the
individual
dose
response
estimates
varied
somewhat
according
to
the
specific
assumptions
being
made,
results
consistently
suggested
a
significant
association
between
mercury
and
IQ,
with a lower confidence limit ranging from around -0.2 to -0.5 and an upper limit between -0.02 and -0.04.
Estimates based on an analysis that included all available endpoints, including those that had been excluded due to data or other problems (see Table 15), were more variable. However, this is to be expected since the model does not appropriately account for domain-to-domain variation.
Finally,
there
are
several
caveats
that
should
be
pointed
out
regarding
our
analysis.
Our
focus
has
been
on
estimating
the
dose
response
coefficient
relating
methylmercury
and
full
scale
IQ.

This
choice
was
in
part
driven
by
the
existing
capability
to
assign
economic
value
to
IQ
decrements
for
benefit-cost analysis.
Absent
this
factor,
it
might
be
considered
more
appropriate
to
assess
the
effects
of
methylmercury
on
the
broader Cognition/Achievement domain.
In
fact,

our
hierarchical
model
allows
us
to
do
this
by
simply
focusing
on
the
mean
regression
coefficient
for
the
domain,
rather
than
focusing
on
the
IQ
endpoint
specifically.
For
example,
consider
the
last
row
of
Table
14,
corresponding
to
the
estimated
dose
response
coefficient
relating
mercury
to
full
scale
IQ
when
we
use
a hair/blood ratio of 200
and
the
standard
deviation
of
the
latent
variable
to
scale
the
IQ
coefficient
from
the
Faroes
study.
While the IQ-specific coefficient is -0.242 with a 95% confidence interval of (-0.526, -0.046), the overall mean coefficient for the achievement/cognition domain is -0.184, with a 95% confidence interval of (-0.428, -0.0197).

It
is
important
to
point
out
an
additional
caveat.
Confidence
intervals
presented
for
all
IQ
coefficients
calculated
in
this
report
pertain
to
the
statistical
uncertainty
associated
with
our
estimation
procedure.
The
reported
confidence
intervals
do
not
reflect
other
sources
of
uncertainty
(e.g., the uncertainty associated with the choice of dose response model),
nor
do
they
reflect
population
variability
and
heterogeneity
in
response
to
methylmercury
exposure.

Consideration
of
such
issues
would
require
a
much
larger
database
and
is
beyond
the
scope
of
this
report.
Another
caveat
to
keep
in
mind
is
that
we
have
not
addressed
issues
of
measurement
error,
but
have
assumed
the
exposures
assigned
to
each
study
subject
are
accurate
representations
of
true
exposure.
In
reality,
there
is
likely
to
be
some
discrepancy
between
measured
and
actual
exposures,
for
example,
due
to
variation
in
hair
length.
Alternatively,
it
is
possible
that
the
true
exposure
of
interest
may
have
been
during
the
first
trimester
of
pregnancy,

whereas
exposures
in
maternal
hair
and
cord
blood
measured
at
birth
reflect
exposures
later
in
pregnancy.
Presence
of
exposure
measurement
error
could
introduce
a
bias
in
the
results,
most
likely
towards
the
null
(see Budtz-Jorgensen et al. 2004).
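The attenuation-toward-the-null phenomenon can be illustrated with a small simulation (hypothetical numbers, not the study data): regressing the outcome on an error-contaminated exposure shrinks the estimated slope roughly by the reliability ratio of the exposure measurement.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
beta = -0.2                        # hypothetical true dose-response slope
x_true = rng.normal(10.0, 3.0, n)  # true exposure (e.g., hair mercury, ppm)
y = beta * x_true + rng.normal(0.0, 5.0, n)

# Classical measurement error with the same variance as the true exposure,
# giving a reliability ratio of 9 / (9 + 9) = 0.5
x_obs = x_true + rng.normal(0.0, 3.0, n)

slope_true = np.polyfit(x_true, y, 1)[0]
slope_obs = np.polyfit(x_obs, y, 1)[0]
print(slope_true, slope_obs)  # the observed-exposure slope is attenuated
```

Under classical measurement error the expected attenuation factor is var(true)/(var(true)+var(error)); more complicated error structures, such as a mismatch between the measured and etiologically relevant exposure window, need not attenuate in this simple way.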

NOTE: This report has been peer-reviewed in accordance with EPA guidelines. Peer review comments and responses are presented elsewhere (U.S. EPA 2005).

References

Bellinger D (2005). Neurobehavioral Assessments Conducted in the New Zealand, Faroe Islands, and Seychelles Islands Studies of Methylmercury Neurotoxicity in Children. Report to the U.S. Environmental Protection Agency.

Budtz-Jorgensen E, Grandjean P, Keiding N, White RF, Weihe P (2000). Benchmark dose calculations of methylmercury-associated neurobehavioural deficits. Toxicology Letters, 112-113: 193-199.

Budtz-Jorgensen E, Keiding N, Grandjean P (2001). Benchmark dose calculation from epidemiological data. Biometrics, 57: 698-706.

Budtz-Jorgensen E, Keiding N, Grandjean P, Weihe P (2002). Estimation of health effects of prenatal methylmercury exposure using structural equation models. Environmental Health, 1(1): 2.

Budtz-Jorgensen E, Grandjean P, Jorgensen P, Weihe P, Keiding N (2004). Association between mercury concentrations in blood and hair in methylmercury-exposed subjects at different ages. Environmental Research, 95(3): 385-393.

Budtz-Jorgensen E, Debes F, Weihe P, Grandjean P (2005). Adverse mercury effects in 7-year-old children expressed as loss in "IQ." Report to the U.S. Environmental Protection Agency.

Coull BA, Hobert JP, Ryan LM, Holmes LB (2001). Crossed random effect models for multiple outcomes in a study of teratogenesis. Journal of the American Statistical Association, 96: 1194-1204.

Coull BA, Mezzetti M, Ryan LM (2003). A Bayesian hierarchical model for risk assessment of methylmercury. Journal of Agricultural, Biological & Environmental Statistics, 8: 253-270.

Crump KS (1984). A new method for determining allowable daily intakes. Fundamental & Applied Toxicology, 4: 854-871.

Crump KS, Kjellstrom T, Shipp AM, Silvers A, Stewart A (1998). Influence of prenatal mercury exposure upon scholastic and psychological test performance: Benchmark analysis of a New Zealand cohort. Risk Analysis, 18: 701-713.

Dominici F, Samet JM, Zeger SL (2000). Combining evidence on air pollution and daily mortality from the 20 largest US cities: a hierarchical modelling strategy. Journal of the Royal Statistical Society Series A (Statistics in Society), 163: 263-284.

Gaylor DW and Slikker W (1994). Modeling for risk assessment of neurotoxic effects. Risk Analysis, 14: 333-338.

Gelfand AE, Sahu SK, Carlin BP (1995). Efficient parameterisations for normal linear mixed models. Biometrika, 82: 479-488.

Gilks WR and Roberts GO (1996). Strategies for improving MCMC. In Markov Chain Monte Carlo in Practice (eds. Gilks, Richardson and Spiegelhalter). Chapman and Hall, 89-114.

Grandjean P, Weihe P, White RF, Debes F, Araki S, Yokoyama K, Murata K, Sorensen N, Dahl R, Jorgensen PJ (1997). Cognitive deficit in 7-year-old children with prenatal exposure to methylmercury. Neurotoxicology & Teratology, 19: 417-428.

Kjellstrom T, Kennedy P, Wallis S, Stewart A, Friberg L, Lind B, Wutherspoon T, Mantell C (1989). Physical and mental development of children with prenatal exposure to mercury from fish. Stage 2: interviews and psychological tests at age 6. Solna, Sweden: National Swedish Environmental Board Report 3642.

McCulloch CE (1997). Maximum likelihood algorithms for generalized linear mixed models. Journal of the American Statistical Association, 92: 162-170.

McDowell MA, Dillon CF, Osterloh J, Bolger PM, Pellizzari E, Fernando R, de Oca RM, Schober SE, Sinks T, Jones RL, Mahaffey KR (2004). Hair mercury levels in US children and women of childbearing age: Reference range data from NHANES 1999-2000. Environmental Health Perspectives, 112: 1165-1171.

Myers GJ, Davidson PW, Cox C, Shamlaye CF, Tanner MA, Choisy O, Sloane-Reeves J, Marsh DO, Cernichiari E, Choi A, Berlin M, Clarkson TW (1995). Neurodevelopmental outcomes of Seychellois children sixty-six months after in utero exposure to methylmercury from a maternal fish diet: Pilot study. Neurotoxicology, 16: 639-651.

Myers GJ, Davidson PW, Cox C, Shamlaye CF, Palumbo D, Cernichiari E, Sloane-Reeves J, Wilding GE, Kost J, Huang LS, Clarkson TW (2003). Prenatal methylmercury exposure from ocean fish consumption in the Seychelles child development study. Lancet, 361: 1686-1692.

NRC (2000). Toxicological Effects of Methylmercury. National Academy Press.

Spiegelhalter D, Thomas A, Best N (2000). WinBUGS Version 1.3 User's Manual. Cambridge, UK: MRC Biostatistics Unit, Institute of Public Health. http://www.mrc-bsu.cam.ac.uk/bugs

U.S. EPA (2005). Documentation of Peer Review for EPA's Estimation of an Integrated Dose-Response Function for Intelligence Quotient (IQ) and Prenatal Exposure to Mercury. National Center for Environmental Economics, U.S. Environmental Protection Agency.

Wechsler D (1991). WISC-III Manual. San Antonio: The Psychological Corporation.
Figure 1: Structural equation model for the effects of mercury exposure on child development (taken from Budtz-Jorgensen et al. 2002).

Figure 2: Simplified structural equation model used to generate estimates of the dose response coefficient of cord blood mercury on full scale IQ in the Faroes study.

Figure 3: Illustration of dose response shapes from the K-power model with three different choices of K.

Figure 4: Adjusted Boston Naming Test scores from an analysis of the Faroe Islands data by Budtz-Jorgensen et al., extracted from the NRC report on methylmercury (NRC 2000).

Figure 5: 95% confidence intervals for full scale IQ from the New Zealand, Seychelles and Faroes studies.

Figure 6: Dose-response coefficients for Achievement- and Cognition-related endpoints from the three studies. The symbols Q, C and B denote the three endpoints that are common to two or more studies, namely IQ, CVLT and BNT, respectively. Coefficients reflect test score decrement per ppm of maternal hair mercury. The Faroes median hair:cord blood ratio of 200 (Budtz-Jorgensen et al. 2004) was used to convert Faroes results to units of hair mercury.

Figure 7: 95% confidence intervals for the effects of hair mercury on full scale IQ for the three studies, including two versions of New Zealand: one including the highly exposed child and the other excluding this child.

Figure 8: Dose-response coefficients for Achievement- and Cognition-related endpoints from the three studies. New Zealand estimates are based on including the one highly exposed child. The symbols Q, C and B denote the three endpoints that are common to two or more studies, namely IQ, CVLT and BNT, respectively. Coefficients reflect test score decrement per ppm of maternal hair mercury. The Faroes median hair:cord blood ratio of 200 (Budtz-Jorgensen et al. 2004) was used to convert Faroes results to units of hair mercury.

Figure 9: Dose-response coefficients for Achievement- and Cognition-related endpoints from the three studies. This figure uses an alternative scaling for the SEM-based dose response coefficient for IQ from the Faroes study. The symbols Q, C and B denote the three endpoints that are common to two or more studies, namely IQ, CVLT and BNT, respectively. The highly exposed New Zealand child is excluded. Coefficients reflect test score decrement per ppm of maternal hair mercury. The Faroes median hair:cord blood ratio of 200 (Budtz-Jorgensen et al. 2004) was used to convert Faroes results to units of hair mercury.

Figure 10: Endpoints from all three neurological domains in the three studies. As in Figure 6, the symbols Q, C and B denote the three endpoints that are common to two or more studies, namely IQ, CVLT and BNT, respectively. The symbols R and F denote the attention and motor endpoints that are common to two or more studies, namely CPT Reaction Time and Finger Tapping. The highly exposed New Zealand child is excluded. Coefficients reflect test score decrement per ppm of maternal hair mercury. The Faroes median hair:cord blood ratio of 200 (Budtz-Jorgensen et al. 2004) was used to convert Faroes results to units of hair mercury.

Figure 11: Endpoints from all three neurological domains that met the inclusion criteria for our analysis, as well as 7 additional excluded endpoints from the Seychelles study. As in Figure 6, the symbols Q, C and B denote the three endpoints that are common to two or more studies, namely IQ, CVLT and BNT, respectively. The symbols R and F denote the attention and motor endpoints that are common to two or more studies, namely CPT Reaction Time and Finger Tapping. The symbol N denotes newly added endpoints. The highly exposed New Zealand child is excluded. Coefficients reflect test score decrement per ppm of maternal hair mercury. The Faroes median hair:cord blood ratio of 200 (Budtz-Jorgensen et al. 2004) was used to convert Faroes results to units of hair mercury.
Appendix: WinBUGS code to generate results in Table 8

Data file
list(nrow=13, nstudy=3, nendpoint=9,
     b=c(-.53, -.54, -.53, -.60, -.13, .013, -.012, -.021, -.010, -.024, .073, -.190, -.058),
     bse=c(.30, .32, .32, .323, .10, .010, .046, .029, .12, .011, .059, .063, .032),
     study=c(1,1,1,1, 2,2,2,2,2, 3,3,3,3),
     endpoint=c(1,2,9,3, 1,4,6,7,8, 1,5,6,4),
     scale1=c(.94, .94, .94, 1.5, 1.29, 15.42, 3.13, 1.28, 5.17, 10.34, -2.84, 2.74, 5.82),
     scale2=c(1,1,1,1, 1,1,1,1,1, 2,2,2,2))

# Study codes
# 1 is New Zealand (4 endpoints)
# 2 is Seychelles (5 endpoints)
# 3 is Faroes (4 endpoints)

# Endpoint codes:
# 1 is full-scale IQ (all three studies)
# 2 is performance IQ (WISC_RP in NZ)
# 3 is MCC_PP (McCarthy perceptual performance)
# 4 is CVLT short delay (Seychelles, Faroes)
# 5 is Bender Visual (Faroes)
# 6 is BNT total (Seychelles, Faroes)
# 7 is WRAML design (Seychelles)
# 8 is VMI (Seychelles)
# 9 is TOLD-SL (New Zealand)

# The vector "scale1" corresponds to 15 divided by the observed standard
# deviation of the corresponding endpoint. This is the value that b needs
# to be multiplied by in order to have the dose effect reflect an sd of 15.
# The vector "scale2" converts the results from cord blood to hair MeHg.
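The scaling arithmetic can be sanity-checked outside WinBUGS. This Python sketch applies the first data row's values to the transformations described above: scale = scale1/scale2, the rescaled estimate b*scale, and the corresponding precision 1/(scale^2 * bse^2) used in the model file.

```python
# First row of the data file: New Zealand full-scale IQ
b, bse = -0.53, 0.30
scale1, scale2 = 0.94, 1.0

scale = scale1 / scale2               # scale[i] <- scale1[i]/scale2[i]
y = b * scale                         # rescaled dose-response estimate
p_y = 1.0 / (scale * scale * bse**2)  # precision of y, as in the model file

print(y, p_y)
```

Note that WinBUGS parameterizes the normal distribution by precision (the reciprocal of the variance), which is why the model passes p.y[i] rather than a standard deviation to dnorm.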
Model
file
model {
  for (i in 1:nrow) {
    # create the scaling variable and scale the endpoint-specific dose
    # response estimates and standard errors
    scale[i] <- scale1[i]/scale2[i]
    # y[i] <- b[i]*scale[i]
    p.y[i] <- 1/(scale[i]*scale[i]*pow(bse[i], 2))
    y[i] ~ dnorm(mu[i], p.y[i])
    mu[i] <- beta1[study[i]] + beta2[endpoint[i]]
  }

  for (i in 1:nstudy) {
    beta1[i] ~ dnorm(0, p.study)
  }

  for (i in 1:nendpoint) {
    beta2[i] ~ dnorm(beta0, p.endpoint)
  }

  # flat prior on overall mean
  beta0 ~ dnorm(0, .0001)
  # flat prior on precision of endpoint variance component
  p.endpoint ~ dgamma(.00001, .00001)
  p.study <- p.endpoint/R   # change ratio here
  sig.study <- 1/sqrt(p.study)
  sig.endpoint <- 1/sqrt(p.endpoint)
  # specify R
  R <- 2
}
