Documentation of Peer Review for EPA's Estimation of an Integrated Dose-Response Function for Intelligence Quotient (IQ) and Prenatal Exposure to Mercury
U.S. Environmental Protection Agency
National Center for Environmental Economics
March 2005

EPA's
analysis
of
the
dose-response
relationship
between
prenatal
mercury
exposure
and
childhood
IQ
involves
a
statistical
integration
of
data
from
three
epidemiological
studies
conducted
in
the
Faroe
Islands,
New
Zealand,
and
the
Seychelles
Islands.
The
analysis
consists
of
the
following
three
documents:

 
• Neurobehavioral Assessments Conducted in the New Zealand, Faroe Islands, and Seychelles Islands Studies of Methylmercury Neurotoxicity in Children, by David C. Bellinger, March 2005.
• Effects of Prenatal Methylmercury on Childhood IQ: A Synthesis of Three Studies, by Louise M. Ryan, March 2005.
• Adverse mercury effects in 7-year old children expressed as loss in "IQ", by Esben Budtz-Jorgensen et al., March 2005.

Drafts
of
these
reports
were
peer
reviewed
in
accordance
with
EPA's
Peer
Review
Handbook,
2nd
Edition.
EPA
contracted
with
Industrial
Economics,
Incorporated
(
IEc)
to
coordinate
the
peer
review.
IEc
identified
the
peer
reviewers
and
provided
the
reviewers
with
the
documents
to
be
reviewed,
along
with
a
set
of
charge
questions
and
background
documents.
IEc
was
the
recipient
of
the
written
comments,
and
subsequently
provided
these
to
EPA.
There
was
no
direct
contact
between
EPA
or
the
authors
of
the
three
reports
being
reviewed
and
any
of
the
peer
reviewers.
The
peer
reviewers
selected
by
IEc
were:

 
• Dr. Thomas A. Burke, Johns Hopkins University, Bloomberg School of Public Health
• Dr. John Bailar, University of Chicago (emeritus)
• Dr. David B. Dunson, National Institute of Environmental Health Sciences, Biostatistics Branch
• Dr. Joseph Jacobson, Wayne State University, Department of Psychology

In
addition
to
drafts
of
the
three
reports,
the
reviewers
were
provided
several
critical
papers
as
background.
These
papers
were:
 
• Budtz-Jorgensen E, Keiding N, Grandjean P (2001). Benchmark dose calculation from epidemiological data. Biometrics, 57: 698-706.

 
• Budtz-Jorgensen E, Keiding N, Grandjean P, Weihe P (2002). Estimation of health effects of prenatal methylmercury exposure using structural equation models. Environmental Health, 1(1): 2.

 
• Coull BA, Mezzetti M, Ryan LM (2003). A Bayesian hierarchical model for risk assessment of methylmercury. J Agricultural, Biological & Environmental Statistics, 8: 253-270.

 
• Crump KS, Kjellstrom T, Shipp AM, Silvers A, Stewart A (1998). Influence of prenatal mercury exposure upon scholastic and psychological test performance: Benchmark analysis of a New Zealand cohort. Risk Analysis, 18: 701-713.

 
• Grandjean P, Weihe P, White RF, Debes F, Araki S, Yokoyama K, Murata K, Sorensen N, Dahl R, Jorgensen PJ (1997). Cognitive deficit in 7-year-old children with prenatal exposure to methylmercury. Neurotoxicol & Teratol, 19: 417-428.

 
• Myers GJ, Davidson PW, Cox C, Shamlaye CF, Palumbo D, Cernichiari E, Sloane-Reeves J, Wilding GE, Kost J, Huang LS, Clarkson TW (2003). Prenatal methylmercury exposure from ocean fish consumption in the Seychelles child development study. Lancet, 361: 1686-1692.

 
• Kjellstrom T, Kennedy P, Wallis S, Stewart A, Friberg L, Lind B, Wutherspoon T, Mantell C (1989). Physical and mental development of children with prenatal exposure to mercury from fish. Stage 2: interviews and psychological tests at age 6. Solna, Sweden: National Swedish Environmental Board Report 3642. [selected pages]

 
The
reviewers
were
also
provided
with
the
following
set
of
questions
to
focus
their
reviews.

General
Topics
 
Please
comment
on
the
robustness
of
the
methods,
models
and
data
presented
and
used
in
this
research.
 
Does
the
analysis
incorporate
all
relevant
studies?

 
Given
the
scope
and
intended
purpose
of
the
methodology,
are
the
analytical
framework,
assumptions,
and
application
of
data
appropriate?
Are
the
scientific
uncertainties
clearly
identified
and
characterized
throughout
the
analysis?

 
Are
the
methods
applied
in
this
study
appropriate
for
quantifying
IQ
decrements
in
the
range
of
current
U.
S.
mercury
exposures?
If
not,
what
methods
would
you
recommend?

 
What
are
the
overall
major
strengths
and
weaknesses
of
this
analysis?

 
Are
all
of
the
essential
elements
included
in
the
final
report?
Is
the
report
clear
and
well-written?
What
additional
documentation,
if
any,
do
you
feel
is
needed
to
ensure
transparency?

Specific
Topics
 
This
analysis
focuses
on
IQ
as
the
neurodevelopmental
outcome
for
mercury
economic
benefits
analysis.
Are
there
other
neurodevelopmental
endpoints
for
mercury
that
could
be
quantified?
Are
there
advantages
and
disadvantages
of
using
IQ
to
represent
neurodevelopmental
effects
of
prenatal
mercury
exposure
in
addition
to
those
identified?

 
The
approach
to
quantifying
the
IQ
dose-response
relationship
integrates
data
from
studies
conducted
in
the
Faroes,
Seychelles
and
New
Zealand.
Is
it
appropriate
to
combine
results
from
these
three
studies
for
this
analysis?
Do
differences
in
the
version
of
the
IQ
test
(
Wechsler
Intelligence
Scales
for
Children,
or
WISC)
administered
in
the
three
studies
raise
any
issues
that
are
of
concern
to
you?

 
The
authors
of
this
analysis
had
to
select
dose-response
coefficients
from
the
three
studies
for
use
in
their
statistical
modeling.
Please
comment
on
the
choice
of
dose-response
coefficients
from
the
three
studies
for
use
in
this
analysis.
Are
there
alternate
coefficients
or
other
data
from
these
three
studies
that
should
be
used
in
place
of,
or
in
addition
to,
those
used
in
this
analysis?

 
This
analysis
generally
relies
on
coefficients
from
linear
dose-response
models.
Given
the
available
information,
is
this
an
appropriate
approach,
and
applicable
to
the
full
range
of
exposures
experienced
by
the
U.
S.
population,
including
exposures
below
those
in
the
range
of
empirical
observation?
 
Full-scale
IQ
was
not
measured
in
the
Faroe
Islands
study;
however,
three
IQ
subtests
were
conducted.
At
the
request
of
EPA,
the
Faroe
Islands
research
team
conducted
a
statistical
analysis
to
estimate
a
dose-response relationship. Is the rationale for extrapolating full-scale
IQ
from
the
three
subtests
clearly
explained
and
justified?
Is
the
approach
to
estimating
a
full-scale IQ dose-response
relationship
for
the
Faroe
Islands
appropriate?

 
Integrated
dose-response coefficients are estimated first using only the IQ dose-response
coefficient
from
the
three
studies,
then
also
incorporating
data
for
other
neurodevelopmental
endpoints
(
e.
g.,
Boston
Naming
Test)
in
a
Bayesian
hierarchical
model.
Is
the
rationale
for
each
approach
clearly
explained?

 
Only
certain
non-IQ
neurodevelopmental
endpoints
reported
in
the
Faroes,
New
Zealand
and
Seychelles
studies
could
be
included
in
the
Bayesian
model.
Is
the
rationale
for
selection
of
endpoints
for
inclusion/
exclusion
in
the
Bayesian
model
clear
and
reasonable?
Do
you
agree
with
the
rationale?

 
In
order
to
combine
data
from
different
studies
and
different
neurodevelopmental
endpoints,
it
was
necessary
to
rescale
the
reported
dose-response
coefficients.
Data
from
the
Faroes
is
converted
from
terms
of
cord
blood
mercury
to
hair
mercury.
All
endpoints
other
than
IQ
are
converted
to
the
IQ
scale.
Is
the
rescaling
of
the
coefficients
clearly
explained
and
appropriately
executed?

 
Please
comment
on
the
implementation
of
the
Bayesian
model
and
interpretation
of
the
results
of
this
model.
Is
this
portion
of
the
study
clearly
explained,
and
appropriately
executed
and
interpreted?
Has
all
relevant
information
been
considered
in
the
model?

The
remainder
of
this
document
presents
the
comments
of
the
reviewers
and
EPA's
responses.

U.S. EPA Mercury IQ Dose-Response Analysis: Peer Review Comments and Responses

Reviewer 1
Review of the USEPA funded analysis of methylmercury dose-response
Submitted by: Thomas A. Burke, Ph.D., M.P.H.
Johns Hopkins University
February 13, 2005

General Comments

This
review
examines
the
following
three
reports:
Effects
of
Prenatal
Methylmercury
on
Childhood
IQ:
A
Synthesis
of
Three
Studies,
by
Louise
M.
Ryan,
December
2004;
Neurobehavioral
Assessments
Conducted
in
the
New
Zealand,
Faroe
Islands,
and
Seychelles
Islands
Studies
of
Methylmercury
Neurotoxicity
in
Children,
by
David
C.
Bellinger,
December
2004;
and
Adverse
mercury
effects
in
7-year
old
children
expressed
as
loss
in
"
IQ",
by
Esben
Budtz-Jorgensen
et
al.,
draft
report
of
December
3,
2004.
The
overall
goal
is
to
provide
feedback
on
the
utility
of
the
papers
and
appropriateness
of
their
methods
and
findings
to
support
the
efforts
of
EPA
to
estimate
the
public
health
impacts
of
maternal
exposure
to
methylmercury
and
to
conduct
a
benefits
assessment.

The
review
has
been
challenging,
since
there
is
no
single
document
that
unifies
the
goals,
methods
and
findings
of
each
component.
Each
of
the
included
papers
provides
unique,
albeit
complementary,
perspectives.
However,
the
separation
of
the
components
into
the
three
distinct
papers
leads
to
a
degree
of
fragmentation
and
repetition
of
limitations
and
uncertainties
that
detracts
from
the
strength
of
the
methods
and
findings.
In
addition,
the
complexity
of
the
computational
methods
contributes
to
a
lack
of
clarity
in
the
Ryan
and
Faroe
team
papers.
It
is
therefore
recommended
that
a
short
synthesis
document
be
developed
to
explain
the
specific
goals,
findings
and
conclusions
of
each
paper,
and
characterize
the
weight
of
evidence,
biological
basis,
and
adequacy
of
the
methods
used
to
measure
the
dose-response
relationship
between
maternal
exposure
and
childhood
IQ.

RESPONSE:
Text
that
summarizes
and
synthesizes
the
overall
analysis
has
been
prepared
and
is
included
in
EPA's
benefit-cost
analysis
document.

Each
of
the
papers
is
very
well
done,
and
taken
together
provide
a
complementary
examination
of
the
weight
of
the
evidence
for
using
the
childhood
IQ
in
the
assessment
of
the
health
impacts
of
mercury
exposure.
They
also
provide
support
for
conclusions
in
the
Bellinger
paper
that
"
it
is
possible
to
estimate
Full-Scale
IQ
score
from
the
scores
on
the
subtests
that
were
administered
in
the
Faroe
Islands".
Taken
collectively
they
also
provide
support
for
the
conclusion
of
the
Ryan
paper
that
"
prenatal
exposure
to
MeHg
results
in
a
significant
decrease
in
full
scale
IQ".
The
models
presented
in
the
Ryan
paper
include
many
outcomes
in
addition
to
IQ.
This
integrated
analysis
of
the
results
from
the
three
major
studies
provides
many
new
insights
into
common
findings
among
the
studies
and
strengthens
the
overall
weight
of
the
scientific
evidence
of
adverse
neurodevelopmental
effects
of
prenatal
mercury
exposures.

The
results
of
the
papers
must
be
interpreted
with
some
caution.
There
are
numerous
limitations
to
these
analyses
which
are
described
in
each
paper.
In
addition,
the
magnitude
of
the
IQ
effect
is
small
(-0.1 to -0.25 IQ points per 1 µg/g
increase
in
maternal
hair
mercury)
and
the
relevance
to
the
lower
U.
S.
population
exposure
levels
will
be
debated.
In
light
of
the
statement
of
Bellinger
that
"
Full-Scale
IQ
may
not
be
the
cognitive
endpoint
that
is
most
sensitive
to
prenatal
MeHg
exposure"
it
is
possible
that
basing
the
benefits
assessment
on
IQ
effects
may
underestimate
the
true
population
public
health
impact.

RESPONSE:
We
have
added
a
brief
discussion
on
this
point
and
also
describe
how
one
could
alternatively
use
the
estimate
based
on
the
cognition/
achievement
domain,
rather
than
focusing
just
on
one
endpoint
(
IQ).

The
fundamental
question
for
the
benefits
analysis
may
well
be
"
Is
IQ
the
critical
endpoint
that
should
drive
the
benefits
analysis?"
To
address
this
question
it
is
recommended
that
the
report
include
a
stronger
explanation
of
the
rationale
for
selection
of
IQ
as
the
critical
endpoint
for
the
benefits
analysis.
This
examination
should
include
the
feasibility
of
including
other
neurodevelopmental
endpoints,
and
also
examine
the
emerging
evidence
of
the
relationship
between
Hg
exposure
and
cardiovascular
effects
for
the
total
population.
Cardiovascular
disease
is
the
nation's
leading
cause
of
mortality.
Although
the
science
is
still
in
the
formative
stages,
an
examination
of
the
potential
public
health
impacts
of
mercury
related
cardiovascular
risk
should
be
included
in
the
EPA
benefits
assessment.

RESPONSE:
Further
rationale
for
using
IQ
has
been
added,
along
with
discussion
of
possible
approaches
for
using
other
neurodevelopmental
endpoints.
Cardiovascular
effects
are
outside
the
scope
of
this
effort.

Charge Questions

General Topics

Please
comment
on
the
robustness
of
the
methods,
models
and
data
presented
and
used
in
this
research.

The
data
utilized
for
this
research
have
been
extensively
reviewed,
and
are
drawn
from
published
and
carefully
reviewed
studies.
The
integrated
analysis
of
Ryan
includes
a
broad
range
of
endpoints
and
included
sensitivity
analyses
to
demonstrate
the
robustness
of
the
modeling
approaches.
While
there
are
inherent
uncertainties
in
the
models,
including
issues
such
as
low-dose
effects
or
thresholds,
the
methods
have
been
applied
in
accordance
with
generally
accepted
assumptions.
The
authors
have
included
explanations
of
the
strengths
and
weaknesses.

Does
the
analysis
incorporate
all
relevant
studies?

The
analysis
is
based
upon
the
three
key
epidemiological
studies
of
maternal
exposures
to
mercury
and
childhood
neurodevelopmental
effects,
with
a
focus
on
the
Faroe
Islands
study.
While
there
are
many
studies
of
the
health
impacts
of
mercury, these are the most
relevant
to
assessing
potential
neurodevelopmental
impacts.
This
approach
is
consistent
with
the
NAS
Committee
on
the
Toxicological
Effects
of
Methylmercury.

Given
the
scope
and
intended
purpose
of
the
methodology,
are
the
analytical
framework,
assumptions,
and
application
of
data
appropriate?
Are
the
scientific
uncertainties
clearly
identified
and
characterized
throughout
the
analysis?

In
each
paper
the
authors
provide
a
detailed
discussion
of
the
assumptions
of
the
models,
and
their
potential
limitations.
They
also
discuss
the
limitations
of
the
data,
including
a
thorough
examination
of
the
inter-study
differences
in
test
measures
and
exposure
measures.
The analyses could be strengthened by a more detailed discussion of the biological basis for the assumptions used in the models; as written, the methods focus on the computational basis more than the neurological rationale.

RESPONSE:
Discussion
of
the
neurological
rationale
has
been
expanded
in
the
report
by
Bellinger.
Text
in
the
report
by
Ryan
on
rescaling
of
data
has
been
expanded.

Since
the
benefits
assessment
has
important
regulatory
implications
it
is
essential
to
include
a
thorough
explanation
of
these
uncertainties.
It
may
be
important
for
EPA
to
examine
the
collective
uncertainties
in
each
step
and
provide
approaches
for
their
consideration
in
the
assessment.

RESPONSE:
Discussion
of
uncertainties
in
the
report
by
Ryan
has
been
expanded.

Are
the
methods
applied
in
this
study
appropriate
for
quantifying
IQ
decrements
in
the
range
of
current
U.
S.
mercury
exposures?
If
not,
what
methods
would
you
recommend?

According
to
NHANES
hair
measurements,
the
mean
population
exposure
levels
for
U.
S.
women
are
20
times
lower
than
the
mean
in
the
Faroe
study
(
4.3
vs.
0.2
ppm).
Ryan
estimates
a 0.1 to 0.25 IQ point decrease for every increase of 1 ppm
in
hair.
Even
a
doubling
of
the
population
mean
level
would
have
a
very
small
effect
on
individual
level
IQ
scores.
While
the
study
methods
may
be
appropriate
for
the
highly
exposed
U.
S.
women,
they
may
not
be
appropriate
to
estimate
true
population
impacts
on
IQ.
In
light
of
the
many
other
neurological
endpoints
it
may
be
more
appropriate
to
develop
an
alternative
neurodevelopmental
index.
Since
this
is
not
my
area
of
expertise
I
cannot
recommend
methods.
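
The scale of this comparison can be checked with simple arithmetic. The sketch below uses only the figures quoted in the comment above (mean hair mercury of 0.2 ppm for U.S. women, 4.3 ppm in the Faroe study, and a slope of 0.1 to 0.25 IQ points per 1 ppm). The function and variable names are illustrative, and the calculation assumes, as a simplification, that the linear slope holds at low exposures; none of this is taken from the reviewed reports.

```python
# Figures quoted in the reviewer's comment; names and structure are illustrative.
US_MEAN_PPM = 0.2        # mean maternal hair mercury, U.S. women (NHANES, as quoted)
FAROE_MEAN_PPM = 4.3     # mean maternal hair mercury, Faroe study (as quoted)
SLOPE_LOW, SLOPE_HIGH = 0.1, 0.25  # IQ points lost per 1 ppm increase in hair mercury


def iq_points_lost(delta_ppm, slope):
    """IQ points lost for an exposure increase, assuming a linear, no-threshold slope."""
    return slope * delta_ppm


# How far apart the two study populations are.
exposure_ratio = FAROE_MEAN_PPM / US_MEAN_PPM

# Doubling the U.S. mean (0.2 -> 0.4 ppm) adds one more U.S. mean of exposure.
doubling_increment = US_MEAN_PPM
loss_low = iq_points_lost(doubling_increment, SLOPE_LOW)
loss_high = iq_points_lost(doubling_increment, SLOPE_HIGH)

print(f"Faroe mean is about {exposure_ratio:.1f} times the U.S. mean")
print(f"Doubling the U.S. mean: {loss_low:.2f} to {loss_high:.2f} IQ points lost")
```

Under these assumptions, doubling the U.S. mean corresponds to roughly 0.02 to 0.05 IQ points per child, which is the "very small effect" the reviewer describes.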
RESPONSE:
The
modeling
approach
used
in
this
analysis
can
provide
a
coefficient
for
the
overall
cognitive/
achievement
domain,
and
other
model
outputs
may
be
informative
as
to
broader
neurodevelopmental
impacts.
It
is
not
clear,
however,
how
such
outputs
could
be
used
in
quantitative
benefit-cost
analysis
at
this
time,
though
this
may
be
feasible
with
further
economic
research.
Discussion
of
these
points
has
been
added.

What
are
the
overall
major
strengths
and
weaknesses
of
this
analysis?

This
is
discussed
in
the
initial
general
comments
section.
The
analysis
is
very
thorough
and
strong,
but
IQ
may
not
be
the
most
appropriate
measure
of
population
public
health
effects
on
which
to
base
a
benefits
assessment.
There
are
a
number
of
uncertainties
that
each
of
the
authors
point
out
and
there
will
be
debate
concerning
the
relevance
of
the
findings
to
the
U.
S.
population.
Perhaps
most
importantly,
IQ
alone
may
underestimate
the
broader
public
health
effects
of
population
mercury
exposures.

RESPONSE:
The
discussion
of
the
limitations
of
using
IQ
has
been
expanded.

Are
all
of
the
essential
elements
included
in
the
final
report?
Is
the
report
clear
and
well-written?
What
additional
documentation,
if
any,
do
you
feel
is
needed
to
ensure
transparency?

As
mentioned
in
the
general
comments
section,
the
papers
should
somehow
be
pulled
together
through
a
summary
document
that
clarifies
the
goals,
findings,
and
implications
of
each
component.
Each
author
should
also
provide
a
more
complete
statistical
and
biological
explanation
for
assumptions
that
have
been
applied
in
selecting
or
eliminating
variables.
The
conclusions
of
each
paper
should
be
presented
more
clearly,
including
a
discussion
of
population
public
health
implications.
While
the
papers
are
well
written,
they
address
very
complex
issues
and
methods
and
are
difficult
to
read
and
it
is
challenging
to
recognize
their
important
interconnections.

RESPONSE:
The
documents
have
been
revised
as
suggested.

Specific Topics

This
analysis
focuses
on
IQ
as
the
neurodevelopmental
outcome
for
mercury
economic
benefits
analysis.
Are
there
other
neurodevelopmental
endpoints
for
mercury
that
could
be
quantified?
Are
there
advantages
and
disadvantages
of
using
IQ
to
represent
neurodevelopmental
effects
of
prenatal
mercury
exposure
in
addition
to
those
identified?

The
major
disadvantage
is
described
in
the
Bellinger paper: IQ
IQ
may
not
be
the
most
sensitive
cognitive
endpoint.
Therefore
the
disadvantage
of
using
IQ
to
represent
neurodevelopmental
effects
is
that
it
may
underestimate
true
population
impacts.
Other
potentially
quantifiable
outcomes
should
be
examined
and
included,
perhaps
using
methods
of
risk
assessment
to
address
uncertainties.

RESPONSE:
See
responses
above.

The
approach
to
quantifying
the
IQ
dose-response
relationship
integrates
data
from
studies
conducted
in
the
Faroes,
Seychelles
and
New
Zealand.
Is
it
appropriate
to
combine
results
from
these
three
studies
for
this
analysis?
Do
differences
in
the
version
of
the
IQ
test
(
Wechsler
Intelligence
Scales
for
Children,
or
WISC)
administered
in
the
three
studies
raise
any
issues
that
are
of
concern
to
you?

The
integration
of
the
data
from
the
three
major
studies
is
appropriate
and
provides
important
insights
into
the
overall
similarities
of
effects.
It
also
contributes
to
the
weight
of
the
evidence
of
the
association
between
maternal
exposures
and
childhood
neurodevelopment.
The
combination
of
these
data
is
appropriate,
and
the
author
takes
great
care
in
describing
similarities
and
differences,
and
potential
sources
of
uncertainty.
All
three
papers
address
the
relevance
and
potential
limitations
of
combining
the
results,
including
the
differing
measures
of
IQ.
This
is
not
my
specific
area
of
expertise; however,
I
have
consulted
colleagues
about
this
issue
and
they
have
reaffirmed
the
findings
of
the
authors
that
the
combinations
are
appropriate.

The
differences
in
testing
will
undoubtedly
be
a
major
focus
of
policy
debate.
It
is
unfortunate
that
the
Faroe
study,
the
critical
study
for
the
development
of
the
RfD,
has
the
most
limited
measures
for
quantifying
IQ.

The
authors
of
this
analysis
had
to
select
dose-response
coefficients
from
the
three
studies
for
use
in
their
statistical
modeling.
Please
comment
on
the
choice
of
dose-response
coefficients
from
the
three
studies
for
use
in
this
analysis.
Are
there
alternate
coefficients
or
other
data
from
these
three
studies
that
should
be
used
in
place
of,
or
in
addition
to,
those
used
in
this
analysis?

Ryan
does
a
good
job
of
describing
the
selection
of
the
dose-response coefficients from a computational perspective, but the neurological basis for these selections is not addressed.
In
general
I
do
not
feel
qualified
to
address
this
question.

RESPONSE:
The
neurological
effects
of
mercury
are
not
sufficiently
well
understood
at
a
mechanistic
level
to
provide
a
basis
for
selecting
tests.
All
three
studies
employed
a
broad
range
of
tests
because
they
were
not
exactly
sure
what
it
was
they
would
find.
We
focused
our
analyses
on
the
test
score
for
which
the
most
data
are
available
regarding
long-term
implications
(
i.
e.,
IQ).
Since
this
is
a
cognitive
outcome,
we
chose
results
for
other
cognitive
tests
for
inclusion
in
the
Bayesian
model
for
our
main
analysis;
the
tests
selected
would
be
widely
recognized
as
representing
neurological
outcomes
in
the
cognitive
domain;
rationale
for
this
grouping
is
provided
in
the
text.
This
analysis
generally
relies
on
coefficients
from
linear
dose-response
models.
Given
the
available
information,
is
this
an
appropriate
approach,
and
applicable
to
the
full
range
of
exposures
experienced
by
the
U.
S.
population,
including
exposures
below
those
in
the
range
of
empirical
observation?

This
may
be
more
a
policy
question
than
a
scientific
one.
There
are
inherent
uncertainties
at
low
dose
levels.
However,
the
neurodevelopmental
endpoints
are
continuous
measures
and
there
is
no
current
recognition
of
a
threshold
level
for
MeHg.
If
the
goal
of
EPA
is
to
be
protective
of
public
health
the
use
of
coefficients
from
linear
dose-response
models
is
appropriate
as
long
as
the
uncertainties
and
policy
judgments
are
documented.

Full
scale
IQ
was
not
measured
in
the
Faroe
Islands
study;
however,
three
IQ
subtests
were
conducted.
At
the
request
of
EPA,
the
Faroe
Islands
research
team
conducted
a
statistical
analysis
to
estimate
a
dose-response
relationship.
Is
the
rationale
for
extrapolating
full
scale
IQ
from
the
three
subtests
clearly
explained
and
justified?
Is
the
approach
to
estimating
a
full-scale IQ dose-response
relationship
for
the
Faroe
Islands
appropriate?

The
discussion
section
of
the
Faroes
research
team
paper
should
be
clarified
to
more
clearly
describe
the
implications
of
their
findings
for
estimating
Full-Scale
IQ
from
the
subtests.
This
paper
should
answer
these
questions,
yet
as
currently
written
the
conclusions
of
the
authors
are
unclear
and
provide
only
marginal
support
for
the
Ryan
modeling.
In
general
I
feel
the
Bellinger
and
Ryan
papers
provide
a
good
examination
of
the
issues
that
justifies
the
use
of
the
subtests
as
estimates.

RESPONSE:
The
comment
has
been
shared
with
the
Faroes
research
team.

Integrated
dose-response coefficients are estimated first using only the IQ dose-response
coefficient
from
the
three
studies,
then
also
incorporating
data
for
other
neurodevelopmental
endpoints
(
e.
g.,
Boston
Naming
Test)
in
a
Bayesian
hierarchical
model.
Is
the
rationale
for
each
approach
clearly
explained?

The
rationale
is
well
described,
and
the
results
provide
important
evidence
of
the
comparability
of
the
studies
across
a
broad
range
of
endpoints.
However
this
approach
is
intended
as
a
computational
tool
and
the
underlying
scientific
basis
should
be
further
explained.
With
further
development
this
approach
may
have
applicability
for
the
development
of
quantifiable
estimates
of
indicators
of
population
neurodevelopment
impacts
beyond
IQ.
One
suggestion:
this
paper
could
benefit
from
editing
to
improve
the
clarity
for
readers
with
limited
expertise
in
Bayesian
models.

RESPONSE:
The
text
on
the
Bayesian
model
has
been
revised
to
make
it
clearer.
Other
text
has
been
added
to
further
describe
the
limitations
of
focusing
on
IQ
and
to discuss
other
model
outputs
that
quantify
impacts
beyond
IQ.

Only
certain
non-IQ
neurodevelopmental
endpoints
reported
in
the
Faroes,
New
Zealand
and
Seychelles
studies
could
be
included
in
the
Bayesian
model.
Is
the
rationale
for
selection
of
endpoints
for
inclusion/
exclusion
in
the
Bayesian
model
clear
and
reasonable?
Do
you
agree
with
the
rationale?

This
question
is
more
appropriately
answered
by
those
with
expertise
in
neurodevelopmental
testing
and
Bayesian
statistics.
It
may
be
appropriate
to
conduct
further
sensitivity
analysis
to
examine
the
influence
of
the
inclusion/
exclusion
decisions.

RESPONSE:
Sensitivity
analysis
has
been
added
to
assess
the
effect
of
adding
in
endpoints
that
were
excluded
from
the
primary
analysis.

In
order
to
combine
data
from
different
studies
and
different
neurodevelopmental
endpoints,
it
was
necessary
to
rescale
the
reported
dose-response
coefficients.
Data
from
the
Faroes
is
converted
from
terms
of
cord
blood
mercury
to
hair
mercury.
All
endpoints
other
than
IQ
are
converted
to
the
IQ
scale.
Is
the
rescaling
of
the
coefficients
clearly
explained
and
appropriately
executed?

This
is
an
important
section
of
the
analysis
that
could
provide
important
support
for
the
overall
weight
of
evidence
of
neurodevelopmental
effects
of
MeHg.
There
are well-documented
data
supporting
the
conversion
of
cord
blood
to
hair
mercury.
However,
this
section
should
provide
further
discussion
of
the
scientific
(
biological
and
public
health)
basis
for
endpoint
rescaling.

RESPONSE:
Our
use
of
rescaled
estimates
is
analogous
to
the
use
of
"
effect
sizes,"
a
well-established
way
to
think
about
comparability
of
dose
response
coefficients
across
different
scales.
We
have
added
discussion
to
this
effect
in
the
conclusion
section.
A
substantial
portion
of
the
conversion
is
a
simple
mathematical
conversion
of
a
number
expressed
on
one
scale
to
an
equivalent
value
on
a
different
scale, similar
to
converting
a
temperature
from
the
Fahrenheit
scale
to
the
Celsius
scale.
Another
aspect
of
rescaling,
which
applies
only
to
the
Faroes,
is
converting
from
cord
blood
units
to
hair
units.
This
rescaling
is
discussed
in
the
text
in
some
detail.

Please
comment
on
the
implementation
of
the
Bayesian
model
and
interpretation
of
the
results
of
this
model.
Is
this
portion
of
the
study
clearly
explained,
and
appropriately
executed
and
interpreted?
Has
all
relevant
information
been
considered
in
the
model?

My
expertise
in
this
area
is
quite
limited.
The
Bayesian
approach
provides
a
robust
tool
for
examining
the
results
across
multiple
endpoints
and
studies.
That
being
said,
the
conclusions
section
of
the
paper
needs
to
be
strengthened
if
the
results
are
to
provide
a
useful
and
defensible
tool
for
the
benefits
analysis.
Throughout
the
three
papers
there
are
multiple
messages
concerning
uncertainty,
yet
the
strategy
lacks
any
statement
addressing
the
point
at
which
the
uncertainty
becomes
unacceptable
for
the
benefits
assessment.
Perhaps
this
issue
should
be
addressed
in
the
suggested
overview
document
that
would
serve
to
tie
the
research
together
and
form
the
basis
for
EPA
decision
making
on
IQ
and
maternal
mercury
exposure.

RESPONSE:
This
is
addressed
by
the
overview
and
synthesis
text
described
above.

Reviewer 2
Review of Effects of Prenatal Methylmercury on Childhood IQ: A Synthesis of Three Studies; Neurobehavioral Assessments Conducted in the New Zealand, Faroe Islands, and Seychelles Islands Studies of Methylmercury Neurotoxicity in Children; and Adverse mercury effects in 7-year old children expressed as loss in "IQ"
Review prepared by Dr. David B. Dunson
Tenured Senior Investigator, Biostatistics Branch, NIEHS
Adjunct Associate Professor, Institute of Statistics and Decision Sciences, Duke University
Adjunct Associate Professor, Department of Biostatistics, Univ. of North Carolina at Chapel Hill
February 17, 2005

General Comments:

My
comments
are
focused
on
the
meta
analysis
of
Professor
Ryan.
She
combined
data
from
three
epidemiologic
studies
(
Seychelles
child
development
study,
Faroe
islands
study,
and
New
Zealand
study)
to
assess
the
effects
of
prenatal
methylmercury
exposure
on
childhood
IQ.
The
statistical
analysis
is
closely
related
to
the
approach
described
by
Coull
et
al.
(
2003).
Although
the
focus
is
on
IQ,
Ryan
also
considered
related
endpoints,
arguing
convincingly
that
IQ
is
not
measured
perfectly
and
incorporating
related
outcomes
can
potentially
improve
estimates
of
dose
response
for
IQ.
In
fact,
such
an
idea
is
well
supported
in
the
literature
on
seemingly
unrelated
regression.
I
may
add
that
intelligence
is
intrinsically
a
latent
variable
or
latent
trait,
so
it
may
be
overly
simplistic
and
misleading
to
use
a
single
measure
of
IQ.
Ideally,
one
would
use
structural
equation
models
(
SEMs),
a
widely
used
approach
in
psychometrics
and
social
sciences,
to
formally
allow
for
an
"
intelligence"
latent
trait,
which
is
measured
by
an
item
response
battery
and
may
be
related
to
other
latent
variables
(
poverty,
quality
of
childhood
education,
etc).
Of
course,
analyses
that
adjust
for
the
potential
confounding
effects
of
childhood
environment
would
require
detailed
data
on
other
factors.
Unfortunately,
Ryan
does
not
have
access
to
the
subject-specific
data
in
the
studies
under
consideration,
and
hence
is
limited
in
the
flexibility
of
approaches
she
can
consider
to
methods that
combine
summary
statistics
for
the
different
studies.
Given
the
importance
of
these
results,
it
would
be
very
appealing
to
conduct
a
meta-analysis
using
all
the
data.
Without
this
information,
it
is
very
difficult
to
fully
account
for
uncertainty
and
to
evaluate
the
appropriateness
of
modeling
assumptions.
In
addition,
she
was
forced
to
select
out
those
endpoints
"
for
which
population
means
and
standard
deviations
were
either
provided
or
readily
available."
Certainly,
this
selection
may
result
in
a
loss
of
information
[
refer
to
bottom
of
page
4
of
her
report
for
the
list
of
endpoints
excluded].
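
To show what combining summary statistics can look like in its simplest form, the sketch below pools study-level slopes by inverse-variance weighting. This is a deliberately simpler stand-in, not the Bayesian hierarchical model used in Ryan's report, and the three slopes and standard errors are made-up placeholder values chosen only to make the arithmetic easy to follow; they are not estimates from the Faroe Islands, New Zealand, or Seychelles studies.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooling of study-level slopes.

    Each study is weighted by 1 / SE^2, so more precise studies count more.
    Returns the pooled slope and its standard error.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Placeholder inputs (hypothetical, NOT values from the three studies).
hypothetical_slopes = [-0.1, -0.2, -0.3]   # IQ points per ppm
hypothetical_ses = [0.1, 0.2, 0.2]

pooled_slope, pooled_se = pool_fixed_effect(hypothetical_slopes, hypothetical_ses)
print(f"Pooled slope: {pooled_slope:.3f} (SE {pooled_se:.3f})")
```

A fixed-effect pool of this kind assumes every study estimates the same underlying slope. A hierarchical model relaxes that assumption by letting the slopes vary across studies and endpoints, which is where the prior on the between-study variance becomes important.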

Because
the
different
endpoints
have
different
measurement
scales,
it
is
important
to
standardize
in
combining
the
endpoints
and
in
performing
quantitative
risk
assessment.
Ryan
uses
the
approach
of
expressing
the
exposure
effect
in
terms
of
a
percentage
of
the
standard
deviation
of
the
outcome,
which
seems
to
be
a
reasonable
strategy.
An
alternative
strategy
is
to
relate
the
different
endpoints
to
a
latent
variable,
which
is
on
some
pre-defined
scale
(
e.
g.,
equivalent
to
the
scale
of
one
of
the
endpoints
or
set
equal
to
an
arbitrary
constant).
Such
an
approach
can
be
implemented
in
an
SEM
analysis.
An
SEM
analysis
of
the
Faroes
data
was
published
previously
by
Budtz-Jorgensen
et
al.
(
2002;
2004).
In
combining
the
three
IQ
subscales,
Ryan
considers
a
simplified
SEM.
Although
the
Budtz-Jorgensen
et
al.
(
2002)
SEM
seems
more
realistic,
the
Ryan
simplification
seems
appropriate
for
the
purposes
of
combining
information
and
simplifying
presentation
of
results.

Overall,
the
analysis
is
well
done
and
the
difficult
issues
involved
have
been
carefully
considered
and
dealt
with
adequately.
The
main
change
I
would
recommend
pertains
to
the
key
issue
of
combining
information
from
the
different
outcomes
and
studies
(
see
point
7
below).
In
particular,
facing
problems
with
the
approach
of
Coull
et
al.
(
2003)
for
vague
priors
due
to
the
small
number
of
studies,
Ryan
abandons
the
original
formulation,
which
was
conceptually
quite
appealing.
These
problems
likely
arose
due
to
the
use
of
vague
priors
for
the
parameters;
certainly
the
data
do
not
dominate
the
prior
in
this
case,
so
it
is
not
surprising
that
bad
results
occur
when
choosing
an
unrealistic
prior.
For
example,
a
flat
prior
for
the
variance
components
may
effectively
assign
most
of
its
mass
to
values
favoring
huge
variance
among
studies
in
the
regression
coefficients.
Actually,
even
if
the
quality
of
the
studies
varies
widely,
I
would
be
very
surprised
to
find
more
than
a
modest
amount
of
variability
in
the
coefficients
(
e.
g.,
changes
from
-10
to
+15
certainly
are
implausible).
It
should
be
easy
to
choose
a
range
of
plausible
values,
particularly
given
the
standardized
scale.
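
The concern about flat priors can be made concrete with a small calculation. The sketch below assumes, purely for illustration, a uniform prior on a between-study standard deviation measured on the standardized scale; the bounds and the threshold of 0.5 are hypothetical and are not the values used in the report.

```python
def mass_above(threshold, upper):
    """Prior probability that a Uniform(0, upper) standard deviation
    exceeds `threshold`."""
    if not 0.0 <= threshold <= upper:
        raise ValueError("need 0 <= threshold <= upper")
    return (upper - threshold) / upper


# A nominally "vague" prior versus one restricted to a plausible range.
vague = mass_above(0.5, 100.0)    # share of prior mass on SDs above 0.5
bounded = mass_above(0.5, 1.0)
print(vague, bounded)
```

Under the wide bound nearly all of the prior mass sits on between-study standard deviations larger than half a standard deviation of the outcome, which is the sense in which a flat prior can favor very large heterogeneity when only three studies are available to update it.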

RESPONSE:
We
agree
with
the
commenter
regarding
the
specification
of
priors.
To
run
the
model,
it
was
necessary
to
provide
some
constraint
on
the
range
of
the
prior
distributions,
but
these
constraints
did
not
impose
any
strong
judgment
that
would
overly
influence
the
results;
the
approach
was
consistent
with
the
intent
of
using
vague
priors.

Specific
Comments:
1.
Step
3,
page
5:
Excluding
tests
and
endpoints
for
which
the
study
means
and
standard
deviations
were
substantially
different
from
the
population
norms
seems
questionable.
One
reason
for
the
discrepancies
may
be
a
difference
between
the
study
populations
and
the
general
population
with
respect
to
important
predictors,
possibly
including
the
level
of
methylmercury
exposure.
If
such
predictors
are
adjusted
for
in
the
analyses,
then
the
results
should
still
be
valid.
In
addition,
if
the
study
sample
had
a
different
exposure
distribution
compared
to
the
general
population
and
exposure
had
an
effect,
then
you
may
be
discarding
those
endpoints
that
are
most
sensitive
to
the
exposure
effect.
Another
reason
for
the
discrepancy
may
be
variation
in
the
administrator
of
the
test;
neurobehavioral
assessments
can
be
notoriously
variable.
Perhaps
a
compromise
would
be
to
conduct
a
sensitivity
analysis
with
and
without
the
tests
in
question.

RESPONSE:
A
sensitivity
analysis
has
been
added
considering
the
effect
of
tests
which
were
excluded
from
the
primary
analysis.
A
sensitivity
analysis
that
looks
at
data
for
the
two
Faroes
Similarities
examiners
separately,
as
well
as
excluding
data
for
Similarities
entirely,
has
also
been
added.

2.
Step
4,
page
5:
An
alternative
to
arbitrarily
selecting
one
score
for
tests
with
multiple
variants
would
be
to
include
a
measurement
error
model.
For
example,
use
a
model
with
the
variants
randomly
distributed
about
a
test
score
latent
variable.
Such
models
can
be
easily
fitted
in
WinBUGS
using
a
Bayesian
approach.

RESPONSE:
This
is
a
good
idea.
However,
in
light
of
the
challenges
experienced
already
with
including
two
levels
in
our
model,
due
to
our
relatively
small
sample
sizes,
such
an
approach
was
not
practically
feasible
in
our
setting.
Discussion
has
been
added
to
the
report.

3.
Page
8,
scale
of
the
latent
IQ
variable.
By
assigning
a
normal
distribution
to
the
latent
IQ
variable
with
unit
variance,
one
can
use
the
regression
coefficients
to
obtain
easily
interpretable
quantities,
such
as
where
typical
individuals
at
any
given
dose
fall
in
the
population
distribution
for
unexposed
(
or
low
exposed)
individuals.
One
worry
with
these
types
of
analyses,
including
both
the
SEMs
and
the
rescaling
based
on
estimated
standard
deviations
approach,
is
the
reliance
on
the
idea
that
dose
results
in
a
shift
in
the
mean
of
a
normal
distribution
with
constant
variance.
Certainly,
heterogeneity
in
the
effect
of
exposure
may
lead
to
increasing
variance
at
higher
exposure
levels.
Inferences
on
the
proportions
of
individuals
in
the
tails
of
the
distribution
may
be
particularly
sensitive
to
this
type
of
violation
of
the
modeling
assumptions.
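
The sensitivity of tail proportions to the constant-variance assumption can be illustrated with a short calculation on the unit-variance latent scale. The mean shift of 0.1 standard deviations and the 10 percent increase in spread are hypothetical values chosen only to show the direction of the effect.

```python
from statistics import NormalDist


def fraction_below(cutoff, mean, sd):
    """Share of a Normal(mean, sd) population falling below `cutoff`."""
    return NormalDist(mean, sd).cdf(cutoff)


cutoff = -2.0  # two SDs below the unexposed mean on the latent scale

baseline = fraction_below(cutoff, 0.0, 1.0)           # unexposed population
shift_only = fraction_below(cutoff, -0.1, 1.0)        # mean shift, same variance
shift_and_spread = fraction_below(cutoff, -0.1, 1.1)  # mean shift, larger variance
print(baseline, shift_only, shift_and_spread)
```

In this sketch the same mean shift yields a noticeably larger lower-tail fraction when the variance also increases with exposure, so tail inferences depend on an assumption that the mean decrement alone does not test.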

RESPONSE:
The
primary
intended
use
of
the
output
of
the
model
is
to
estimate
the
mean
IQ
decrement
in
an
exposed
population
for
any
given
level
of
exposure.
We
agree
that
these
issues
would
need
to
be
considered
if
the
results
were
extended
to
analyze
shifts
in
the
tails
of
the
distribution.
4.
Discussion
of
K­
power
model
and
linearity
assumption:
Given
the
clear
uncertainty
in
the
dose
response
shape
and
the
large
impact
assumptions
on
the
dose
response
curve
have
on
the
results
(e.g.,
estimated
dose
response,
benchmark
dose,
etc.),
the
focus
on
a
linear
model
is
unappealing.
The
K­
power
model
is
not
really
a
great
alternative
because
the
value
of
K
may
be
driven
primarily
by
data
outside
of
the
range
of
primary
interest.
For
example,
to
better
fit
the
curve
at
higher
exposure
levels
we
may
choose
K=
1,
which
may
be
a
poor
choice
at
lower
exposure
levels.
A
more
flexible
approach,
such
as
a
spline,
seems
in
order.
A
monotone
regression
spline,
which
enforces
the
increasing
dose
response
curve
constraint,
would
be
appealing
in
estimating
the
BMD,
because
it
avoids
the
problem
of
obtaining
an
undefined
estimate
for
negative
or
zero
slopes.
A
simple
alternative
is
a
Bayesian
model
averaging
approach:
for
example,
estimate
a
variety
of
dose
response
curves
and
assign
weights
based
on
a
BIC­
approximation
to
the
posterior
model
probabilities.
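
The BIC-based weighting mentioned above can be sketched as follows, assuming equal prior probabilities across the candidate dose-response models. The BIC values and slope estimates are hypothetical, and the function names are illustrative only.

```python
import math


def bic_weights(bics):
    """Approximate posterior model probabilities from BIC values,
    assuming equal prior model probabilities."""
    best = min(bics)
    # Subtract the best BIC first so the exponentials stay well scaled.
    raw = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]


def model_average(estimates, bics):
    """Weight each model's estimate by its approximate posterior probability."""
    weights = bic_weights(bics)
    return sum(w * e for w, e in zip(weights, estimates))


# Hypothetical fits: linear, square-root, and log dose-response curves.
bics = [100.0, 102.0, 106.0]
slopes = [-0.20, -0.30, -0.10]
weights = bic_weights(bics)
averaged = model_average(slopes, bics)
print(weights, averaged)
```

A model whose BIC is 2 units worse than the best receives roughly a third of the best model's weight, so the averaged estimate reflects model uncertainty rather than committing to a single curve.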

RESPONSE:
There
is
insufficient
data
to
explore
alternative
dose­
response
curves.
Some
discussion
has
been
added.

5.
Rescaling
of
methylmercury
levels
(
page
13):
Because
there
is
clearly
uncertainty
in
converting
between
hair
levels
and
cord
blood
levels,
it
would
be
appealing
to
formally
account
for
this
uncertainty
using
a
measurement
error
model
or
prior
distribution.
The
sensitivity
analysis
approach
used
here
and
elsewhere
(
i.e.,
repeat
analyses
plugging
in
different
plausible
values)
simplifies
the
presentation
and
implementation.
However,
a
worry
is
that
even
though
the
parameter
estimates
may
not
differ
hugely
across
the
different
analyses,
ignoring
the
associated
uncertainty
may
lead
to
underestimation
of
the
width
of
credible
intervals.
This
is
particularly
problematic
when
several
unknowns
are
treated
in
this
manner,
which
is
the
case
in
the
current
analysis.
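
The point about interval width can be illustrated by comparing a plug-in conversion with one that carries the uncertainty of the conversion factor. The sketch below assumes the coefficient and the factor are independent and expresses the factor relative to its base-case value (mean 1.0); the coefficient, its standard error, and the 10 percent relative uncertainty are hypothetical.

```python
import math


def product_se(beta, beta_se, factor_mean, factor_sd):
    """Standard deviation of beta * factor for independent beta and factor,
    using the exact variance of a product of independent variables."""
    variance = (
        beta**2 * factor_sd**2
        + factor_mean**2 * beta_se**2
        + beta_se**2 * factor_sd**2
    )
    return math.sqrt(variance)


beta, beta_se = -0.2, 0.08

plug_in_se = product_se(beta, beta_se, 1.0, 0.0)  # factor treated as known
full_se = product_se(beta, beta_se, 1.0, 0.1)     # factor treated as uncertain
print(plug_in_se, full_se)
```

The point estimate is unchanged, but the standard error grows once the factor's uncertainty is included, and the effect compounds when several quantities are fixed at plug-in values.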

RESPONSE:
Based
on
this
suggestion,
treatment
of
the
hair:
blood
ratio
as
a
random
variable
was
attempted.
This
was
less
easily
accomplished
in
the
structure
of
the
model
than
it
originally
appeared,
in
part
because
this
conversion
applies
to
only
one
of
the
three
studies.
We
may
revisit
this
issue
at
a
later
time.
However,
the
extent
to
which
this
variable
is
uncertain
should
not
be
overemphasized,
as
the
ratio
used
is
specific
to
the
population
to
which
it
is
applied.
The
sensitivity
analysis
using
an
alternate
ratio
is
intended
to
illustrate
how
the
results
might
change,
but
the
case
for
using
the
ratio
of
200
in
preference
to
other
values
is
quite
strong.

6.
Hierarchical
model,
page
16:
This
model
seems
to
be
a
reasonable
and
simple
way
to
combine
information
across
the
endpoints
and
studies
under
consideration.
I
worry
a
bit
about
the
prior
specification.
Given
that
there
are
only
3
studies
under
consideration,
it
would
seem
difficult
to
estimate
the
degree
of
heterogeneity
among
studies
based
on
the
data
alone.
This
may
also
be
the
case
for
the
degree
of
heterogeneity
in
the
coefficients
for
different
outcomes.
It
seems
that
the
prior
chosen
may
have
an
important
impact
on
the
degree
of
shrinkage
of
the
coefficients
towards
the
averaged
coefficient.
I
also
worry
a
bit
about
mixing
and
convergence
of
the
MCMC
algorithm
in
WinBUGS,
particularly
if
diffuse
priors
are
chosen
for
the
variance
components.

RESPONSE:
Text
has
been
added
to
the
discussion
to
address
this
issue.

7.
The
discussion
on
page
19
shows
that
Ryan
is
well
aware
of
the
issues,
and
the
approach
she
proposed
seems
to
be
reasonable.
I
would
prefer
use
of
informative
prior
distributions.
It
seems
that
one
could
narrow
down
the
range
of
plausible
values
for
the
parameters
considerably;
certainly
a
very
large
heterogeneity
among
studies
and
endpoints
in
the
regression
parameters
seems
unreasonable.
The
standardized
scale
helps
in
choosing
plausible
values
for
the
variance
components.
In
addition,
mixing
may
be
improved
by
a
hierarchical
centering
parameterization
or
even
by
the
commonly
used
ad
hoc
approach
of
centering
latent
variables
about
their
means
at
each
iteration
of
the
Gibbs
sampler.
The
issue
is
not
entirely
the
small
sample
size,
because
vague
inverse
gamma
priors
for
the
variance
components
have
well
known
problems
due
to
"
near
impropriety".
This
has
led
to
alternative
vague
priors,
which
could
be
used
here.
However,
the
informative
prior
approach,
possibly
even
plugging
in
an
intelligent
guess
at
the
variance
components
(as
is
used
in
the
so-called
semi-Bayes
approach),
seems
more
reasonable.
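
A semi-Bayes calculation of the kind mentioned above can be sketched as follows. The between-study standard deviation tau is plugged in as a fixed guess rather than estimated, and uncertainty in the pooled mean is ignored for simplicity. The coefficients, standard errors, and value of tau are hypothetical and are not taken from the three studies.

```python
def semi_bayes_shrink(estimates, std_errors, tau):
    """Shrink study-specific coefficients toward their precision-weighted
    mean, treating the between-study SD `tau` as known."""
    weights = [1.0 / (se**2 + tau**2) for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    shrunk = []
    for b, se in zip(estimates, std_errors):
        keep = tau**2 / (tau**2 + se**2)  # weight on the study's own estimate
        shrunk.append(pooled + keep * (b - pooled))
    return pooled, shrunk


estimates = [-0.5, -0.1, -0.3]
std_errors = [0.2, 0.1, 0.2]

pooled, shrunk = semi_bayes_shrink(estimates, std_errors, tau=0.1)
pooled_zero, shrunk_zero = semi_bayes_shrink(estimates, std_errors, tau=0.0)
print(pooled, shrunk)
```

Smaller values of tau pull every study toward the pooled coefficient, and tau equal to zero collapses them onto it, which shows how directly the assumed heterogeneity controls the degree of shrinkage.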

Response
to
Questions:

General
Topics
 
Please
comment
on
the
robustness
of
the
methods,
models
and
data
presented
and
used
in
this
research.

Given
the
complexity
of
the
data
under
consideration,
it
is
very
difficult
to
fully
address
the
robustness
issue.
However,
Ryan
has
done
a
very
good
job
considering
a
variety
of
analyses
and
data
to
be
included.

 
Does
the
analysis
incorporate
all
relevant
studies?

Although
other
relevant
studies
could
potentially
be
included,
the
focus
on
the
3
studies
judged
to
be
of
adequate
size
and
quality
seems
reasonable.

 
Given
the
scope
and
intended
purpose
of
the
methodology,
are
the
analytical
framework,
assumptions,
and
application
of
data
appropriate?
Are
the
scientific
uncertainties
clearly
identified
and
characterized
throughout
the
analysis?

Overall
yes;
see
additional
comments
above
 
Are
the
methods
applied
in
this
study
appropriate
for
quantifying
IQ
decrements
in
the
range
of
current
U.
S.
mercury
exposures?
If
not,
what
methods
would
you
recommend?
Overall
yes;
there
are
many
simplifying
assumptions,
but
these
seem
reasonable
in
general.
I
would
recommend
changes
to
the
borrowing
of
information
approach
as
described
above.

RESPONSE:
See
responses
above.

 
What
are
the
overall
major
strengths
and
weaknesses
of
this
analysis?

The
strengths
of
the
analysis
include
the
careful
consideration
of
data
to
be
included
and
of
a
variety
of
approaches
for
dealing
with
the
complex
multivariate
outcomes
having
different
measurement
scales.
The
weaknesses
mainly
relate
to
the
method
for
combining
information
as
described
above.

 
Are
all
of
the
essential
elements
included
in
the
final
report?
Is
the
report
clear
and
well
written?
What
additional
documentation,
if
any,
do
you
feel
is
needed
to
ensure
transparency?

The
report
is
very
well
written
and
thorough.
Many
of
my
suggestions
above
(
with
the
exception
of
the
approach
for
combining
information)
would
necessitate
a
more
complex
analysis
and
presentation,
both
of
which
may
decrease
the
clarity
of
the
presentation.

Specific
Topics
 
This
analysis
focuses
on
IQ
as
the
neurodevelopmental
outcome
for
mercury
economic
benefits
analysis.
Are
there
other
neurodevelopmental
endpoints
for
mercury
that
could
be
quantified?
Are
there
advantages
and
disadvantages
of
using
IQ
to
represent
neurodevelopmental
effects
of
prenatal
mercury
exposure
in
addition
to
those
identified?

Other
endpoints
were
considered
here.
Overall,
I
think
it
is
a
bad
idea
to
focus
sole
attention
on
IQ.
Though
public
interest
and
concern
may
be
less
for
other
neurobehavioral
outcomes,
it
would
be
very
appealing
to
consider
different
functional
domains
jointly
in
a
single
analysis.

RESPONSE:
We
agree
that
there
is
value
in
providing
more
quantitative
information
for
other
neurodevelopmental
endpoints
from
the
studies.
Providing
these
results
can
lead
to
expanded
qualitative
or
quantitative
treatment
of
these
endpoints
in
benefit­
cost
analyses.
As
an
example
of
additional
outputs
that
the
model
can
produce,
the
report
now
provides
a
coefficient
for
the
overall
cognitive/
achievement
domain,
which
can
be
interpreted
as
an
index
of
the
results
for
all
cognitive
endpoints
in
the
model.
We
will
explore
this
type
of
output
and
others
in
future
work.

 
The
approach
to
quantifying
the
IQ
dose­
response
relationship
integrates
data
from
studies
conducted
in
the
Faroes,
Seychelles
and
New
Zealand.
Is
it
appropriate
to
combine
results
from
these
three
studies
for
this
analysis?
Do
differences
in
the
version
of
the
IQ
test
(
Wechsler
Intelligence
Scales
for
Children,
or
WISC)
administered
in
the
three
studies
raise
any
issues
that
are
of
concern
to
you?

I
do
think
that
it
is
appropriate
and
necessary
to
combine
information
from
the
studies.
Differences
in
the
version
of
the
IQ
test
have
been
adequately
addressed
in
the
analysis.
These
differences
may
actually
be
an
advantage,
because
the
combined
results
may
be
less
sensitive
to
limitations
in
a
particular
version
of
the
test.

 
The
authors
of
this
analysis
had
to
select
dose­
response
coefficients
from
the
three
studies
for
use
in
their
statistical
modeling.
Please
comment
on
the
choice
of
dose-response
coefficients
from
the
three
studies
for
use
in
this
analysis.
Are
there
alternate
coefficients
or
other
data
from
these
three
studies
that
should
be
used
in
place
of,
or
in
addition
to,
those
used
in
this
analysis?

As
I
mentioned
above,
I
find
it
somewhat
unappealing
to
focus
on
a
selected
subset
of
the
dose
response
coefficients.
However,
the
reasons
presented
for
doing
so
in
this
context
seem
justified.

RESPONSE:
The
original
stated
reasons
for
focusing
on
a
narrower
set
of
endpoints
in
the
main
analysis,
which
were
based
on
the
nature
of
the
data
available,
still
apply.
However,
as
noted
above,
sensitivity
analysis
has
been
added
to
incorporate
a
broader
selection
of
coefficients.

 
This
analysis
generally
relies
on
coefficients
from
linear
dose­
response
models.
Given
the
available
information,
is
this
an
appropriate
approach,
and
applicable
to
the
full
range
of
exposures
experienced
by
the
U.
S.
population,
including
exposures
below
those
in
the
range
of
empirical
observation?

As
mentioned
above,
the
reliance
on
linearity
assumptions
does
not
seem
appropriate.
However,
given
the
limitations
in
terms
of
data
availability
and
simplicity
of
presentations
of
results
and
combining
analyses,
a
good
argument
can
be
made
in
favor
of
linearity.
I
certainly
would
not
recommend
the
linear
model
in
general,
particularly
for
estimating
the
BMD
and
for
quantitative
risk
assessment.

 
Full
scale
IQ
was
not
measured
in
the
Faroe
Islands
study;
however,
three
IQ
subtests
were
conducted.
At
the
request
of
EPA,
the
Faroe
Islands
research
team
conducted
a
statistical
analysis
to
estimate
a
dose­
response
relationship.
Is
the
rationale
for
extrapolating
full
scale
IQ
from
the
three
subtests
clearly
explained
and
justified?
Is
the
approach
to
estimating
a
full­
scale
IQ
dose­
response
relationship
for
the
Faroe
Islands
appropriate?

(
Reviewer
did
not
respond
to
this
question)

 
Integrated
dose­
response
coefficients
are
estimated
first
using
only
the
IQ
dose-response
coefficient
from
the
three
studies,
then
also
incorporating
data
for
other
neurodevelopmental
endpoints
(
e.
g.,
Boston
Naming
Test)
in
a
Bayesian
hierarchical
model.
Is
the
rationale
for
each
approach
clearly
explained?

(
Reviewer
did
not
respond
to
this
question)

 
Only
certain
non­
IQ
neurodevelopmental
endpoints
reported
in
the
Faroes,
New
Zealand
and
Seychelles
studies
could
be
included
in
the
Bayesian
model.
Is
the
rationale
for
selection
of
endpoints
for
inclusion/
exclusion
in
the
Bayesian
model
clear
and
reasonable?
Do
you
agree
with
the
rationale?

(
Reviewer
did
not
respond
to
this
question)

 
In
order
to
combine
data
from
different
studies
and
different
neurodevelopmental
endpoints,
it
was
necessary
to
rescale
the
reported
dose­
response
coefficients.
Data
from
the
Faroes
is
converted
from
terms
of
cord
blood
mercury
to
hair
mercury.
All
endpoints
other
than
IQ
are
converted
to
the
IQ
scale.
Is
the
rescaling
of
the
coefficients
clearly
explained
and
appropriately
executed?

(
Reviewer
did
not
respond
to
this
question)

 
Please
comment
on
the
implementation
of
the
Bayesian
model
and
interpretation
of
the
results
of
this
model.
Is
this
portion
of
the
study
clearly
explained,
and
appropriately
executed
and
interpreted?
Has
all
relevant
information
been
considered
in
the
model?

(
Reviewer
did
not
respond
to
this
question)

Reviewer
3
Review
of
Effects
of
Prenatal
Methylmercury
on
Childhood
IQ:
A
Synthesis
of
Three
Studies;
Neurobehavioral
Assessments
Conducted
in
the
New
Zealand,
Faroe
Islands,
and
Seychelles
Islands
Studies
of
Methylmercury
Neurotoxicity
in
Children;
and
Adverse
mercury
effects
in
7­
year
old
children
expressed
as
loss
in
"
IQ"
Review
prepared
by
Dr.
Joseph
Jacobson
Professor
and
Chair,
Department
of
Psychology,
Wayne
State
University
February
22,
2005
The
three
papers
provided
for
this
review
are
very
well­
written.
Particularly
impressive
is
the
paper
by
Louise
Ryan,
which
lucidly
takes
the
reader
step­
by­
step
through
complex
statistical
analyses
that
are
not
likely
to
be
familiar
to
most
investigators
and
regulators.
The
meta­
analytic
technique
that
Dr.
Ryan
uses
is
state­
of­
the­
art.
The
principal
weakness
of
the
dose­
response
analysis
provided
by
these
papers
is
the
failure
of
the
Faroes
investigators
to
have
administered
an
IQ
test
to
the
children.
This
limitation
is
dealt
with
in
these
papers
by
using
the
three
IQ
subtests
that
were
administered
in
the
Faroes
(Similarities,
Block
Design,
and
Digit
Span)
to
provide
an
estimate
of
overall
IQ.
Bellinger
is
correct
that
Sattler
and
other
well­
respected
scholars
have
endorsed
using
2­
5
IQ
subtests
to
estimate
full­
scale
IQ,
particularly
when
the
purpose
of
the
analysis
is
to
compare
particular
groups
within
populations
rather
than
to
provide
clinical
evaluations
of
individual
patients.
However,
it
is
important
to
recognize
that
in
the
Faroes
study
the
Similarities
subtest
was
much
less
strongly
related
to
prenatal
methylmercury
(
MeHg)
exposure
than
two
other
verbal
tests,
Boston
Naming
and
California
Verbal
Learning.
The
MeHg
effects
detected
in
those
tests
would
probably
have
been
represented
more
adequately
by
the
WISC­
III
Vocabulary
subtest
than
by
the
Similarities
subtest
that
was
administered.
For
this
reason,
basing
the
IQ
estimate
on
the
three
subtests
that
were
administered
in
the
Faroes
probably
led
to
an
underestimate
of
the
impact
of
MeHg
exposure
on
IQ
and,
therefore,
an
underestimate
of
the
slope
of
the
dose­
response
curve.
Given
that
the
Seychelles
study
failed
to
find
a
relation
between
prenatal
MeHg
exposure
and
IQ,
the
evidence
for
an
adverse
effect
in
this
meta­
analysis
depends
primarily
on
the
data
from
the
Faroes
and
New
Zealand
studies.
If
the
IQ
analysis
understates
the
impact
of
MeHg
exposure
in
one
of
these
two
cohorts,
the
integrative
analysis
probably
understates
its
overall
impact
on
IQ.
Given
the
limitation
in
the
data
available
from
the
Faroes,
the
approach
taken
here
is
a
reasonable
one;
however,
the
authors
need
to
make
clear
the
likelihood
that
the
slope
of
the
dose­
response
function
is
underestimated.

RESPONSE:
We
agree
that
the
absence
of
full­
scale
IQ
testing
is
a
limitation
for
this
analysis.
We
believe
that
our
approach
of
using
all
of
the
available
data
from
the
WISC,
including
all
three
subtests
and
both
Similarities
examiners,
is
the
most
appropriate
approach
to
simulating
results
of
a
full­
scale
IQ
test.

We
do
not
agree
that
there
is
a
clear
basis
for
concluding
that
an
estimate
of
full-scale
IQ
derived
from
the
three
available
subtests
is
an
underestimate.
Regarding
the
suggestion
that
the
WISC­
R
Vocabulary
test
would
be
expected
to
capture
the
functions
represented
by
the
Faroes
results
for
the
Boston
Naming
Test
(
BNT)
and
the
California
Verbal
Learning
Test
(
CVLT­
C):
the
correlation
between
WISC­
R
Vocabulary
subtest
score
and
CVLT­
C
List
A
Trials
1­
5
Total
is
0.33
in
5­
8
year
olds.
So
the
WISC­
R
Vocabulary
subtest
probably
would
not
have
been
a
very
good
surrogate
for
the
effects
captured
in
the
Faroes
study
by
the
CVLT­
C
and,
likely,
Boston
Naming
Test.
The
slope
for
the
association
between
mercury
and
CVLT
or
BNT
is
steeper
than
the
slope
for
IQ,
but
it
is
conjecture
to
assert
that
the
slope
for
IQ
would
have
been
steeper
had
the
Vocabulary
subtest
replaced
the
Similarities
subtest,
or
had
the
full
WISC
battery
been
administered.
The
analysis
made
use
of
the
best
information
available
for
estimating
an
IQ
effect
in
the
Faroes,
and
there
is
no
basis
to
presume
any
bias
in
those
data.
We
have
added
sensitivity
analysis
that
looks
at
data
for
the
two
examiners
separately,
as
well
as
excluding
data
for
Similarities
entirely.
The
results
with
Similarities
excluded
are
very
close
to
those
based
on
using
all
three
subtests.

Bellinger
paper
I
agree
with
Bellinger
that
the
integration
of
data
from
the
WISC­
R
and
the
WISC­
III
should
not
be
a
problem
given
how
similar
the
two
tests
are
in
terms
of
the
constructs
measured
and
the
distributions
of
the
test
scores.
The
5­
point
difference
in
the
standardization
norms
should
not
be
a
problem
since
the
integrative
analysis
is
based
on
within­
cohort
correlations.
I
also
agree
with
Dr.
Bellinger
that,
if
the
effects
of
MeHg
are
highly
focal
(affecting
only
specific
cognitive
functions),
IQ
might
underestimate
the
MeHg
effects.
However,
although
the
Faroes
study
hypothesized
and
tested
for
focal
deficits,
it
actually
found
deficits
on
a
broad
range
of
cognitive
endpoints.
Moreover,
the
magnitude
of
the
deficits
in
these
focal
endpoints
did
not
exceed
the
magnitude
of
the
effects
found
on
more
global
IQ
measures
administered
in
New
Zealand.
It,
therefore,
seems
unlikely
that
the
results
of
a
full­
scale
IQ
test
would
have
substantially
underestimated
the
impact
of
the
MeHg
dose­
response.

One
issue
that
Bellinger
notes,
which
is
not
addressed
in
the
meta­
analysis
presented
by
Ryan,
is
the
degree
to
which
a
relatively
small
effect
at
the
mean
of
the
distribution
may
translate
into
a
clinically
meaningful
deficit
at
the
tails
of
the
dose-response
relationship.
Bellinger
cites
data
by
Rose
and
Day
showing
that
the
correlation
is
often
very
high
between
the
mean
value
of
a
health
indicator
within
a
population
and
the
percentage
of
members
in
the
population
who
meet
criteria
for
disease,
i.
e.,
those
at
one
tail
of
the
distribution.
He
also
refers
to
data
from
a
randomized
control
trial
that
confirm
this
same
principle
but
needs
to
provide
a
reference
for
the
latter
finding.
The
principal
focus
in
the
meta­
analysis
presented
here
is
on
the
slope
of
the
dose­
response
curve,
which
is
clearly
the
important
initial
issue
that
needs
to
be
addressed.
However,
additional
analyses
focusing
on
the
tails
of
these
distributions
are
also
important
for
understanding
the
potential
impact
of
this
exposure
on
those
individuals
who
are
most
severely
affected.
Another
important
issue
cited
by
Bellinger
that
needs
to
receive
more
attention
is
the
potential
impact
of
this
exposure
on
particularly
sensitive
individuals
within
the
population.
One
problem
is
that
one
needs
to
identify
the
factors
that
render
particular
individuals
more
sensitive
to
MeHg
exposure,
a
line
of
research
that
is
critical
to
providing
a
realistic
assessment
of
the
true
cost
of
this
exposure
within
a
given
population.

RESPONSE:
We
agree
that
additional
analysis
and
research
on
the
tails
of
the
distribution
and
factors
related
to
individual
sensitivity
would
be
informative.
Further
analysis
of
the
tails
of
the
distribution
would
require
access
to
the
raw
data
from
the
three
studies.
The
reference
for
the
randomized
control
trial
has
been
added
to
the
Bellinger
report:
Laaser
U,
Breckenkamp
J,
Ullrich
A,
Hoffmann
B.
Can
a
decline
in
the
population
means
of
cardiovascular
risk
factors
reduce
the
number
of
people
at
risk?
Journal
of
Epidemiology
and
Community
Health
2001;
55:
179­
184.

Budtz­
Jorgensen
Budtz­
Jorgensen
has
done
an
excellent
job
in
providing
the
reanalyses
of
the
Faroes
data
that
made
Ryan's
integrative
analysis
possible.
The
initial
decision
of
the
Faroes
investigators
to
use
log
transformation
of
prenatal
MeHg
exposure
to
reduce
the
impact
of
a
few
highly
exposed
outliers
was
sound.
The
National
Research
Council
(
2000)
report
expressed
a
concern
that
the
log
transformation
distorted
the
Faroes
dose-response
curve
at
the
lower
end
of
the
distribution,
where
the
number
of
cases
was
relatively
sparse.
In
response
to
this
concern,
Dr.
Budtz­
Jorgensen
reanalyzed
the
Faroes
data
using
two
different
approaches
(robust
regression
and
deleting
all
cases
>
10
ppm),
both
of
which
provide
reasonable
alternative
approaches
for
reducing
the
impact
of
the
outliers.
Of
the
two
approaches,
the
robust
regression
is
preferable
since
it
makes
use
of
the
information
from
the
full
range
of
the
exposure
distribution,
reducing
the
impact
of
the
highest
exposed
cases
without
discarding
them
altogether.

The
structural
equation
modeling
(
SEM)
approach
used
by
Budtz-Jorgensen
for
evaluating
the
dose­
response
relation
between
MeHg
exposure
and
estimated
IQ
is
also
very
sound
in
that
it
permits
each
of
the
IQ
subtests
to
be
weighted
optimally
in
relation
to
exposure.
One
additional
limitation
of
the
Faroes
study
explained
by
Dr.
Budtz­
Jorgensen
in
this
paper
is
that
the
data
on
Similarities
collected
during
Year
2
of
the
study,
when
a
relatively
inexperienced
examiner
administered
the
test,
are
of
questionable
validity.
Thus,
MeHg
predicted
reduced
Similarities
scores
during
Year
1,
when
the
test
was
administered
by
a
more
experienced
neuropsychologist,
but
not
during
Year
2.
Budtz­
Jorgensen
handles
this
problem
by
controlling
statistically
for
examiner
in
the
SEM,
which
is
a
standard
and
acceptable
approach.
The
alternative
would
be
to
perform
the
SEM
analysis
only
on
the
Year
1
children,
which
is
how
Grandjean
et
al.
(
1997)
handled
the
problem
that
the
continuous
performance
test
(
CPT)
data
collected
during
Year
2
did
not
appear
to
be
valid.
The
advantage
of
the
approach
taken
with
the
CPT
data
is
that
the
Year
2
data,
which
are
probably
not
valid,
are
omitted
from
the
analysis.
The
disadvantage
is
that
the
sample
size
is
smaller.
It
would
be
of
interest
to
see
the
analyses
performed
both
ways
to
determine
whether
the
dose­
response
coefficient
is
stronger
when
only
the
presumably
more
valid
Year
1
data
are
considered.
It
should
be
noted
that
Figure
2
in
the
Ryan
paper
does
not
show
that
examiner
was
included
in
the
analysis
as
a
covariate
of
Similarities;
the
figure
should
be
modified
to
show
this
important
covariate.

RESPONSE:
We
agree
that
the
difference
in
results
between
the
two
Similarities
examiners
is
important
to
consider
and
have
added
sensitivity
analysis
that
looks
at
data
for
the
two
examiners
separately,
as
well
as
excluding
data
for
Similarities
entirely.
The
results
with
Similarities
excluded
are
very
close
to
those
based
on
using
all
three
subtests.

However,
we
do
not
believe
it
is
warranted
to
disregard
the
Similarities
data
for
Examiner
B.
The
only
basis
for
concluding
that
they
are
invalid
is
that
the
scores
were
not
related
to
mercury
in
the
same
way
as
the
scores
collected
by
Examiner
A.
We
do
not
have
the
information
that
would
be
needed
to
conclude
that
Examiner
B
data
are
invalid.
The
Faroes
investigators
have
not
concluded
that
the
Examiner
B
data
are
invalid.

My
only
reservation
regarding
the
Budtz-Jorgensen
paper
relates
to
statements
made
at
the
end
of
the
middle
paragraph
on
p.
3,
where
in
my
view
the
degree
of
confounding
with
PCB
exposure
is
understated.
The
issue
of
confounding
with
PCBs
in
relation
to
certain
of
the
most
important
verbal
performance
endpoints
is
a
difficult
one.
Dr.
Budtz-Jorgensen
has
published
some
analyses
that
provide
some
reassurance
that
the
developmental
deficits
found
to
be
associated
with
MeHg
are
probably
not
attributable
to
PCB
exposure.
However,
in
my
view,
it
is
somewhat
of
an
overstatement
to
characterize
the
confounding
as
"
limited."

RESPONSE:
We
believe
the
characterization
is
consistent
with
the
published
findings
and
the
interpretation
offered
by
the
NAS
methylmercury
panel.

Ryan
As
noted
above,
Dr.
Ryan
has
done
an
impressive
job
in
integrating
the
data
from
the
Faroes,
Seychelles,
and
New
Zealand
studies
and
has
provided
an
exceptionally
clear
explanation
of
these
meta­
analyses.
She
does
an
excellent
job
of
explaining
and
justifying
critical
choices
at
each
step
in
the
analysis;
for
example,
the
decisions
to
exclude
the
extreme
outlier
from
the
primary
New
Zealand
analyses
and
to
focus
on
the
whole
Faroes
sample.
The
decision
to
base
the
scaling
factor
for
the
Faroes
IQ
measure
on
the
standard
deviation
of
the
latent
variable
rather
than
on
the
scaling
factor
for
Digit
Span
is
also
well
justified.
The
use
of
sensitivity
analyses,
such
as
the
simultaneous
examination
of
the
200
vs.
250
conversion
factors
for
cord
blood
mercury,
is
also
impressive
and
adds
to
the
reader's
confidence
in
the
results
of
the
meta­
analysis.

On
p.
12,
lines
14­
16,
it
would
be
helpful
if
the
author
would
explain
the
concern
about
the
relatively
few
observations
at
the
lower
end
of
the
exposure
distribution
in
the
Faroes
cohort,
particularly
the
suggestion
that
the
dearth
of
observations
in
this
range
may
limit
generalizability
to
the
U.
S.
population.
Table
3
is
confusing.
Why
are
the
Seychelles
data
listed
in
the
"
subset"
column?
Weren't
all
the
available
data
included?
I
was
also
confused
by
Table
10.
The
note
to
Table
10
states
that
lower
values
of
DIC
indicate
a
better
model
fit,
but
the
text
at
the
top
of
p.
22
states
that
the
optimal
value
of
R
appears
to
be
in
the
range
of
1
to
1.5,
where
the
DIC
values
are
highest.

RESPONSE:
Additional
discussion
added
on
page
12,
as
suggested.
Seychelles
is
included
in
the
"
subset"
column
because
these
investigators
only
report
regression
analyses
for
a
subset
of
data
obtained
by
excluding
some
extreme
observations
found
to
be
highly
influential
(
see
p
1689
of
Myers
2003).
DIC
values
are
indeed
lowest
for
values
of
R
in
the
range
1
to
1.5,
since
they
are
negative.

The
data
in
Figure
5
warrant
some
discussion
in
the
text.
It
should
be
noted
that
in
this
analysis,
the
Faroes
estimated
IQ
data
overlap
completely
with
the
Seychelles
data.
Since
the
Seychelles
study
found
no
evidence
of
an
adverse
effect
of
prenatal
MeHg
on
developmental
outcome,
by
implication
the
Faroes
estimated
IQ
data
also
indicate
virtually
no
effect.
Given
that
other
endpoints
in
the
Faroes
study
indicate
significant
adverse
effects,
the
data
in
Fig.
5
confirm
my
concern
noted
above
that,
because
IQ
was
estimated
based
on
three
IQ
subtests
that
were
only
weakly
related
to
MeHg
exposure
in
this
cohort,
the
estimated
IQ
score
does
not
adequately
reflect
the
adverse
effect
found
on
other
endpoints.
Thus,
the
dose­
response
coefficient
that
emerges
from
the
integrated
cross­
study
analysis
is
likely
to
be
underestimated
since
the
MeHg
impact
on
the
Faroes
cohort
is
understated.
The
principal
problem,
which
is
presented
clearly
by
Bellinger,
is
that
an
IQ
estimate
is
needed
to
evaluate
the
societal
costs
of
this
exposure.
The
Boston
Naming
Test
and
the
California
Verbal
Learning
Test,
two
endpoints
that
were
particularly
sensitive
to
MeHg
exposure
in
the
Faroes
cohort,
are
probably
fairly
strongly
correlated
with
the
WISC­
III
Vocabulary
subtest
and/
or
Verbal
IQ
score.
(
The
data
in
Table
7
illustrate
how
much
more
strongly
MeHg
was
related
to
Boston
Naming
than
to
the
estimated
IQ
derived
from
the
SEM.)
Instead
of
relying
on
the
Similarities
subtest,
which
was
only
very
weakly
related
to
MeHg
and
apparently
not
validly
assessed
in
half
the
sample,
it
might
be
better
to
estimate
Vocabulary
or
Verbal
IQ
based
on
the
endpoints
that
were
most
sensitive
to
MeHg
and
to
include
that
estimate
in
addition
to
or
instead
of
Similarities
in
the
SEM.

RESPONSE:
We
respectfully
disagree
that
the
estimated
effect
of
methylmercury
on
IQ
is
underestimated
for
the
Faroes
study.
While
it
is
true
that
we
did
not
have
access
to
full
scale
IQ,
the
SEM
analysis
does
in
fact
adjust
for
some
of
the
uncertainty
associated
with
the
poor
measurement.
We
agree
that
there
is
likely
to
be
variability
among
the
WISC
subtests
in
sensitivity
to
the
effects
of
mercury.
Perhaps
Similarities
is
less
sensitive
than
other
subtests
that
were
not
administered
to
the
Faroes
cohort,
but
there
is
no
way
of
knowing
this.
The
objective
of
this
analysis
is
to
produce
a
best
estimate
of
the
IQ
dose­
response
relationship
with
mercury.
We
do
not
believe
there
is
a
defensible
alternative
to
using
all
information
from
the
WISC
subtests
that
were
administered
in
the
Faroes
to
develop
this
best
estimate.
It
would
not
be
justified
to
exclude
certain
tests
because
they
displayed
a
weaker
response
to
mercury
in
the
Faroes
study.

We
do
not
agree
with
the
inference
drawn
by
the
commenter
based
on
comparison
of
the
Faroes
and
Seychelles
dose­
response
coefficients.
The
Seychelles
analysis
(
Myers
et
al.
2003)
found
an
IQ/
methyl
mercury
response
of
-0.13
IQ
points
per
ppm
hair
mercury
(
SE
=
0.10).
The
response
found
in
the
Seychelles
analysis
is
elevated,
though
not
statistically
significant
(
p
value
from
the
study
is
0.20).
One
interpretation
is
that
there
is
a
response
between
IQ
and
methylmercury,
but
the
data
are
insufficient
to
resolve
the
response.
It
is
not
uncommon
in
environmental
epidemiology
to
have
a
number
of
factors
impinge
on
the
ability
to
fully
detect
a
relationship
(
e.
g.
issues
such
as
measurement
error
in
the
exposure,
statistical
power,
etc).
It
is
premature
to
implicate
the
Faroes
results
based
on
the
Seychelles
response.
In
addition,
the
NRC
has
evaluated
the
neurological
responses
from
the
three
studies
and
noted
why
there
may
be
differences
in
the
results
among
them,
some
of
which
could
influence
the
findings
here.
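
The arithmetic behind the Seychelles figures quoted in this response can be checked with a short sketch. It assumes a two-sided Wald test under a normal approximation, which is not necessarily the test used by Myers et al. (2003); the function names are illustrative only. Under that assumption the coefficient of -0.13 with SE 0.10 gives a p-value of roughly 0.19, close to the reported 0.20, and a 95% interval that spans zero.

```python
import math


def wald_two_sided_p(estimate, std_error):
    """Return (z, p) for a two-sided Wald test of H0: coefficient = 0.

    Assumption: the estimate is approximately normally distributed.
    """
    z = estimate / std_error
    # Two-sided standard normal tail probability via the complementary error function.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


def wald_interval(estimate, std_error, z_crit=1.96):
    """Return an approximate 95% confidence interval for the coefficient."""
    return estimate - z_crit * std_error, estimate + z_crit * std_error


# Values quoted in the response: -0.13 IQ points per ppm hair mercury, SE = 0.10.
z, p = wald_two_sided_p(-0.13, 0.10)
low, high = wald_interval(-0.13, 0.10)
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
print(f"approximate 95% interval: {low:.3f} to {high:.3f}")
```

An interval that includes zero is consistent with the response's reading: the data do not rule out an effect, they simply do not resolve it.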

I
generally
disagree
with
the
characterization
of
the
results
of
these
analyses
presented
in
pp.
26­
27.
To
characterize
the
dose­
response
pattern
across
these
three
studies
as
"
fairly
consistent"
strikes
me
as
an
overstatement.
Similarly,
the
statement
that
the
estimates
are
"
somewhat"
sensitive
to
the
assumed
values
of
the
variance
components,
how
the
coefficient
from
the
SEM
analysis
is
scaled,
and
whether
the
outlier
is
included
in
the
New
Zealand
analysis
is,
in
my
opinion,
an
understatement.
The
statement
on
p.
27
that
it
is
"
unfortunate"
that
IQ
was
not
measured
directly
in
the
Faroes
study
is
also
an
understatement.
In
my
view,
Dr.
Ryan
needs
to
acknowledge
clearly
that
the
failure
to
measure
IQ
directly
in
the
Faroes
may
well
have
led
to
an
underestimate
of
the
magnitude
of
the
dose­
response
coefficient.

RESPONSE:
Comments
regarding
the
qualitative
characterizations
will
be
considered
as
final
versions
of
the
reports
are
prepared.
The
issue
regarding
possible
IQ
underestimation
is
addressed
above.

General
Topics
 
Please
comment
on
the
robustness
of
the
methods,
models
and
data
presented
and
used
in
this
research.

The
methods,
models,
and
data
used
in
this
research
are
generally
very
robust,
with
the
important
exception
of
the
data
used
to
estimate
IQ
for
the
Faroes
cohort.
The
problem
with
these
data
derives
from
the
failure
of
the
Faroes
investigators
to
have
administered
an
IQ
test
to
the
children.
This
limitation
is
dealt
with
in
these
papers
by
using
the
three
IQ
subtests
that
were
administered
in
the
Faroes
(Similarities,
Block
Design,
and
Digit
Span)
to
provide
an
estimate
of
overall
IQ.
Bellinger
is
correct
that
Sattler
and
other
well­
respected
scholars
have
endorsed
using
2­
5
IQ
subtests
to
estimate
full­
scale
IQ,
particularly
when
the
purpose
of
the
analysis
is
to
compare
particular
groups
within
populations
rather
than
to
provide
clinical
evaluations
of
individual
patients.
However,
it
is
important
to
recognize
that
in
the
Faroes
study
the
Similarities
subtest
was
much
less
strongly
related
to
prenatal
methylmercury
(
MeHg)
exposure
than
two
other
verbal
tests,
Boston
Naming
and
California
Verbal
Learning.
The
MeHg
effects
detected
in
those
tests
would
probably
have
been
represented
more
adequately
by
the
WISC­
III
Vocabulary
subtest
than
by
Similarities,
the
verbal
subtest
that
was
administered.
For
this
reason,
basing
the
IQ
estimate
on
the
three
subtests
that
were
administered
in
the
Faroes
may
well
have
led
to
a
significant
underestimate
of
the
impact
of
MeHg
exposure
on
IQ
and,
therefore,
an
underestimate
of
the
slope
of
the
dose­
response
curve
derived
from
the
meta­
analysis.

RESPONSE:
See
responses
above
regarding
the
three
subtests
and
potential
underestimation
of
IQ
for
the
Faroes.

 
Does
the
analysis
incorporate
all
relevant
studies?

Yes.

 
Given
the
scope
and
intended
purpose
of
the
methodology,
are
the
analytical
framework,
assumptions,
and
application
of
data
appropriate?
Are
the
scientific
uncertainties
clearly
identified
and
characterized
throughout
the
analysis?

The
analytic
framework,
assumptions,
and
application
of
the
data
are
appropriate
and,
in
most
respects,
the
authors
do
an
excellent
job
in
identifying
and
clarifying
the
principal
sources
of
uncertainty.
However,
the
degree
to
which
reliance
on
the
three
IQ
subtests
administered
in
the
Faroes
is
likely
to
have
led
to
an
underestimate
of
the
dose­
response
coefficient
is
not
adequately
recognized.
On
pp.
26­
27
of
Ryan's
paper,
it
is,
in
my
view,
an
overstatement
to
characterize
the
dose­
response
pattern
across
these
three
studies
as
"
fairly
consistent"
and
probably
an
understatement
that
the
estimates
are
"
somewhat"
sensitive
to
the
assumed
values
of
the
variance
components,
how
the
coefficient
from
the
SEM
analysis
is
scaled,
and
whether
the
outlier
is
included
in
the
New
Zealand
analysis.
In
addition,
Dr.
Budtz-Jorgensen's
statements
at
the
end
of
the
middle
paragraph
on
p.
3
of
his
paper,
in
my
view,
understate
the
degree
of
confounding
with
PCB
exposure.
Although
he
has
published
some
analyses
that
provide
some
reassurance
that
the
developmental
deficits
found
to
be
associated
with
MeHg
are
probably
not
attributable
to
PCB
exposure,
it
is
an
overstatement
to
characterize
the
confounding
as
"
limited."

RESPONSE:
See
responses
above.
 
Are
the
methods
applied
in
this
study
appropriate
for
quantifying
IQ
decrements
in
the
range
of
current
U.
S.
mercury
exposures?
If
not,
what
methods
would
you
recommend?

Since
the
range
of
exposures
in
these
cohorts
overlaps
with
that
found
in
the
U.
S.,
the
methods
are
appropriate
for
quantifying
IQ
deficits
in
the
range
of
current
U.
S.
mercury
exposures.
However,
it
would
be
helpful
if
the
authors
would
explain
more
clearly
the
concern
raised
by
the
NRC
panel
about
the
relatively
few
observations
at
the
lower
end
of
the
exposure
distribution
in
the
Faroes
cohort,
particularly
the
suggestion
that
the
dearth
of
observations
in
this
range
may
limit
generalizability
to
the
U.
S.
population.

RESPONSE:
The
relatively
small
number
of
observations
at
the
lower
end
of
the
exposure
distribution
is
not
at
all
unusual
for
epidemiological
studies
of
environmental
contaminants.
The
shape
of
the
dose­
response
curve
was
evaluated
by
the
NRC,
and
they
found
that
the
linear
model
provided
the
best
fit
(
after
excluding
supralinear
models
from
consideration).
Lacking
any
information
to
the
contrary,
a
linear
model
down
to
the
lowest
doses
is
appropriate
as
is
used
in
other
similar
cases.
There
is
more
uncertainty
at
lower
exposure
levels,
but
the
available
data
do
not
provide
a
basis
for
any
other
approach.

 
What
are
the
overall
major
strengths
and
weaknesses
of
this
analysis?

The
major
strengths
of
this
analysis
include
that
it
is
robust,
state­
of­
the­
art,
and
provides
an
important
opportunity
to
evaluate
the
societal
cost
of
this
exposure
based
on
IQ
data.
Dr.
Ryan
does
an
excellent
job
of
explaining
and
justifying
critical
choices
at
each
step
in
the
analysis;
for
example,
the
decisions
to
exclude
the
extreme
outlier
from
the
primary
New
Zealand
analyses
and
to
focus
on
the
whole
Faroes
sample.
The
decision
to
base
the
scaling
factor
for
the
Faroes
IQ
measure
on
the
standard
deviation
of
the
latent
variable
rather
than
on
the
scaling
factor
for
Digit
Span
is
also
well
justified.
The
use
of
sensitivity
analyses,
such
as
the
simultaneous
examination
of
the
200
vs.
250
conversion
factors
for
cord
blood
mercury,
is
impressive
and
adds
to
the
reader's
confidence
in
the
results
of
the
meta­
analysis.
The
principal
weakness,
as
noted
above,
is
that
the
approach
used
to
estimate
IQ
in
the
Faroes
study
probably
led
to
an
underestimate
of
the
impact
of
this
exposure
on
IQ
and,
therefore,
the
magnitude
of
the
dose­
response
coefficient
derived
from
it.

 
Are
all
of
the
essential
elements
included
in
the
final
report?
Is
the
report
clear
and
well­
written?
What
additional
documentation,
if
any,
do
you
feel
is
needed
to
ensure
transparency?

These
papers
do
an
impressive
job
in
integrating
the
data
from
the
Faroes,
Seychelles,
and
New
Zealand
studies,
and
Dr.
Ryan
has
provided
an
exceptionally
clear
explanation
of
these
meta­
analyses.
She
does
an
excellent
job
of
explaining
and
justifying
critical
choices
at
each
step
in
the
analysis.
One
additional
analysis
that
I
would
like
to
see
would
be
a
Faroes
IQ
estimate
based
only
on
the
children
who
were
assessed
on
the
IQ
Similarities
subtest
during
Year
1.
As
Budtz­
Jorgensen
explains,
MeHg
predicted
reduced
Similarities
scores
during
Year
1,
when
the
test
was
administered
by
an
experienced
neuropsychologist,
but
not
during
Year
2
when
a
relatively
inexperienced
examiner
administered
the
test.
Budtz­
Jorgensen
handles
this
problem
by
controlling
statistically
for
examiner
in
the
SEM,
which
is
a
standard
and
acceptable
approach.
The
alternative
would
be
to
perform
the
SEM
analysis
only
on
the
Year
1
children,
which
is
how
Grandjean
et
al.
(
1997)
handled
the
problem
that
the
continuous
performance
test
(
CPT)
data
collected
during
Year
2
also
did
not
appear
to
be
valid.
The
advantage
of
the
approach
taken
with
the
CPT
data
is
that
the
Year
2
data,
which
are
probably
not
valid,
are
omitted
from
the
analysis.
The
disadvantage
is
that
the
sample
size
is
smaller.
The
Faroes
IQ
estimate
might
be
more
valid
if
it
is
performed
only
on
those
children
for
whom
a
valid
Similarities
test
was
obtained.

RESPONSE:
We
have
added
a
sensitivity
analysis
of
the
impact
of
the
different
results
for
the
two
Similarities
examiners.

Specific
Topics
 
This
analysis
focuses
on
IQ
as
the
neurodevelopmental
outcome
for
mercury
economic
benefits
analysis.
Are
there
other
neurodevelopmental
endpoints
for
mercury
that
could
be
quantified?
Are
there
advantages
and
disadvantages
of
using
IQ
to
represent
neurodevelopmental
effects
of
prenatal
mercury
exposure
in
addition
to
those
identified?

As
noted
above,
the
principal
advantage
of
using
an
IQ
and/
or
estimated
IQ
measure
is
that
it
will
permit
evaluation
of
the
societal
costs
of
this
exposure.
There
are
no
alternative
neurodevelopmental
measures
that
will
provide
this
information.
However,
other
approaches
to
estimating
IQ,
based
on
performance
on
Boston
Naming
and/
or
California
Verbal
Learning,
should
probably
be
considered.

RESPONSE:
We
are
not
aware
of
any
methods
or
precedents
for
using
BNT
or
CVLT­
C
in
the
estimation
of
WISC
IQ.
We
do
not
believe
that
a
stronger
effect
of
mercury
is
a
justifiable
basis
for
selecting
tests
to
use
in
estimating
the
effect
of
mercury
on
IQ.
The
use
of
BNT
and
CVLT­
C,
as
well
as
other
non­
IQ
tests,
in
estimating
the
benefits
of
reduced
mercury
exposure
should
be
explored
and
considered
for
future
analyses
of
the
benefits
of
mercury
exposure
reduction.
Discussion
has
been
added
to
the
Bellinger
report
on
the
underestimation
of
benefits
when
relying
solely
on
IQ
and
excluding
other
tests/
effects.

 
The
approach
to
quantifying
the
IQ
dose­
response
relationship
integrates
data
from
studies
conducted
in
the
Faroes,
Seychelles
and
New
Zealand.
Is
it
appropriate
to
combine
results
from
these
three
studies
for
this
analysis?
Do
differences
in
the
version
of
the
IQ
test
(
Wechsler
Intelligence
Scales
for
Children,
or
WISC)
administered
in
the
three
studies
raise
any
issues
that
are
of
concern
to
you?

Combining
the
data
from
these
two
studies
using
the
approach
taken
in
these
papers
is
valid.
The
content
of
the
WISC­
R
and
the
WISC­
III
and
the
distributions
of
the
scores
are
sufficiently
similar
to
warrant
combining
results
from
them
in
the
same
analysis.
The
5­
point
difference
in
the
standardization
norms
should
not
be
a
problem
since
the
integrative
analysis
is
based
on
within­
cohort
correlations.

 
The
authors
of
this
analysis
had
to
select
dose­
response
coefficients
from
the
three
studies
for
use
in
their
statistical
modeling.
Please
comment
on
the
choice
of
dose-response
coefficients
from
the
three
studies
for
use
in
this
analysis.
Are
there
alternate
coefficients
or
other
data
from
these
three
studies
that
should
be
used
in
place
of,
or
in
addition
to,
those
used
in
this
analysis?

The
choices
seem
appropriate.

 
This
analysis
generally
relies
on
coefficients
from
linear
dose­
response
models.
Given
the
available
information,
is
this
an
appropriate
approach,
and
applicable
to
the
full
range
of
exposures
experienced
by
the
U.
S.
population,
including
exposures
below
those
in
the
range
of
empirical
observation?

The
linear
models
seem
appropriate;
however,
as
noted
above,
the
authors
should
acknowledge
that
the
relative
paucity
of
cases
at
the
lower
end
of
the
distribution
adds
some
degree
of
uncertainty
to
the
findings.

RESPONSE:
See
response
above.
Discussion
has
been
added
as
suggested.

 
Full
scale
IQ
was
not
measured
in
the
Faroe
Islands
study;
however,
three
IQ
subtests
were
conducted.
At
the
request
of
EPA,
the
Faroe
Islands
research
team
conducted
a
statistical
analysis
to
estimate
a
dose­
response
relationship.
Is
the
rationale
for
extrapolating
full
scale
IQ
from
the
three
subtests
clearly
explained
and
justified?
Is
the
approach
to
estimating
a
full­
scale
IQ
dose­
response
relationship
for
the
Faroe
Islands
appropriate?

Please
see
my
comments
above
detailing
my
reservations
regarding
the
approach
used
to
estimate
IQ
scores
for
the
Faroes
cohort.
Using
the
approach
taken
in
the
analysis,
the
data
in
Figure
5
of
the
Ryan
paper
show
that
the
Faroes
estimated
IQ
data
overlap
completely
with
the
Seychelles
data.
Since
the
Seychelles
study
found
no
evidence
of
an
adverse
effect
of
prenatal
MeHg
on
developmental
outcome,
by
implication
the
Faroes
estimated
IQ
data
also
indicate
virtually
no
effect.
Given
that
other
endpoints
in
the
Faroes
study
indicate
significant
adverse
effects,
the
data
in
Fig.
5
confirm
my
concern
that,
because
IQ
was
estimated
based
on
three
IQ
subtests
that
were
only
weakly
related
to
MeHg
exposure
in
this
cohort,
the
estimated
IQ
score
does
not
adequately
reflect
the
adverse
effect
found
on
other
endpoints.
Thus,
the
dose­
response
coefficient
that
emerges
from
the
integrated
cross­
study
analysis
is
likely
to
be
underestimated
since
the
MeHg
impact
on
the
Faroes
cohort
is
understated.

RESPONSE:
See
above
responses.

 
Integrated
dose­
response
coefficients
are
estimated
first
using
only
the
IQ
dose-response
coefficient
from
the
three
studies,
then
also
incorporating
data
for
other
neurodevelopmental
endpoints
(
e.
g.,
Boston
Naming
Test)
in
a
Bayesian
hierarchical
model.
Is
the
rationale
for
each
approach
clearly
explained?

Yes,
but
it
is
not
clear
that
the
data
generated
by
the
Bayesian
model
can
be
used
directly
in
estimating
the
IQ
scores.

 
Only
certain
non­
IQ
neurodevelopmental
endpoints
reported
in
the
Faroes,
New
Zealand
and
Seychelles
studies
could
be
included
in
the
Bayesian
model.
Is
the
rationale
for
selection
of
endpoints
for
inclusion/
exclusion
in
the
Bayesian
model
clear
and
reasonable?
Do
you
agree
with
the
rationale?

The
rationale
for
inclusion/
exclusion
is
sound.

 
In
order
to
combine
data
from
different
studies
and
different
neurodevelopmental
endpoints,
it
was
necessary
to
rescale
the
reported
dose­
response
coefficients.
Data
from
the
Faroes
is
converted
from
terms
of
cord
blood
mercury
to
hair
mercury.
All
endpoints
other
than
IQ
are
converted
to
the
IQ
scale.
Is
the
rescaling
of
the
coefficients
clearly
explained
and
appropriately
executed?

Yes,
and
the
sensitivity
analysis
comparing
the
two
conversion
factors
is
particularly
impressive.

 
Please
comment
on
the
implementation
of
the
Bayesian
model
and
interpretation
of
the
results
of
this
model.
Is
this
portion
of
the
study
clearly
explained,
and
appropriately
executed
and
interpreted?
Has
all
relevant
information
been
considered
in
the
model?

The
model
is
clearly
explained,
but
its
relevance
to
estimation
of
IQ
is
not
clear.

RESPONSE:
The
Bayesian
model
produces
an
estimate
of
the
IQ
dose­
response
regression
coefficient,
comparable
in
interpretation
to
those
reported
for
New
Zealand
in
Crump
et
al.
1998
and
Myers
et
al.
2003,
that
takes
into
account
information
from
all
three
studies.
This
estimate
is
directly
applicable
to
the
estimation
of
decrements
in
IQ
associated
with
different
levels
of
prenatal
maternal
mercury
body
burdens.
Reviewer
4
Review
of
selected
materials
on
methylmercury
and
IQ
Prepared
for
Industrial
Economics,
Inc.
by
John
Bailar,
Professor
Emeritus,
University
of
Chicago
February
22,
2005
At
the
request
of
IEC
I
have
reviewed
materials
selected
by
them
relating
to
the
neurological
effects
of
prenatal
methylmercury
(
mehg)
exposure
on
IQ.
I
have
reviewed
three
reports
in
some
detail:

Louise
M.
Ryan,
Effects
of
Prenatal
Methylmercury
on
Childhood
IQ
David
C.
Bellinger,
Neurobehavioral
Assessments
conducted
in
the
New
Zealand,
Faroe
Islands,
and
Seychelles
Islands
Studies
of
Methylmercury
Neurotoxicity
in
Children
Esben
Budtz­
Jorgensen
et
al.,
Adverse
Mercury
Effects
in
7­
year­
old
Children
Expressed
as
Loss
in
"
IQ"

The
paper
by
Ryan
is
the
main
basis
of
this
report,
with
the
other
two
in
supporting
roles.
In
addition,
I
have
looked
at
selected
parts
of
7
other
research
reports
supplied
by
IEC,
which
are
the
bases
of
the
three
papers
listed
above.

I
have
not
reviewed
other
relevant
materials.
Also,
I
have
no
more
than
a
general
familiarity
with
three
areas
critical
to
the
analysis:
neurological
testing,
structural
equation
models,
and
the
biologic
effects
of
methylmercury.
Nevertheless,
I
believe
that
my
comments
below
are
correct
and
appropriate.
For
full
transparency,
I
am
Scholar
in
Residence
at
the
National
Academies,
but
I
had
no
role
in
the
NAS
report
on
methylmercury,
released
in
2000,
nor
have
I
read
any
part
of
that
report.

Readers
should
understand
that
qualified
statisticians
facing
a
complex
set
of
data,
such
as
here,
must
make
very
large
numbers
of
choices,
and
that
they
are
likely
to
make
their
choices
differently.
I
have
therefore
accepted
Ryan's
choices
and
not
even
commented
on
them
unless
I
have
been
asked
to
do
so,
or
I
think
that
our
different
approaches
would
lead
to
different
findings
and
conclusions.

Summary:

Ryan
has
done
a
very
solid
job
of
presenting
the
data,
presenting
a
new
statistical
analysis,
and
discussing
the
strengths
and
limitations
of
both
the
data
and
the
methods
she
uses.
However,
I
would
have
done
two
things
in
substantially
different
ways.
First,
Ryan
is
careful
in
discussing
the
effects
of
sampling
variation
("
random
error")
on
confidence
bounds
and
p­
values,
but
I
regret
the
absence
of
any
serious
discussion
of
the
probable
or
even
possible
size
of
the
non­
random
uncertainties
surrounding
her
analysis.
I
believe
that
they
are
considerable,
which
means
that
the
p­
values
may
be
substantially
too
low
and
the
confidence
bounds
substantially
too
narrow.
Second,
I
would
not
have
gone
as
far
in
estimating
effects
from
combinations
of
data
that
are,
or
might
be,
fundamentally
incompatible.

RESPONSE:
Though
we
have
no
access
to
the
raw
data,
the
coefficients
used
in
our
analyses
are
all
from
carefully­
controlled
analyses
that
adjusted
for
many
covariates.
Nevertheless,
we
agree
that
there
may
be
sources
of
non­
sampling
error
present
in
these
data.
The
Bayesian
hierarchical
model
allows
for
this
by
explicitly
incorporating
study­
to­
study
and
endpoint­
to­
endpoint
variance
components.
Random
effects
are
often
used
to
help
characterize
important
sources
of
error
that
cannot
be
captured
by
measured
covariates.
That
being
said,
there
are
important
caveats
and
limitations
to
our
analysis.
The
discussion
of
these
in
the
final
section
of
the
Ryan
report
has
been
expanded.
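
How a study-to-study variance component widens the uncertainty of a pooled coefficient can be sketched with a simpler stand-in for the Bayesian hierarchical model: a classical DerSimonian-Laird random-effects pooling. This is a swapped-in technique, not the model used in the Ryan report. Only the Seychelles coefficient and standard error come from the text above; the other two inputs are hypothetical placeholders.

```python
def dersimonian_laird(estimates, std_errors):
    """Random-effects pooling with the DerSimonian-Laird between-study variance.

    Returns (pooled_estimate, pooled_std_error, tau_squared).
    """
    k = len(estimates)
    weights = [1.0 / se ** 2 for se in std_errors]
    total_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / total_w
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    c = total_w - sum(w ** 2 for w in weights) / total_w
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Adding tau2 to each study variance flattens the weights and widens the SE.
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    pooled_se = (1.0 / sum(re_weights)) ** 0.5
    return pooled, pooled_se, tau2


# Seychelles values are from the text; the other two entries are hypothetical.
coefs = [-0.13, -0.50, -0.10]
ses = [0.10, 0.20, 0.10]
pooled, pooled_se, tau2 = dersimonian_laird(coefs, ses)
print(f"pooled = {pooled:.3f}, SE = {pooled_se:.3f}, tau^2 = {tau2:.4f}")
```

When the between-study variance is estimated to be positive, the pooled standard error exceeds the fixed-effect one, which is the sense in which a random effect absorbs some error that measured covariates cannot capture. It does not address bias shared by all studies.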

These
caveats
do
not
detract
from
the
basically
high
quality
of
Ryan's
report.
I
largely
concur
in
her
overall
conclusion
that,
"
prenatal
exposure
to
mehg
results
in
a
significant
decrease
in
full
scale
IQ
with
a
central
estimate
in
the
approximate
range
of
-0.1
to
-0.25
IQ
points
for
every
1
ug/
increase
in
maternal
mehg
".
My
remaining
uncertainties
about
Ryan's
conclusion
have
to
do
mainly
with
the
possibility
of
post­
natal
exposures
and
other
confounders,
the
narrowness
of
the
confidence
bounds
around
the
estimated
coefficients,
and
the
assumption
that
a
linear
scale
is
appropriate.

Overall,
there
is
more
uncertainty
about
the
findings
than
is
evident
from
the
analyses
and
text
presented,
and
many
of
the
statistically
significant
results
have
confidence
bounds
that
almost
include
the
point
of
no
effect.
My
concern
is
increased
by
the
"
marked
degree
of
both
study­
to­
study
and
endpoint­
to­
endpoint
variability"
(
Ryan,
page
24)
seen
in,
for
example,
Figure
10.

General
questions
IEC
has
asked
that
my
review
include
specific
responses
to
a
list
of
questions,
classified
as
general
or
specific.
My
answers
here
about
the
general
questions
refer
only
to
the
Ryan
analysis.
I
have
a
few
brief
comments
on
the
other
analyses
later
in
this
review.

Please
comment
on
the
robustness
of
the
methods,
models,
and
data
presented
and
used
in
this
research.

The
methods,
models,
and
data
are
all
somewhat
weak
for
the
purposes
at
hand,
but
I
cannot
think
of
any
material
improvements.
Because
of
these
problems,
I
would
have
stopped
with
simple
analyses
of
the
results
from
each
test
in
each
location,
and
synthesized
the
results
in
a
more
subjective
way.
Ryan
recognizes
these
problems,
and
has
pushed
the
statistical
analysis
further
than
I
would,
though
she
may
be
more
in
the
mainstream
of
statistics
on
this
than
I
am.
What
she
has
done
is
fully
defensible.

Does
the
analysis
incorporate
all
relevant
studies?

I
am
not
an
expert
in
this
field,
but
the
studies
I
know
about
have
been
at
substantially
higher
levels
of
mercury
exposure.
I
know
of
no
relevant
studies
that
have
been
missed.

Given
the
scope
and
intended
purpose
of
the
methodology,
are
the
analytical
framework,
assumptions,
and
application
of
data
appropriate?
Are
the
scientific
uncertainties
clearly
identified
and
characterized
throughout
the
analysis?

Ryan's
report
passes
all
these
standards
without
difficulty,
with
one
exception.
There
is
insufficient
attention
to
the
probable
effects
of
non­
random
errors
(
bias
in
the
data).
These
necessarily
increase
levels
of
uncertainty
about
the
findings,
and
the
increases
may
be
quite
large.
In
particular,
given
the
interest
of
EPA
and
others
in
the
shape
of
the
dose­
response
relationship,
the
use
of
linear
models
seems
highly
restrictive.
This
is
probably
not
a
problem
with
Ryan's
analysis,
but
rather
a
reflection
of
the
serious
limitations
of
the
data,
but
the
net
result
is
that
linearity
cannot
be
examined,
and
as
noted
below,
lack
of
linearity
would
not
be
at
all
surprising.

RESPONSE:
The
discussion
section
of
the
Ryan
report
has
been
expanded
to
include
discussion
of
how
the
Bayesian
hierarchical
model
can
be
helpful
in
terms
of
accommodating
systematic
variability,
perhaps
due
to
unmeasured
factors,
for
example,
through
inclusion
of
a
study­
to­
study
random
effect.
Without
access
to
additional,
more
detailed
data,
we
cannot
assess
the
impact
of
bias
due
to
things
such
as
model
misspecification.
We
agree
that
the
available
data
would
not
support
further
evaluation
of
the
shape
of
the
dose­
response
relationship.
Our
use
of
a
linear
model
is
based
on
the
following
considerations:
1)
The
NRC's
2000
report
on
methylmercury
used
linear
model
results
for
deriving
BMDs,
and
cautioned
against
use
of
supralinear
models;
2)
the
Faroes
research
team
reported
that
K­
power
models
(
with
the
constraint
of
K ≥ 1,
i.
e.
with
supralinearity
excluded)
fit
best
with
the
linear
specification,
i.
e.
K=
1;
3)
no
non­
linear
model
results
are
available
from
the
three
studies
(
except
for
Faroes
log
model),
and
raw
data
are
not
available
to
us
for
conducting
analysis
of
dose­
response
shape
or
other
issues;
4)
the
lower
end
of
exposures
in
the
Faroes
study
overlaps
substantially
with
U.
S.
exposure
range,
indicating
that
there
is
minimal
extrapolation
involved
in
applying
the
observed
data
to
the
range
of
exposures
in
the
U.
S.;
and
5)
there
is
no
evidence,
given
the
information
at
hand,
that
would
support
use
of
an
alternative
model.
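
For readers unfamiliar with the K-power family mentioned in consideration 2, one common way to write it is sketched below. The notation is an assumption made for illustration (Y for the outcome, d for dose, and alpha and beta for the intercept and slope) and is not taken from the reports under review.

```latex
\[
  \mathrm{E}[Y \mid d] = \alpha + \beta\, d^{K}, \qquad K \ge 1 .
\]
% K = 1 gives the linear model used in the integrated analysis.
% K < 1 would be supralinear (steeper at low doses) and is excluded by the constraint.
% K > 1 is sublinear (shallower at low doses).
```

Under this form, a best fit at the boundary K = 1 means that, among the non-supralinear curves considered, none fit better than the straight line.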

Are
the
methods
applied
in
this
study
appropriate
for
quantifying
IQ
decrements
in
the
range
of
current
U.
S.
mercury
exposures?
If
not,
what
methods
would
you
recommend?
I
believe
that
these
methods
are
acceptable.

What
are
the
overall
major
strengths
and
weaknesses
of
this
analysis?

The
major
weaknesses
are
in
the
data
available
for
analysis:
the
samples
are
small
for
this
sort
of
thing,
and
there
is
no
assurance
that
either
the
methods
of
measuring
function
or
the
methods
of
estimating
mercury
exposure
are
sufficiently
similar
to
treat
them
jointly.
One
unfortunate
result
is
that
EPA's
primary
interest
in
the
shape
of
the
dose­
response
relationship
cannot
be
evaluated
from
the
human
data;
there
is
simply
not
enough
statistical
power
for
a
head­
to­
head
comparison
of
the
linear
model
used
with
reasonable
alternatives.

RESPONSE:
We
agree
that
the
data
are
limited
and
ideally
one
would
have
detailed
individual
level
data
available
from
multiple
different
studies.
We
do
not
believe
there
is
reason
to
expect
that
methods
of
measuring
function
or
methods
of
estimating
exposure
in
the
three
studies
differ
in
any
meaningful
way.
It
is
reasonable
to
assume
that
the
neurological
tests
in
each
study
were
administered
correctly
and
carefully,
following
the
instructions
in
the
test
manuals.
In
addition,
our
analysis
is
focused
on
an
outcome
(
IQ)
that
is
common
to
all
three
studies.
While
there
may
indeed
be
some
variation
in
the
measurement
techniques
used
to
assess
mercury
exposure
in
the
two
studies,
this
will
be
accommodated
in
our
model
as
part
of
study­
to­
study
variability.
Further,
the
biomarkers
for
mercury
exposure
used
in
these
studies
have
been
thoroughly
studied
and
evaluated
over
the
past
40
years,
and
are
considered
highly
reliable
and
well­
characterized.
Thus,
uncertainties
in
exposure
measurements
across
studies
can
be
presumed
to
be
small.
Uncertainties
in
exposure
data
for
these
studies
are
small
compared
to
epidemiologic
studies
of
other
pollutants
due
to
these
high­
quality
biomarkers.
See
above
response
for
discussion
of
dose­
response
shape.

The
major
strengths
are
in
the
careful,
detailed
analysis
and
presentation.
This
report
is
a
model
of
its
kind.

Are
all
of
the
essential
elements
included
in
the
final
report?

All
essential
elements
are
included.

Is
the
report
clear
and
well
written?
What
additional
documentation,
if
any,
do
you
feel
is
needed
to
ensure
transparency?

The
report
is
clear,
well
written,
and
transparent
for
anyone
with
sufficient
knowledge
of
statistical
methods.
No
additions
are
needed.

Specific
questions
Nine
specific
question
areas
are
listed
by
IEC.

This
analysis
focuses
on
IQ
as
the
neurodevelopmental
outcome
for
mercury
economic
benefits
analysis.
Are
there
other
neurodevelopmental
endpoints
for
mercury
that
could
be
quantified?
Are
there
advantages
and
disadvantages
of
using
IQ
to
represent
neurodevelopmental
effects
of
prenatal
mercury
exposure
in
addition
to
those
identified?

I
am
not
qualified
to
give
a
professional
opinion
on
this.

The
approach
to
quantifying
the
IQ
dose­
response
relationship
integrates
data
from
studies
conducted
in
the
Faroes,
Seychelles
and
New
Zealand.
Is
it
appropriate
to
combine
results
from
these
three
studies
for
this
analysis?
Do
differences
in
the
version
of
the
IQ
test
(
Wechsler
Intelligence
Scales
for
Children,
or
WISC)
administered
in
the
three
studies
raise
any
issues
that
are
of
concern
to
you?

I
would
not
have
combined
the
data
from
the
three
studies,
for
the
reasons
noted
above.
I
am
not
qualified
to
give
a
professional
opinion
about
the
different
versions
of
the
IQ
tests.

The
authors
of
this
analysis
had
to
select
dose­
response
coefficients
from
the
three
studies
for
use
in
their
statistical
modeling.
Please
comment
on
the
choice
of
dose-response
coefficients
from
the
three
studies
for
use
in
this
analysis.
Are
there
alternate
coefficients
or
other
data
from
these
three
studies
that
should
be
used
in
place
of,
or
in
addition
to,
those
used
in
this
analysis?

I
do
not
know
of
any
defensible
and
feasible
alternatives
to
the
choices
Ryan
made.

This
analysis
generally
relies
on
coefficients
from
linear
dose­
response
models.
Given
the
available
information,
is
this
an
appropriate
approach,
and
applicable
to
the
full
range
of
exposures
experienced
by
the
U.
S.
population,
including
exposures
below
those
in
the
range
of
empirical
observation?

See
my
notes
above
about
the
linear
model.
Other
models
could
have
been
used,
and
some
would
surely
fit
as
well
as
the
linear
model.
The
data
here
do
not,
and
could
not,
rule
out
a
threshold,
or
near­
threshold;
neither
do
they
rule
out
a
substantial
degree
of
supralinearity
in
(
say)
a
small
but
highly
sensitive
subpopulation.

Full
scale
IQ
was
not
measured
in
the
Faroe
Islands
study;
however,
three
IQ
subtests
were
conducted.
At
the
request
of
EPA,
the
Faroe
Islands
research
team
conducted
a
statistical
analysis
to
estimate
a
dose­
response
relationship.
Is
the
rationale
for
extrapolating
full
scale
IQ
from
the
three
subtests
clearly
explained
and
justified?
Is
the
approach
to
estimating
a
full­
scale
IQ
dose­
response
relationship
for
the
Faroe
Islands
appropriate?

If
one
must
estimate
a
full­
scale
IQ,
this
is
as
good
as
any
other
method
I
can
think
of.
However,
I
question
the
wisdom
of
trying
to
combine
measures
of
three
rather
different
things.
There
were
reasons
for
developing
the
three
separate
scales
in
the
first
place,
and
I
believe
that
those
reasons
apply
here.
I
can
think
of
no
reason,
a
priori,
for
thinking
that
the
constructs
behind
each
of
the
sub­
scales
should
all
respond
in
the
same
way
to
mercury.
I
would
not
have
combined
these
measures,
nor
would
I
have
combined
the
other
measures
that
are
discussed
here.

RESPONSE:
The
basic
nature
of
an
IQ
score
is
that
it
is
an
aggregate
index
of
several
different
subscales.
The
subscales
do
not
each
assess
a
single
independent
neurological
function,
but
represent
different
aspects
of
many
overlapping
functions,
and
are
therefore
designed
to
be
complementary
of
one
another.
In
this
estimation
of
full-scale
IQ
for
the
Faroes,
three
subscales
are
being
combined,
many
fewer
than
the
ten
that
are
combined
when
a
complete
WISC
IQ
test
is
administered.
The
Bellinger
report
discusses
the
high
validity
of
IQ
predictions
based
on
the
subscales
administered
in
the
Faroes.
IQ
is
a
well­
established
metric
that
has
been
thoroughly
evaluated
and,
as
discussed
in
the
Bellinger
text,
has
a
demonstrated
relationship
to
long­
term
outcomes.
Further,
there
are
many
studies
establishing
that
IQ
is
sensitive
to
many
chemical
and
biological
insults.
We
agree
with
the
commenter
that
mercury
might
affect
the
underlying
constructs
differently.
We
have
expanded
the
text
noting
that
IQ
is
not
expected
to
capture
all
neurodevelopmental
effects
of
mercury.

Integrated
dose­
response
coefficients
are
estimated
first
using
only
the
IQ
dose­
response
coefficient
from
the
three
studies,
then
also
incorporating
data
for
other
neurodevelopmental
endpoints
(
e.
g.,
Boston
Naming
Test)
in
a
Bayesian
hierarchical
model.
Is
the
rationale
for
each
approach
clearly
explained?

I
am
not
qualified
to
give
a
professional
opinion
on
this.

Only
certain
non­
IQ
neurodevelopmental
endpoints
reported
in
the
Faroes,
New
Zealand
and
Seychelles
studies
could
be
included
in
the
Bayesian
model.
Is
the
rationale
for
selection
of
endpoints
for
inclusion/
exclusion
in
the
Bayesian
model
clear
and
reasonable?
Do
you
agree
with
the
rationale?

I
do
not
agree
with
the
selection
of
just
certain
endpoints
in
this
context.
However,
the
presentation
of
what
was
done
is
quite
clear.

RESPONSE:
The
original
stated
reasons
for
focusing
on
a
narrower
set
of
endpoints
in
the
main
analysis,
which
were
based
on
the
nature
of
the
data
available,
still
apply.
We
have
added
a
sensitivity
analysis
to
examine
inclusion
of
a
broader
set
of
endpoints
in
the
model.

In
order
to
combine
data
from
different
studies
and
different
neurodevelopmental
endpoints,
it
was
necessary
to
rescale
the
reported
dose­
response
coefficients.
Data
from
the
Faroes
are
converted
from
terms
of
cord
blood
mercury
to
hair
mercury.
All
endpoints
other
than
IQ
are
converted
to
the
IQ
scale.
Is
the
rescaling
of
the
coefficients
clearly
explained
and
appropriately
executed?

It
is
clearly
explained
and
appears
to
be
appropriately
executed.
However,
each
such
rescaling
inevitably
introduces
some
additional
uncertainty
about
the
final
results,
sometimes
a
lot
of
uncertainty,
and
that
matter
should
have
been
discussed
at
some
length.

RESPONSE:
The
rescaling
used
in
this
analysis
is
a
composite
of
a
few
different
mathematical
conversions.
A
substantial
portion
of
the
conversion
is
a
simple
mathematical
conversion
of
a
number
expressed
on
one
scale
to
an
equivalent
value
on
a
different
scale;
the
Ryan
text
presents
this
in
the
form
of
multiplying
the
regression
equation
by
the
constant
15/σ.
This
adjustment
is
similar
to
converting
a
temperature
from
the
Fahrenheit
scale
to
the
Celsius
scale:
it
does
not
add
to
uncertainty;
it
simply
expresses
the
same
values
in
different
units.
A
second
aspect
of
rescaling
is
use
of
the
`
observed'
standard
deviation
for
each
cohort,
rather
than
the
`
expected'
or
population
standard
deviation,
which
does
introduce
some
uncertainty.
We
have
added
discussion
of
this
point
to
the
Ryan
report.
A
final
aspect
of
rescaling,
which
applies
only
to
the
Faroes,
is
converting
from
cord
blood
units
to
hair
units.
This
rescaling
is
discussed
in
the
text
in
some
detail.
While
this
aspect
does
add
some
uncertainty,
we
believe
this
is
relatively
small
since
the
value
used
in
the
rescaling
is
specific
to
the
population
to
which
it
is
applied
(
i.
e.
the
maternal
hair:
cord
blood
mercury
ratio
for
the
Faroes
cohort).
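As a rough illustration of the rescaling steps described in this response, the sketch below converts a slope from test-score units per cord-blood unit to IQ points per hair unit. All numeric inputs (the raw coefficient, its standard error, the observed standard deviation, and the hair-per-cord-blood ratio) are made-up placeholders and are not values from the Ryan report; the function names are likewise hypothetical.

```python
IQ_SD = 15.0  # standard deviation of the IQ scale by construction


def to_iq_scale(coef, std_error, observed_sd):
    """Rescale a slope and its standard error from test units to IQ points.

    Both quantities are multiplied by the same constant, so their ratio
    (the test statistic) is unchanged by this step.
    """
    factor = IQ_SD / observed_sd
    return coef * factor, std_error * factor


def cord_to_hair(coef, std_error, hair_per_cord):
    """Re-express a slope per cord-blood unit as a slope per hair unit.

    hair_per_cord is the hair mercury level corresponding to one unit of
    cord-blood mercury (a hypothetical input in this sketch).
    """
    return coef / hair_per_cord, std_error / hair_per_cord


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
raw_coef, raw_se = -0.5, 0.2  # test points per cord-blood unit
iq_coef, iq_se = to_iq_scale(raw_coef, raw_se, observed_sd=10.0)
hair_coef, hair_se = cord_to_hair(iq_coef, iq_se, hair_per_cord=0.25)
print(iq_coef, hair_coef)
```

The sketch treats the observed standard deviation and the hair-to-cord-blood ratio as known constants, which is why the ratio of coefficient to standard error is unchanged throughout. The additional uncertainty attributed to those two inputs in the response arises because they are themselves estimated, and that is not modeled here.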

Please
comment
on
the
implementation
of
the
Bayesian
model
and
interpretation
of
the
results
of
this
model.
Is
this
portion
of
the
study
clearly
explained,
and
appropriately
executed
and
interpreted?
Has
all
relevant
information
been
considered
in
the
model?

I
believe
that
the
Bayesian
model
is
adequately
explained
and
that
it
was
appropriately
executed
and
interpreted.

Further
Comments:

Methylmercury
in
relatively
high
doses
is
beyond
question
a
serious
toxicant
for
the
developing
human
neurological
system,
as
shown
in
Japan
by
the
Minamata
studies
and
elsewhere.
I
take
the
analysis
here
to
be
focused
on
possible
effects
at
lower,
and
much
more
common,
doses.
(
Throughout
this
report,
I
use
"
dose"
very
loosely,
to
mean
any
measure
of
the
amount
of
methylmercury
in
the
close
environment
of
the
fetus
or
infant).
Thus
major
attention
should
be
directed
at
the
shape
of
the
dose­
response
relationship
in
the
lower
dose
range.
This
involves
two
steps:
first,
determining
whether
there
is
evidence
of
any
effect
at
all
on
some
outcome;
then,
if
an
effect
is
demonstrated,
estimating
its
size
at
various
dose
levels.
Three
reports
in
the
scientific
literature
have
been
selected
for
detailed
study
because
of
their
"
careful
epidemiologic
designs
that
measured
a
variety
of
important
potential
confounders
such
as
maternal
age
and
education"
(
Ryan,
page
26).
There
are
other
important
strengths:
The
samples
in
each
are
adequate
for
reasonably
detailed
study,
and
some
of
the
original
data
could
be
used
in
lieu
of
the
published
summary
statistics.
Ryan
also
notes
some
weaknesses,
including
some
of
those
noted
below.

Ryan's
title
and
much
of
her
analysis
are
focused
on
IQ,
with
other
neurological
outcomes
(
such
as
other
aspects
of
cognition/
attainment,
motor
deficits,
and
attention/
behavior)
in
a
supporting
role.
Reasons
for
this
emphasis
on
IQ
are
not
given
despite
the
general
doubts
and
uncertainties
regarding
IQ
measurement
that
have
been
prominent
in
recent
years.

RESPONSE:
The
primary
objective
of
this
work
is
to
support
benefit­
cost
analysis
of
mercury
exposure
reductions.
The
focus
on
IQ
is
due
to
the
fact
that
there
are
established
methods
for
monetizing
IQ
decrements
in
benefit­
cost
analysis,
and
such
methods
do
not
exist
for
the
other
neurological
endpoints
studied
for
mercury.
The
report
by
David
Bellinger
explains
the
emphasis
on
IQ
and
the
associated
issues.
We
agree
that
other
outcomes
are
important.
Text
has
been
added
to
the
Ryan
and
Bellinger
reports
discussing
other
outputs
from
this
modeling
that
may
be
informative
for
quantifying
the
neurodevelopmental
effects
of
mercury
(
e.
g.
a
coefficient
for
the
Cognitive/
Achievement
domain),
even
though
they
cannot
be
used
in
benefit­
cost
analysis
at
the
current
time.
Providing
these
other
outputs
may
set
the
stage
for
related
economic
valuation
research
to
support
future
benefit­
cost
analyses.

Ryan
discusses
five
steps
she
used
to
select
the
data
for
analysis
from
a
larger
set.
While
other
investigators
(
including
myself)
might
have
made
other
choices,
each
of
her
decisions
seems
quite
defensible.

One
point
where
I
would
depart
from
Ryan's
analysis
is
the
decision
to
combine
the
measurements
of
two
examiners
in
one
of
the
studies.
Results
from
the
examiners
were
quite
clearly
incompatible,
so
the
combination
is
of
uncertain
meaning.
A
better
approach
would
have
been
to
present
the
data
from
just
the
one
examiner
considered
most
reliable
(
despite
the
loss
of
sample
size),
or
to
omit
that
item
entirely,
or
to
present
the
data
from
each
examiner
in
parallel
analyses
without
trying
to
combine
them.

RESPONSE:
We
believe
that
our
approach
of
using
all
of
the
available
data,
rather
than
using
data
for
just
one
examiner
or
the
other,
is
the
most
appropriate
approach.
We
do
not
have
the
information
that
would
be
needed
to
conclude
that
data
from
one
examiner
or
the
other
are
more
reliable.
We
do
agree
that
the
difference
in
results
between
the
two
examiners
may
be
important
and
have
added
a
sensitivity
analysis
that
looks
at
data
for
the
two
examiners
separately,
as
well
as
excluding
data
for
Similarities
entirely.
Despite
the
central
importance
of
the
shape
of
the
dose­
response
relationship
to
public
policy
that
may
be
based
on
this
analysis,
there
is
no
real
discussion
here
of
that
shape,
and
no
attempt
to
fit
and
compare
alternative
shapes.
To
compare
different
shapes
would
be
difficult
and
highly
uncertain
because
of
the
smallness
of
the
available
samples
for
this
purpose,
but
the
absence
of
discussion
of
this
central
issue
is
troubling.

RESPONSE:
We
have
expanded
our
discussion
on
this
point.
The
main
problem
is
that
we
cannot
undertake
any
analysis
based
on
alternative
dose
response
models
without
access
to
the
raw
data
or
going
to
the
original
investigators
and
asking
them
to
run
different
models.
See
additional
discussion
of
dose­
response
shape
above.

I
agree
with
Ryan's
decision
to
use
a
Bayesian
analysis.

I
am
distinctly
uncomfortable
with
the
structural
equation
analyses,
which
use
combined
exposure
metrics
to
estimate
a
hypothesized,
more
fundamental
"
latent
variable".
Neither
Ryan
nor
Budtz­
Jorgensen
discusses
at
length
the
numerous
assumptions
underlying
structural
equation
analysis
(
though
Budtz­
Jorgensen
lists
many
of
them),
nor
do
they
present
reasons
to
believe
that
structural
equations
are
an
appropriate
tool
here.
This
is
an
important
matter
because
of
two
things:

It
is
not
clear
that
mercury
was
the
only
toxicant
affecting
these
children,
so
that
a
latent
variable
would
be
an
abstraction
of
unknown
meaning,
and
It
is
not
clear
that
all
or
even
most
of
the
mercury
exposure
was
from
prenatal
exposures.
Surely
some
of
these
children
were
breast­
fed
by
mercury­
bearing
mothers,
and
surely
some
of
the
children
were
eating
fish
or
other
potentially
contaminated
foods
well
before
the
age
at
which
the
neurological
testing
was
done.

RESPONSE:
These
are
valid
concerns.
However,
they
are
not
specific
to
the
SEM
analysis
and
would
be
present
even
if
full
scale
IQ
had
been
measured.
Regarding
other
toxicant
exposures,
the
question
of
confounding
by
PCBs
in
the
Faroes
study
has
been
examined
by
the
study
investigators,
who
found
an
effect
for
mercury
independent
of
PCBs
(
see
Budtz­
Jorgensen
et
al.
2002).
This
was
also
addressed
by
the
NRC
panel,
which
concluded
that
data
on
the
effect
of
mercury
could
be
used
assuming
no
confounding.
In
addition,
blood
lead
levels
were
measured
in
Seychelles
(
at
least
initially)
and
were
low.
Regarding
potential
impacts
on
neurological
outcomes
from
postnatal/
early
childhood
exposures,
the
Faroes
group
found
little
association
between
the
children's
body
burden
at
age
7
and
performance
on
neurological
tests
at
the
same
age
(
see
Grandjean
et
al.
1999).

Budtz­
Jorgensen
E,
Keiding
N,
Grandjean
P,
Weihe
P.
Estimation
of
health
effects
of
prenatal
methylmercury
exposure
using
structural
equation
models.
Environ
Health.
2002
Oct
14;
1(
1):
2.

Grandjean
P,
Budtz­
Jorgensen
E,
White
RF,
Jorgensen
PJ,
Weihe
P,
Debes
F,
Keiding
N.
Methylmercury
exposure
biomarkers
as
indicators
of
neurotoxicity
in
children
aged
7
years.
Am
J
Epidemiol.
1999
Aug
1;
150(
3):
301­
5.
I
am
a
bit
concerned
that
some
of
the
analyses,
in
Budtz­
Jorgensen
as
well
as
Ryan,
were
driven
more
by
mathematical
and
statistical
convenience
than
by
what
the
problem
needs.
Examples:
Reducing
the
K­
power
model
to
K
=
1
because
that
fit
better
than
any
larger
K,
though
other
evidence
suggests
that
K
might
be
less
than
one.
Using
log­
transformed
data
because
the
data
become
more
like
the
standard
normal
distribution,
though
the
transformation
distorts
the
critical
dose­
response
relationship.
Omitting
one
child
with
very
high
estimated
exposure,
though
there
was
no
evidence
provided
to
suggest
that
that
particular
measurement
was
in
error
or
that
a
biologic
abnormality
might
account
for
the
measurement,
rather
than
a
failure
of
the
model
adopted.
Assuming
that
everything
is
linear
on
the
scales
adopted,
without
providing
physical
or
biological
evidence
of
that,
or
considering
the
high
frequency
of
non­
linear
(
including
numerous
supra­
linear)
dose
response
relationships.

RESPONSE:
Regarding
possible
values
of
K<
1,
this
analysis
followed
the
conclusions
of
the
NAS
methylmercury
panel
which
recommended
excluding
K
values
less
than
1.
In
addition,
no
data
are
available
from
the
New
Zealand
or
Seychelles
studies
for
models
with
K
values
less
than
1,
thus
such
models
could
not
be
considered
in
this
analysis.
Regarding
log­
transformed
data,
although
the
Faroe
Islands
investigators
have
found
that
such
data
provide
a
better
fit
than
untransformed
data,
the
Ryan
analysis
does
not
use
the
log­
transformed
data.
Regarding
the
omission
of
the
child
with
very
high
estimated
exposure,
this
analysis
followed
the
recommendation
of
the
NAS
panel
on
this
issue,
but
also
provides
a
sensitivity
analysis
with
inclusion
of
data
for
that
child.
The
neurological
effects
of
mercury
are
not
sufficiently
well
understood
at
a
mechanistic
level
to
provide
a
basis
for
assumptions
regarding
the
shape
of
the
dose­
response
function
or
other
aspects
of
the
analysis.
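To make the role of K concrete, the sketch below assumes the K-power model has the simple form response = slope * dose**K (an assumption for illustration; the exact parameterization used by the study investigators is not restated in this document). It compares how much of the effect seen at a higher dose is already present at a lower dose for K below, equal to, and above 1. The dose values are arbitrary.

```python
def k_power_response(dose, slope, k):
    """Response under the assumed form slope * dose**k."""
    return slope * dose ** k


def low_dose_share(k, low=1.0, high=10.0):
    """Fraction of the effect at `high` that is already reached at `low`."""
    return k_power_response(low, 1.0, k) / k_power_response(high, 1.0, k)


# K < 1 is supra-linear (steeper at low doses), K = 1 is linear,
# K > 1 is sub-linear (flatter at low doses).
for k in (0.5, 1.0, 2.0):
    print(k, round(low_dose_share(k), 3))
```

Under this assumed form, restricting K to values of 1 or more rules out supra-linear shapes, which is the point at issue between the commenter and the response.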

The
data
are
highly
heterogeneous,
coming
from
different
parts
of
the
world
with
different
populations
(
including
two
that
are
highly
inbred,
I
understand,
which
could
introduce
internal
correlations),
involve
different
neurobehavioral
domains
and
different
tests
(
that
were
in
fact
designed
to
measure
different
things)
administered
by
different
investigators
with
different
training
(
and
with
evidence
of
substantial
heterogeneity
in
the
one
place
where
a
comparison
is
possible),
with
different
details
of
exposure
and
slightly
different
ages
and
test
conditions.
To
assume
that
all
of
these
things
should
share
important
statistical
properties
seems
to
me
to
be
inappropriate.

RESPONSE:
See
responses
above
regarding
differences
in
the
neurological
tests,
examiners
and
measurement
methods.
It
is
true
that
the
studies
were
conducted
in
different
parts
of
the
world
with
different
populations;
we
believe
this
is
a
strength
of
the
analysis
and
the
underlying
data.
The
important
points
are
that
all
three
studies
administered
the
WISC
(
albeit
in
different
variations);
that
this
is
a
well­
established
and
well­
characterized
test;
and
there
is
no
reason
to
assume
that
it
was
not
administered
appropriately
in
each
study.
The
alternatives
to
the
approach
we
have
taken
would
be
to
use
estimates
from
each
of
the
studies
independently,
or
to
combine
estimates
from
the
studies
using
a
more
traditional
meta­
analysis
approach,
which
would
not
take
into
account
the
variability
from
the
studies
as
extensively
as
the
Bayesian
model.
We
believe
that
our
approach,
which
integrates
the
studies
and
accounts
for
variability
in
multiple
forms,
is
the
most
suitable
and
defensible
way
for
incorporating
all
available
information
into
the
analysis.
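For readers unfamiliar with the "more traditional meta-analysis approach" mentioned in this response, a minimal sketch of inverse-variance pooling follows. It is not the Bayesian hierarchical model used in the Ryan report. The three slopes, their standard errors, and the between-study variance `tau2` are hypothetical numbers chosen for illustration only.

```python
import math


def pool(estimates, std_errors, tau2=0.0):
    """Inverse-variance weighted average of study-level slopes.

    tau2 is an assumed between-study variance: tau2 = 0 gives the
    fixed-effect estimate, tau2 > 0 a random-effects style estimate.
    """
    weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


# Hypothetical slopes (IQ points per unit hair mercury) and standard errors.
slopes = [-0.10, -0.50, -0.20]
ses = [0.10, 0.30, 0.10]
fixed, fixed_se = pool(slopes, ses)
random_fx, random_se = pool(slopes, ses, tau2=0.02)
print(fixed, fixed_se, random_fx, random_se)
```

Allowing for between-study variance widens the pooled standard error. A hierarchical model goes further by treating that variance as uncertain rather than fixed at a single plugged-in value.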

I
do
not
understand
why
differences
in
the
population
distribution
of
IQ
should
be
rescaled
away.
I
wonder
if
they
might
not
reflect
a
situation
that
should
be
preserved
in
the
analysis,
though
there
may
have
been
good
but
unstated
reasons
for
the
rescaling.
The
fact
that
rescaling
makes
them
a
bit
more
similar
seems
to
me
to
be
irrelevant.

RESPONSE:
The
rescaling
is
necessary
so
as
to
get
the
estimated
coefficients
for
different
tests
on
a
comparable
scale.
See
discussion
above.

Ryan
cites
some
sensitivity
analyses
that
seem
to
be
right
on
target.

The
similarities
in
results
in
the
tests
taken
one
test
and
one
location
at
a
time
are
encouraging
in
some
ways,
but
quite
puzzling
in
that
one
might
expect
MeHg
to
act
differently
on
different
parts
of
the
nervous
system.
Thus
I
would
take
this,
overall,
to
be
a
sign
of
possible
weakness
in
the
data
unless
there
is
other
evidence
(
including
laboratory
experiments)
showing
that
various
functional
capabilities
do
in
fact
move
in
parallel
under
the
influence
of
MeHg.
In
my
experience,
this
is
not
common
in
toxicology.
Despite
my
concerns
about
data
problems,
I
would
have
stopped
a
good
bit
sooner
than
these
investigators
in
trying
to
find
commonalities.

RESPONSE:
The
neuropsychological
tests
administered
in
the
three
studies
do
not
each
assess
a
single
circumscribed
neural
structure.
A
child
has
to
do
quite
a
few
things
reasonably
well
to
perform
well
on
each
test.
So
there
is
always
at
least
a
modest
correlation
between
scores
on
different
tests,
and
if
mercury
affects
performance
on
one
test,
it
is
likely
to
have
at
least
a
bit
of
an
effect
on
another
test.
By
aggregating
a
group
of
such
tests,
we
might
actually
end
up
with
a
more
sensitive
index
of
mercury's
effect
than
if
we
looked
at
each
test
separately
(
i.
e.,
a
little
effect
on
each
constituent
test
might
add
up
to
a
larger
effect
on
an
aggregated
index).
Our
main
analysis
uses
only
tests
in
the
cognitive
domain
for
precisely
the
reasons
outlined
in
this
comment.
All
the
endpoints
in
our
primary
analysis
have
been
identified
as
belonging
to
the
cognition/
achievement
domain
so
that
effects
are
expected
to
be
fairly
similar
for
all
of
them.
Our
sensitivity
analysis
using
all
endpoints
from
additional
domains
shows
more
variability,
consistent
with
the
expectations
of
this
comment,
and
we
agree
that
this
would
be
a
less
reliable
and
appropriate
basis
for
estimating
the
IQ
effect.
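The argument that an aggregated index can be more sensitive than its constituent tests can be checked with a short calculation. The sketch below assumes, purely for illustration, that each test has unit variance, that every pair of tests has the same correlation, and that exposure shifts every test by the same small amount. None of these inputs comes from the three studies.

```python
import math


def composite_standardized_effect(per_test_effect, n_tests, pairwise_corr):
    """Standardized effect on the mean of n equally correlated tests.

    The mean of n unit-variance tests with common correlation r has
    variance (1 + (n - 1) * r) / n, so the same shift is larger relative
    to the composite's standard deviation whenever r < 1.
    """
    var_of_mean = (1.0 + (n_tests - 1) * pairwise_corr) / n_tests
    return per_test_effect / math.sqrt(var_of_mean)


# A shift of 0.1 SD on each of three tests correlated at 0.5.
print(composite_standardized_effect(0.1, 3, 0.5))
```

The gain disappears when the tests are perfectly correlated and grows as the correlation falls, provided the shift really is shared across tests. That proviso is the assumption the commenter questions.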

Ryan
notes
(
page
21)
that
there
was
not
enough
information
in
the
relatively
sparse
data
available
to
obtain
separate
estimates
of
the
study­
to­
study
and
endpoint­
to­
endpoint
variances.
This
could
happen
if
the
two
variance
components
were
highly
correlated,
and
I
would
encourage
her
to
check
on
this.
If
they
are
in
fact
highly
correlated,
further
study
to
determine
the
reasons
would
be
in
order.

RESPONSE:
The
two
variance
components
are
computed
using
very
different
aspects
of
the
data,
so
it
is
hard
to
see
that
they
would
be
correlated.
In
any
case,
as
discussed
in
the
report,
we
found
that
there
was
relatively
little
information
available
to
reliably
estimate
the
two
variance
components.
We
instead
performed
our
analysis
for
a
variety
of
different
values
for
the
ratio
between
the
two.

The
comment
at
the
bottom
of
Ryan,
page
25,
about
symmetry
in
MLE-based
confidence
intervals
puzzles
me,
because
there
are
well­
recognized
means
for
producing
asymmetric
MLE
bounds
by
just
altering
the
probabilities
in
the
two
tails.

RESPONSE:
We
agree
that
there
are
means
for
producing
asymmetric
confidence
intervals
for
MLEs,
and
have
clarified
the
text
to
emphasize
that
"
standard"
MLE­
based
confidence
intervals
are
symmetric.
The
purpose
of
this
text
was
to
explain
to
readers
with
less
statistical
expertise
why
the
Bayesian
confidence
intervals
are
not
symmetric.
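The contrast between a "standard" symmetric interval and an asymmetric Bayesian interval can be sketched as follows. The Wald interval is estimate plus or minus a multiple of the standard error, so it is symmetric by construction. An equal-tailed interval read off the quantiles of a skewed set of draws is not. The gamma-distributed draws below are simulated stand-ins for posterior samples and have nothing to do with the actual posterior in the Ryan report.

```python
import random
import statistics


def wald_interval(estimate, std_error, z=1.96):
    """Symmetric interval: estimate +/- z * standard error."""
    return estimate - z * std_error, estimate + z * std_error


def equal_tailed_interval(draws, level=0.95):
    """Interval between the lower and upper tail quantiles of the draws."""
    ordered = sorted(draws)
    lo_idx = int((1.0 - level) / 2.0 * len(ordered))
    hi_idx = int((1.0 + level) / 2.0 * len(ordered)) - 1
    return ordered[lo_idx], ordered[hi_idx]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
draws = [rng.gammavariate(2.0, 1.0) for _ in range(20000)]  # right-skewed
center = statistics.mean(draws)
spread = statistics.stdev(draws)

wald_lo, wald_hi = wald_interval(center, spread)
tail_lo, tail_hi = equal_tailed_interval(draws)
print(center - wald_lo, wald_hi - center)  # equal distances
print(center - tail_lo, tail_hi - center)  # unequal distances
```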

Figure
4
is
troubling,
because
it
shows
in
a
very
clear
way
that
most
of
the
differences
among
subjects
within
any
one
location
are
from
factors
unrelated
to
MeHg
exposure.
The
remaining
signal
is
tiny,
and
could
be
accounted
for
by
biases
in
the
data,
including
biases
not
yet
imagined.
I
believe
from
the
evidence
here
that
one
would
find
similar
distributions
for
the
other
locations
and
other
tests.
It
would
not
take
much
to
make
these
data
points
random
around
the
horizontal
line.
I
recognize
that
the
point
of
this
whole
exercise
was,
in
a
sense,
to
try
to
identify
and
measure
a
weak
signal
in
the
presence
of
a
lot
of
noise,
and
I
think
that
the
evidence
overall
does
show
a
signal,
but
I
am
not
entirely
convinced.

RESPONSE:
We
agree
that
the
signal
is
small;
however,
as
noted
by
the
NRC,
the
epidemiology
studies
were
extremely
well
conducted,
collected
extensive
information
on
potential
confounders
and
included
reliable
individual
measurements
of
exposure.
The
NRC
concluded
that
"
The
weight
of
the
evidence
of
developmental
neurotoxic
effects
from
exposure
to
MeHg
is
strong"
(
p.
326).
In
addition,
there
are
other
factors
that
could
have
biased
the
relationship
toward
the
null,
such
as
measurement
error.
