RELIABILITY 101
Fundamental reliability begins to take shape when a work community shifts its
attitude toward maintenance from "fix it when it breaks" to treating it as an
ongoing and important function. It starts with two simple questions. In your
work area:
- Who is responsible for reliability?
- Who is responsible for production output?
The answer should be everyone. This book has advocated the use of
cross-functional teams and the development of active learning and multiskills
for everyone. To sustain high OEE numbers, employees in the different
manufacturing functions need a basic understanding of the concerns and
aspects of the other functions before they decide on priorities for
their own functions.
7.1 Fundamental Reliability
Fundamental reliability is most powerful if plant managers, production and operation
managers, maintenance managers, and production supervisors champion it. This
leadership team for the area must take stewardship of the overall system. It
must balance production and production capability over the long term, and it
should build on the idea that "If you are not promoting reliability, then
you are promoting failures."
Sharing a set of basic concepts and definitions will provide a platform for communication
so that everyone can contribute to reliability, availability, and
maintainability (RAM).
Because everyone is responsible for RAM, the expectations of that responsibility should
be understood. These are:
- Eliminate failures.
- When failure occurs, reduce the impact or consequence of the failure.
- When failure occurs, use both short-term and long-term considerations to
  optimize repair, and restore the systems.
The first of these is most important. If you could eliminate failures, then the
other two would not be necessary. Optimizing repair for both the short term and
the long term means that the fastest solution is not necessarily the best
solution. Take the appropriate time to collect data, confirm root cause, and
allow good workmanship so that the event is correctly understood. Prevent
failures from recurring. All too often, the fastest fix has to be repeated a
second time, or more, before the problem is corrected. A fast fix doesn’t
promote learning from mistakes.
Two of the most important steps in promoting RAM are collecting and analyzing data.
These steps provide good information for problem solving and setting strategy.
The data does not consist only of measurements and numbers, but also includes
samples, diagrams, drawings, pictures, observations, procedures, and practices.
Just as modern forensic study reconstructs a crime scene, many root cause
sources of equipment failures can be uncovered by scientifically studying the
failed part, then reconstructing the failure circumstances. This process can be
the most exciting aspect of reliability: carefully determining the root cause
of a chronic equipment failure, creatively injecting a simple solution, and monitoring
results that show significant improvement on area throughput. Nearly everyone
enjoys playing detective when they read mysteries. The same holds true for
equipment reliability. Everyone should participate in solving the mystery. Every
factor should be treated as a suspect until it has been ruled out as the root
source of the problem. To set the investigative stage, let’s establish some common
guidelines that we can use to identify these root source problems.
Begin by obtaining OEE data, categorizing the losses, and summarizing event
details, as outlined in Chapter 2. Rank the loss events using the value fulcrum
suggested in Chapter 5. As appropriate, factor in actual business parameters
(e.g., known costs) for various loss events. Use
cross-functional teams to work on the top two or three items. Do not allow
discretionary resources of people, time, or money to be directed at anything
other than these top items. This focus will provide the greatest impact and the
fastest rate of improvement for your area. Everyone wins when area
effectiveness jumps significantly.
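
The ranking step can be illustrated with a short sketch. It is not the value
fulcrum from Chapter 5; it simply assumes that each loss event records the
minutes lost and a known cost per minute, totals the cost by loss category, and
returns the top items. The LossEvent fields, the category names, and all the
figures are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LossEvent:
    category: str           # hypothetical loss category label
    minutes_lost: float     # downtime or lost production time for the event
    cost_per_minute: float  # assumed known business cost for this loss


def top_losses(events, n=3):
    """Return the n loss categories with the highest total cost, largest first."""
    totals = defaultdict(float)
    for event in events:
        totals[event.category] += event.minutes_lost * event.cost_per_minute
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Invented sample data for illustration only.
events = [
    LossEvent("filler jam", 120, 40.0),
    LossEvent("changeover", 300, 10.0),
    LossEvent("filler jam", 60, 40.0),
    LossEvent("label misfeed", 45, 25.0),
    LossEvent("minor stops", 200, 5.0),
]

for category, cost in top_losses(events):
    print(f"{category}: {cost:.0f}")
```

With these sample figures the filler jam ranks first even though the changeover
consumed more minutes, which is the reason for factoring in known costs rather
than ranking on time alone.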
When resources can’t be used directly on cross-functional OEE teams, focus them on
developing best practice methods and procedures. This step could start with
clarifying procedures and set points, collecting and categorizing all downtime
minutes, and reviewing maintenance strategies. It would also include applying
simple predictive maintenance techniques and statistical process control (SPC),
then using these tools for proactive, condition-based maintenance (CBM).
Assume that equipment reliability of a certain subassembly is designated one of the
top three items. Everyone in the community should be focused on solving this
item. The necessary resources should be extended with priority; other
discretionary activities should be subordinated to this investigation.
Following the Theory of Constraints (TOC) steps modeled for shutdowns in
Chapter 6, we have identified an important parameter for our attention. We will
exploit the study and root cause analysis of this limiter. Meanwhile, we will
subordinate other activities so that they do not distract our resources. Once
the root cause is determined, we will elevate the changes to eliminate the problem,
using designed experiments and monitoring the results to prove that our
analysis and action have addressed the problem. With this proof in hand, we
will go back and tackle the next most important OEE limiter.
One of the most successful work centers where I worked used this exact approach.
Over a three-year period, the work center showed dramatic improvements on an
established system: a 70 percent throughput improvement with 12 percent
fewer employees.
Once a specific subassembly is identified as a top priority, all pertinent data
about downtimes, frequencies, and event details is required to begin root cause
analysis. The work community's ability to collect and maintain a good database
on all downtimes is very important. Good practices can prevent countless hours
being spent guessing and verifying data after the fact. They also reduce the
risk of using bad data, which leads to incorrectly identified causes. A good
database will help you focus on the right information and allow your analysis
to zoom in on the root cause.
These are among the reasons why a good database is valuable to your company. Everyone
must understand this aspect, accept the responsibility, and maintain the
discipline to report and record all the necessary data for each interruption.
As the cost of personal computers spirals down and the need rises for vast
amounts of information to be collected, organized, Pareto charted, analyzed,
graphed, reported, and shared, every company should strongly consider using a
computerized maintenance management system (CMMS). Good databases sorted
quickly via computer systems are very useful for reliability studies.
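
As a small illustration of what a well-kept downtime record makes possible, the
sketch below builds the table behind a Pareto chart: total downtime minutes per
cause, sorted from largest to smallest, with a running cumulative percentage.
The record layout (cause, minutes) and the sample entries are assumptions made
for this example; an actual CMMS will have its own fields and reports.

```python
from collections import Counter


def pareto_table(records):
    """Return (cause, minutes, cumulative_percent) rows, largest cause first."""
    minutes_by_cause = Counter()
    for cause, minutes in records:
        minutes_by_cause[cause] += minutes

    total = sum(minutes_by_cause.values())
    rows = []
    running = 0
    for cause, minutes in minutes_by_cause.most_common():
        running += minutes
        rows.append((cause, minutes, round(100 * running / total, 1)))
    return rows


# Invented downtime log: (cause, minutes of downtime) per interruption.
downtime_log = [
    ("conveyor bearing", 90),
    ("sensor fault", 40),
    ("conveyor bearing", 150),
    ("operator wait", 50),
    ("sensor fault", 40),
    ("hydraulic leak", 30),
]

for cause, minutes, cumulative in pareto_table(downtime_log):
    print(f"{cause:18} {minutes:4d} min  {cumulative:5.1f}% cumulative")
```

In this invented log, one cause accounts for 60 percent of all downtime minutes,
which is the kind of pattern that points a team toward its top item.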
With data collection in mind, let’s review some key reliability definitions.