Troubleshooting ideas for application sluggishness after deploying to TongWeb 7 (by lqw)
weixin_39938069 2024-07-23 10:33:01
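Contents
1. Tune the JVM and open-file parameters
2. Rule out network latency (for reference only)
3. Check server resource usage
3.1 How to check
3.1.1 High CPU usage
Method 1: Analyze with the show-busy-java-threads.sh script
Method 2: Use jstack
3.1.2 High memory usage
3.1.3 Thread blocking
3.2 Data source causes
4. Application code level
5. Find and kill zombie processes (for reference only)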
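1. Tune the JVM and open-file parameters
The JVM parameters are set in the TongWeb console (the original screenshot showed the configuration page). Make sure the heap is at least 2048m there; the values in question are the -Xmx and -Xms parameters.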
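To double-check that the running JVM actually picked up the heap settings, one option is to read them back from the process command line. The following is only a minimal sketch: the function name `heap_flags` and the sample command line are made up for illustration and are not part of TongWeb.

```shell
#!/bin/bash
# Print the -Xms/-Xmx flags found in a Java command line read from stdin.
# heap_flags is a hypothetical helper name used only for this example.
heap_flags() {
    tr ' ' '\n' | grep -E '^-Xm[sx][0-9]+[kKmMgG]?$'
}

# Demo with a made-up command line; on a real server you could instead pipe
# in the output of: ps -o args= -p <pid>
echo "java -Xms2048m -Xmx2048m -Dsample.flag=1 SampleMain" | heap_flags
```

If the two flags are missing or smaller than expected, the console setting has probably not been applied to the running process yet.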
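Open files (skip this step on a dedicated machine):
Run the following on the server command line: ulimit -a
Raising the limit to 65535 is recommended. Edit /etc/security/limits.conf (for example with vi) and add the following two lines:
* soft nofile 65535
* hard nofile 65535
After the change, open a new login session, run ulimit -a to confirm that the new open files value is in effect, and then restart TongWeb.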
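Note that ulimit -a reports the limits of the current shell, while a process that was started earlier keeps the limits it was started with. On Linux the effective limit of a running process can be read from /proc/<pid>/limits. A minimal sketch, assuming that file layout (the function name `soft_nofile` is illustrative):

```shell
#!/bin/bash
# Print the soft "Max open files" limit from a /proc/<pid>/limits style file.
# soft_nofile is a hypothetical helper name used only for this example.
soft_nofile() {
    # Line layout: Max open files  <soft>  <hard>  files
    awk '/^Max open files/ { print $4 }' "$1"
}

# Demo: the limit of the current shell process itself.
# For TongWeb you would pass /proc/<TongWeb pid>/limits instead.
if [ -r "/proc/$$/limits" ]; then
    soft_nofile "/proc/$$/limits"
fi
```

If the value printed for the TongWeb process is still the old one, TongWeb has not been restarted from a session that carries the new limit.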
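2. Rule out network latency (for reference only)
On a computer that can reach the server, open cmd and ping the domain name or server IP.
On the same computer, also run tracert against the domain name or server IP.
If the application is configured to talk to other servers or IPs (for example, application A is deployed on server A but uses a database or other middleware on server B), try traceroute (the Linux counterpart of tracert) against that domain name or IP and check whether any hop times out or takes too long to respond.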
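When comparing latency from several machines, it can help to pull just the average round-trip time out of the ping summary. The sketch below assumes the summary line printed by the Linux iputils ping (`rtt min/avg/max/mdev = ...`); Windows ping prints a different summary. The function name `avg_rtt_ms` is illustrative.

```shell
#!/bin/bash
# Read Linux ping output on stdin and print the average round-trip time in ms.
# avg_rtt_ms is a hypothetical helper name used only for this example.
avg_rtt_ms() {
    # Summary line: rtt min/avg/max/mdev = 0.045/0.052/0.061/0.007 ms
    # Splitting on "/" puts the average in the fifth field.
    awk -F'/' '/^rtt / { print $5 }'
}

# Demo with a captured sample line; on a real server:
#   ping -c 5 <host> | avg_rtt_ms
echo "rtt min/avg/max/mdev = 0.045/0.052/0.061/0.007 ms" | avg_rtt_ms   # prints 0.052
```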
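3. Check server resource usage
3.1 How to check
You can check in the TongWeb console.
Pay particular attention to:
1. Memory usage and CPU usage
2. Heap memory in use, and the thread-related parameters below it
Alternatively, run the following on the server command line:
top
Then press:
1
And finally press:
c
This starts top and shows the current resource usage of the system; pressing 1 expands the CPU summary to one line per core, and pressing c shows the full command line of each process.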
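Depending on what you see, there are three main cases: high CPU usage, high memory usage, and thread blocking.
3.1.1 High CPU usage
Method 1: Analyze with the show-busy-java-threads.sh script
The script is reproduced below. Create a shell script, paste the content into it, and name it show-busy-java-threads.sh. Prerequisite: the JDK environment variables are configured on the server so that commands such as jmap and jstack are available.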
#!/bin/bash
# @Function
# Find out the highest cpu consumed threads of java processes, and print the stack of these threads.
#
# @Usage
# $ ./show-busy-java-threads
#
# @online-doc https://github.com/oldratlee/useful-scripts/blob/master/docs/java.md#-show-busy-java-threads
# @author Jerry Lee (oldratlee at gmail dot com)
# @author superhj1987 (superhj1987 at 126 dot com)
readonly PROG="`basename $0`"
readonly -a COMMAND_LINE=("$0" "$@")
# Get current user name via whoami command
# See https://www.lifewire.com/current-linux-user-whoami-command-3867579
# Because if run command by `sudo -u`, env var $USER is not rewritten/correct, just inherited from outside!
readonly USER="`whoami`"
################################################################################
# util functions
################################################################################
# NOTE: $'foo' is the escape sequence syntax of bash
readonly ec=$'\033' # escape char
readonly eend=$'\033[0m' # escape end
colorEcho() {
local color=$1
shift
# if stdout is console, turn on color output.
[ -t 1 ] && echo "$ec[1;${color}m$@$eend" || echo "$@"
}
colorPrint() {
local color=$1
shift
colorEcho "$color" "$@"
[ -n "$append_file" -a -w "$append_file" ] && echo "$@" >> "$append_file"
[ -n "$store_dir" -a -w "$store_dir" ] && echo "$@" >> "${store_file_prefix}$PROG"
}
normalPrint() {
echo "$@"
[ -n "$append_file" -a -w "$append_file" ] && echo "$@" >> "$append_file"
[ -n "$store_dir" -a -w "$store_dir" ] && echo "$@" >> "${store_file_prefix}$PROG"
}
redPrint() {
colorPrint 31 "$@"
}
greenPrint() {
colorPrint 32 "$@"
}
yellowPrint() {
colorPrint 33 "$@"
}
bluePrint() {
colorPrint 36 "$@"
}
die() {
redPrint "Error: $@" 1>&2
exit 1
}
logAndRun() {
echo "$@"
echo
"$@"
}
logAndCat() {
echo "$@"
echo
cat
}
usage() {
local -r exit_code="$1"
shift
[ -n "$exit_code" -a "$exit_code" != 0 ] && local -r out=/dev/stderr || local -r out=/dev/stdout
(( $# > 0 )) && { echo "$@"; echo; } > $out
> $out cat <<EOF
Usage: ${PROG} [OPTION]... [delay [count]]
Find out the highest cpu consumed threads of java processes,
and print the stack of these threads.
Example:
${PROG} # show busy java threads info
${PROG} 1 # update every 1 second, (stop by eg: CTRL+C)
${PROG} 3 10 # update every 3 seconds, update 10 times
Output control:
-p, --pid <java pid> find out the highest cpu consumed threads from
the specified java process.
default from all java process.
-c, --count <num> set the thread count to show, default is 5.
-a, --append-file <file> specifies the file to append output as log.
-S, --store-dir <dir> specifies the directory for storing
the intermediate files, and keep files.
default store intermediate files at tmp dir,
and auto remove after run. use this option to keep
files so as to review jstack/top/ps output later.
delay the delay between updates in seconds.
count the number of updates.
delay/count arguments imitates the style of
vmstat command.
jstack control:
-s, --jstack-path <path> specifies the path of jstack command.
-F, --force set jstack to force a thread dump. use when jstack
does not respond (process is hung).
-m, --mix-native-frames set jstack to print both java and native frames
(mixed mode).
-l, --lock-info set jstack with long listing.
prints additional information about locks.
CPU usage calculation control:
-d, --top-delay specifies the delay between top samples.
default is 0.5 (second). get thread cpu percentage
during this delay interval.
more info see top -d option. eg: -d 1 (1 second).
-P, --use-ps use ps command to find busy thread(cpu usage)
instead of top command.
default use top command, because cpu usage of
ps command is expressed as the percentage of
time spent running during the *entire lifetime*
of a process, this is not ideal in general.
Miscellaneous:
-h, --help display this help and exit.
EOF
exit $exit_code
}
################################################################################
# Check os support
################################################################################
uname | grep '^Linux' -q || die "$PROG only support Linux, not support `uname` yet!"
################################################################################
# parse options
################################################################################
# NOTE: ARGS can not be declared as readonly!!
# readonly declaration make exit code of assignment to be always 0, aka. the exit code of `getopt` in subshell is discarded.
# tested on bash 4.2.46
ARGS=`getopt -n "$PROG" -a -o p:c:a:s:S:Pd:Fmlh -l count:,pid:,append-file:,jstack-path:,store-dir:,use-ps,top-delay:,force,mix-native-frames,lock-info,help -- "$@"`
[ $? -ne 0 ] && { echo; usage 1; }
eval set -- "${ARGS}"
while true; do
case "$1" in
-c|--count)
count="$2"
shift 2
;;
-p|--pid)
pid="$2"
shift 2
;;
-a|--append-file)
append_file="$2"
shift 2
;;
-s|--jstack-path)
jstack_path="$2"
shift 2
;;
-S|--store-dir)
store_dir="$2"
shift 2
;;
-P|--use-ps)
use_ps=true
shift
;;
-d|--top-delay)
top_delay="$2"
shift 2
;;
-F|--force)
force=-F
shift
;;
-m|--mix-native-frames)
mix_native_frames=-m
shift
;;
-l|--lock-info)
more_lock_info=-l
shift
;;
-h|--help)
usage
;;
--)
shift
break
;;
esac
done
count=${count:-5}
update_delay=${1:-0}
[ -z "$1" ] && update_count=1 || update_count=${2:-0}
(( update_count < 0 )) && update_count=0
top_delay=${top_delay:-0.5}
use_ps=${use_ps:-false}
# check the directory of append-file(-a) mode, create if not exist.
if [ -n "$append_file" ]; then
if [ -e "$append_file" ]; then
[ -f "$append_file" ] || die "$append_file(specified by option -a, for storing run output files) exists but is not a file!"
[ -w "$append_file" ] || die "file $append_file(specified by option -a, for storing run output files) exists but is not writable!"
else
append_file_dir="$(dirname "$append_file")"
if [ -e "$append_file_dir" ]; then
[ -d "$append_file_dir" ] || die "directory $append_file_dir(specified by option -a, for storing run output files) exists but is not a directory!"
[ -w "$append_file_dir" ] || die "directory $append_file_dir(specified by option -a, for storing run output files) exists but is not writable!"
else
mkdir -p "$append_file_dir" || die "fail to create directory $append_file_dir(specified by option -a, for storing run output files)!"
fi
fi
fi
# check store directory(-S) mode, create directory if not exist.
if [ -n "$store_dir" ]; then
if [ -e "$store_dir" ]; then
[ -d "$store_dir" ] || die "$store_dir(specified by option -S, for storing output files) exists but is not a directory!"
[ -w "$store_dir" ] || die "directory $store_dir(specified by option -S, for storing output files) exists but is not writable!"
else
mkdir -p "$store_dir" || die "fail to create directory $store_dir(specified by option -S, for storing output files)!"
fi
fi
################################################################################
# check the existence of jstack command
################################################################################
if [ -n "$jstack_path" ]; then
[ -f "$jstack_path" ] || die "$jstack_path is NOT found!"
[ -x "$jstack_path" ] || die "$jstack_path is NOT executable!"
elif which jstack &> /dev/null; then
jstack_path="`which jstack`"
else
[ -n "$JAVA_HOME" ] || die "jstack not found on PATH and No JAVA_HOME setting! Use -s option set jstack path manually."
[ -f "$JAVA_HOME/bin/jstack" ] || die "jstack not found on PATH and \$JAVA_HOME/bin/jstack($JAVA_HOME/bin/jstack) file does NOT exists! Use -s option set jstack path manually."
[ -x "$JAVA_HOME/bin/jstack" ] || die "jstack not found on PATH and \$JAVA_HOME/bin/jstack($JAVA_HOME/bin/jstack) is NOT executable! Use -s option set jstack path manually."
jstack_path="$JAVA_HOME/bin/jstack"
fi
################################################################################
# biz logic
################################################################################
readonly run_timestamp="`date "+%Y-%m-%d_%H:%M:%S.%N"`"
readonly uuid="${PROG}_${run_timestamp}_${RANDOM}_$$"
readonly tmp_store_dir="/tmp/${uuid}"
if [ -n "$store_dir" ]; then
readonly store_file_prefix="$store_dir/${run_timestamp}_"
else
readonly store_file_prefix="$tmp_store_dir/${run_timestamp}_"
fi
mkdir -p "$tmp_store_dir"
cleanupWhenExit() {
rm -rf "$tmp_store_dir" &> /dev/null
}
trap "cleanupWhenExit" EXIT
headInfo() {
colorEcho "0;34;42" ================================================================================
echo "$(date "+%Y-%m-%d %H:%M:%S.%N") [$(( i + 1 ))/$update_count]: ${COMMAND_LINE[@]}"
colorEcho "0;34;42" ================================================================================
echo
}
if [ -n "${pid}" ]; then
readonly ps_process_select_options="-p $pid"
else
readonly ps_process_select_options="-C java -C jsvc"
fi
# output field: pid, thread id(lwp), pcpu, user
# order by pcpu(percentage of cpu usage)
findBusyJavaThreadsByPs() {
# 1. sort by %cpu by ps option `--sort -pcpu`
# 2. use wide output(unlimited width) by ps option `-ww`
# avoid trunk user column to username_fo+ or $uid alike
local -a ps_cmd_line=(ps $ps_process_select_options -wwLo pid,lwp,pcpu,user --sort -pcpu --no-headers)
local -r ps_out="$("${ps_cmd_line[@]}")"
if [ -n "$store_dir" ]; then
echo "$ps_out" | logAndCat "${ps_cmd_line[@]}" > "${store_file_prefix}$(( i + 1 ))_ps"
fi
echo "$ps_out" | head -n "${count}"
}
# top with output field: thread id, %cpu
__top_threadId_cpu() {
# 1. sort by %cpu by top option `-o %CPU`
# unfortunately, top version 3.2 does not support -o option(supports from top version 3.3+),
# use
# HOME="$tmp_store_dir" top -H -b -n 1
# combined
# sort
# instead of
# HOME="$tmp_store_dir" top -H -b -n 1 -o '%CPU'
# 2. change HOME env var when run top,
# so as to prevent top command output format being change by .toprc user config file unexpectedly
# 3. use option `-d 0.5`(update interval 0.5 second) and `-n 2`(update 2 times),
# and use second time update data to get cpu percentage of thread in 0.5 second interval
# 4. top v3.3, there is 1 blank line between 2 update;
# but top v3.2, there is 2 blank lines between 2 update!
local -a top_cmd_line=(top -H -b -d $top_delay -n 2)
local -r top_out=$(HOME="$tmp_store_dir" "${top_cmd_line[@]}")
if [ -n "$store_dir" ]; then
echo "$top_out" | logAndCat "${top_cmd_line[@]}" > "${store_file_prefix}$(( i + 1 ))_top"
fi
echo "$top_out" |
awk 'BEGIN { blockIndex = 0; currentLineHasText = 0; prevLineHasText = 0; } {
currentLineHasText = ($0 != "")
if (prevLineHasText && !currentLineHasText)
blockIndex++ # from text line to empty line, increase block index
if (blockIndex == 3 && ($NF == "java" || $NF == "jsvc")) # $NF(last field) is command field
# only print 4th text block(blockIndex == 3), aka. process info of second top update
print $1 " " $9 # $1 is thread id field, $9 is %cpu field
prevLineHasText = currentLineHasText # update prevLineHasText
}' | sort -k2,2nr
}
__complete_pid_user_by_ps() {
# ps output field: pid, thread id(lwp), user
local -a ps_cmd_line=(ps $ps_process_select_options -wwLo pid,lwp,user --no-headers)
local -r ps_out="$("${ps_cmd_line[@]}")"
if [ -n "$store_dir" ]; then
echo "$ps_out" | logAndCat "${ps_cmd_line[@]}" > "${store_file_prefix}$(( i + 1 ))_ps"
fi
local idx=0
local -a line
while IFS=" " read -a line ; do
(( idx < count )) || break
local threadId="${line[0]}"
local pcpu="${line[1]}"
# output field: pid, threadId, pcpu, user
local output_fields="$( echo "$ps_out" |
awk -v "threadId=$threadId" -v "pcpu=$pcpu" '$2==threadId {
printf "%s %s %s %s\n", $1, threadId, pcpu, $3; exit
}' )"
if [ -n "$output_fields" ]; then
(( idx++ ))
echo "$output_fields"
fi
done
}
# output format is same as function findBusyJavaThreadsByPs
findBusyJavaThreadsByTop() {
__top_threadId_cpu | __complete_pid_user_by_ps
}
printStackOfThreads() {
local -a line
local idx=0
while IFS=" " read -a line ; do
local pid="${line[0]}"
local threadId="${line[1]}"
local threadId0x="0x`printf %x ${threadId}`"
local pcpu="${line[2]}"
local user="${line[3]}"
(( idx++ ))
local jstackFile="${store_file_prefix}$(( i + 1 ))_jstack_${pid}"
[ -f "${jstackFile}" ] || {
local -a jstack_cmd_line=( "$jstack_path" ${force} $mix_native_frames $more_lock_info ${pid} )
if [ "${user}" == "${USER}" ]; then
# run without sudo, when java process user is current user
logAndRun "${jstack_cmd_line[@]}" > ${jstackFile}
elif [ $UID == 0 ]; then
# if java process user is not current user, must run jstack with sudo
logAndRun sudo -u "${user}" "${jstack_cmd_line[@]}" > ${jstackFile}
else
# current user is not root user, so can not run with sudo; print error message and rerun suggestion
redPrint "[$idx] Fail to jstack busy(${pcpu}%) thread(${threadId}/${threadId0x}) stack of java process(${pid}) under user(${user})."
redPrint "User of java process($user) is not current user($USER), need sudo to rerun:"
yellowPrint " sudo ${COMMAND_LINE[@]}"
normalPrint
continue
fi || {
redPrint "[$idx] Fail to jstack busy(${pcpu}%) thread(${threadId}/${threadId0x}) stack of java process(${pid}) under user(${user})."
normalPrint
rm "${jstackFile}" &> /dev/null
continue
}
}
bluePrint "[$idx] Busy(${pcpu}%) thread(${threadId}/${threadId0x}) stack of java process(${pid}) under user(${user}):"
if [ -n "$mix_native_frames" ]; then
local sed_script="/--------------- $threadId ---------------/,/^---------------/ {
/--------------- $threadId ---------------/b # skip first separator line
/^---------------/d # delete second separator line
p
}"
elif [ -n "$force" ]; then
local sed_script="/^Thread ${threadId}:/,/^$/ {
/^$/d; p # delete end separator line
}"
else
local sed_script="/ nid=${threadId0x} /,/^$/ {
/^$/d; p # delete end separator line
}"
fi
{
sed "$sed_script" -n ${jstackFile}
echo
} | tee ${append_file:+-a "$append_file"} ${store_dir:+-a "${store_file_prefix}$PROG"}
done
}
################################################################################
# Main
################################################################################
main() {
local i
# if update_count <= 0, infinite loop till user interrupted (eg: CTRL+C)
for (( i = 0; update_count <= 0 || i < update_count; ++i )); do
(( i > 0 )) && sleep "$update_delay"
[ -n "$append_file" -o -n "$store_dir" ] && headInfo | tee ${append_file:+-a "$append_file"} ${store_dir:+-a "${store_file_prefix}$PROG"} > /dev/null
(( update_count != 1 )) && headInfo
if $use_ps; then
findBusyJavaThreadsByPs
else
findBusyJavaThreadsByTop
fi | printStackOfThreads
done
}
main
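Usage:
First make the file executable:
chmod +x show-busy-java-threads.sh or
chmod 777 show-busy-java-threads.sh
Then run:
./show-busy-java-threads.sh -a /opt/xxx.txt or bash show-busy-java-threads.sh -a /opt/xxx.txt (the script uses bash-specific syntax, so run it with bash rather than a plain sh)
This appends the results to /opt/xxx.txt, which makes later tracing easier.
Focus on the CPU percentage in each Busy(...) line to see whether any thread stands out, and read the stack printed for the busiest threads for hints. For details, see:
"First look at the show-busy-java-threads script: quickly troubleshooting Java CPU performance problems"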
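Method 2: Use jstack
Reference:
"A record of troubleshooting high CPU usage in a Java program"
General approach:
Run top to check system resource usage and find the pid.
Alternatively, identify the TongWeb process by checking which process listens on port 9060 and by running jps, and take the TongWeb pid from there.
Use the following command to see per-thread CPU usage and the tid:
ps -mp <pid> -o THREAD,tid,time
Once you have the tid, convert it to hexadecimal for later use:
printf "%x\n" <tid>
Then use the pid and the hexadecimal tid to pinpoint the thread (in the original screenshot, the first value boxed in red is the pid and the second is the converted tid):
jstack <pid> | grep <hexadecimal tid>
The output includes fairly detailed code information. Adding grep's -A option (for example -A 30) also prints the stack lines that follow the matching thread header.
To make tracing easier, you can also run:
jstack <pid> | grep <hexadecimal tid> > /opt/jstack.txt
The output is then saved to jstack.txt. Hand this file to the developers so they can check whether the problem lies in the application code.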
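The convert-and-search steps can also be combined in a small helper that works on a saved dump. This is only a sketch: the function name `thread_stack` and the file path in the usage comment are illustrative, and it assumes the dump carries the hexadecimal `nid=0x...` field in each thread header.

```shell
#!/bin/bash
# Print the stack of one thread from a saved jstack dump.
# Arguments: <dump file> <decimal thread id as shown by ps/top>
# thread_stack is a hypothetical helper name used only for this example.
thread_stack() {
    local dump_file=$1 tid=$2
    local nid
    nid="nid=0x$(printf '%x' "$tid") "
    # Print from the thread header line that contains the nid up to the next
    # blank line, which ends that thread's section in jstack output.
    awk -v pat="$nid" '
        index($0, pat) { printing = 1 }
        printing && /^$/ { exit }
        printing { print }
    ' "$dump_file"
}

# Usage, assuming the full dump was saved first with: jstack <pid> > /opt/jstack_full.txt
#   thread_stack /opt/jstack_full.txt 12345
```

Working from a saved full dump has the side benefit that several busy threads can be looked up against the same snapshot.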
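3.1.2 High memory usage
In this case both the TongWeb console and the application are usually very slow, and the logs contain "OutOfMemoryError: Java heap space". The server looks dead, although the process is still running. Check the gc.log file under the bin directory, or use the jstat command, to see whether the heap is full and whether Full GC runs frequently.
First, see "Configuring GC logs and thresholds in TW7".
You can then run the following, which prints one sample every 2000 ms, 20 times:
jstat -gcutil <pid> 2000 20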
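To read the jstat output at a glance, the last sample can be reduced to the two values that matter here: old generation usage (column O) and the Full GC count (column FGC). A minimal sketch, assuming the usual one-header-line layout of jstat -gcutil; the function name `gc_summary` is illustrative.

```shell
#!/bin/bash
# Read `jstat -gcutil` output on stdin and print, for the last sample, the
# old generation usage (column O, in percent) and the Full GC count (FGC).
# gc_summary is a hypothetical helper name used only for this example.
gc_summary() {
    awk '
        NR == 1 {
            # Locate columns by header name, since the column set differs
            # between JDK versions.
            for (i = 1; i <= NF; i++) { if ($i == "O") o = i; if ($i == "FGC") f = i }
            next
        }
        NF > 0 { old = $o; fgc = $f }
        END { if (o && f) print "old=" old "% fgc=" fgc }
    '
}

# Usage on a live system:
#   jstat -gcutil <pid> 2000 20 | gc_summary
```

An old generation that stays close to 100% while FGC keeps climbing between runs is the pattern described above.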
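Once you have confirmed that the heap is full, do the following:
(1) When OutOfMemoryError: Java heap space occurs, do not restart the Java process. Keep it running and carry out the steps below.
(2) Use the JDK command jps -v to find the Java process id (or look up the TongWeb process id).
(3) Run jmap -histo <PID> > mem.txt to write a class histogram as text. This is fast and the file is small.
(4) Use jmap to generate a full heap dump:
jmap -dump:format=b,file=/opt/heap.hprof <PID>
or
jmap -dump:live,format=b,file=heap.bin <PID>
With a relative file name the dump is written to the directory where the command is run. If the heap is set to 2G, the dump file can also be about 2G.
(5) mem.txt can be opened directly in a text editor. The heap dump can be analyzed with the Memory Analyzer (MAT) tool, available from http://www.eclipse.org/mat. Analyzing these files requires a machine with plenty of memory: a 64-bit Windows machine with the 64-bit Memory Analyzer is recommended, with physical memory of at least 3 times the size of the dump file.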
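Because the dump can be about as large as the heap, it may be worth checking free disk space in the target directory before step (4). A minimal sketch; the function name `has_space_for_dump` is illustrative, and the 2048 MB figure in the usage comment simply mirrors the 2G example above.

```shell
#!/bin/bash
# Succeed if the file system holding <dir> has at least <need_mb> MB free.
# has_space_for_dump is a hypothetical helper name used only for this example.
has_space_for_dump() {
    local dir=$1 need_mb=$2
    local avail_kb
    # df -P prints one line per file system in a fixed column layout;
    # with -k the fourth column is the available space in 1024-byte blocks.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $(( need_mb * 1024 )) ]
}

# Usage: check /opt for room for a 2 GB heap before running jmap -dump
#   has_space_for_dump /opt 2048 && echo "enough space" || echo "not enough space"
```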
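Memory Analyzer usage guides:
"[JVM] Log analysis tools: Memory Analyzer (MAT) introduction and usage"
"Generating an hprof file with TongWeb and analyzing out-of-memory errors with Memory Analyzer (MAT)"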
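3.1.3 Thread blocking
In this case CPU usage is usually not high and the TongWeb console responds normally, but every page of the application is slow. This is usually caused by blocking in the application's HTTP thread pool.
When this happens, use the JDK jstack command to dump the thread stacks for analysis, for example: jstack <java pid> > log.txt, which writes the output to the given file.
Key things to look at:
1. Whether there are many BLOCKED threads and whether they are waiting on the same lock address. An occasional handful of BLOCKED threads does not affect the system.
If this pattern shows up repeatedly, it is best to send the dump file to the developers for investigation.
2. Sometimes the threads are not BLOCKED at all: the console log stops scrolling in real time and the console shows the thread pool fully used, yet the jstack output contains no BLOCKED threads.
In that case check the jstack output for many TIMED_WAITING (parking) threads and see whether their stacks contain, for example, data-source-related code (search for keywords with Ctrl+F to see how often they occur). The stack under each thread can be mined for keywords to search for and analyze.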
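Counting by hand gets tedious on large dumps, so the two checks can be sketched as small helpers that summarize a saved dump (for example the log.txt produced above). The function names are illustrative, and the parsing assumes the usual jstack text layout with `java.lang.Thread.State:` and `- waiting to lock <address>` lines.

```shell
#!/bin/bash
# Summarize a saved jstack dump.
# Both function names are hypothetical helpers used only for this example.

# Count threads per state, most frequent first.
thread_state_counts() {
    awk '/java.lang.Thread.State:/ {
             sub(/^.*State: /, "")
             n[$0]++
         }
         END { for (s in n) print n[s], s }' "$1" | sort -k1,1nr
}

# Count BLOCKED threads per lock address they are waiting for.
contended_locks() {
    awk '/- waiting to lock </ {
             split($0, part, /[<>]/)   # part[2] is the lock address
             n[part[2]]++
         }
         END { for (a in n) print n[a], a }' "$1" | sort -k1,1nr
}

# Usage:
#   thread_state_counts log.txt
#   contended_locks log.txt
```

Many threads waiting on a single address in the second report point to one contended lock; searching the dump for `locked <that address>` then shows which thread holds it.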
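3.2 Data source causes
In this case CPU usage is usually not high and the TongWeb console responds normally; pages that do not touch the database load normally, but pages that do are slow. There are two situations:
(1) The data source connection pool is exhausted. TongWeb's server.log contains messages about the pool being full (both open-source pools and the TongWeb data source log this), and jstack shows threads blocked on the data source. The cause may be a maximum connection count that is too small. If the problem persists after increasing it, a connection leak is likely: find the leaking spot in the application code and fix it. If the application code cannot be changed, enable both "leak timeout" and "leak reclaim" so that database connections are forcibly reclaimed once the timeout is reached. Open-source connection pools have equivalent parameters.
(2) Check the slow SQL log and optimize the slow SQL statements.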
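To get a rough idea of how many threads are stuck on the data source, one can count the thread sections of a saved jstack dump that mention a chosen keyword. This is a sketch only: the function name `threads_mentioning` is illustrative, and "getConnection" is just one plausible keyword, not something prescribed by TongWeb.

```shell
#!/bin/bash
# Count the threads in a saved jstack dump whose stack mentions a keyword.
# threads_mentioning is a hypothetical helper name; the keyword is up to you
# (for a data source, "getConnection" is one plausible choice).
threads_mentioning() {
    local dump_file=$1 keyword=$2
    # RS= switches awk to paragraph mode: each blank-line-separated thread
    # section of the dump becomes one record.
    awk -v RS= -v kw="$keyword" 'index($0, kw) { n++ } END { print n + 0 }' "$dump_file"
}

# Usage:
#   threads_mentioning log.txt getConnection
```

A count close to the size of the HTTP thread pool suggests that most request threads are waiting for a connection.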
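4. Application code level
1. Record the scenarios in which the application stalls and review the code behind them to see whether timeouts are set. A timeout that is too long, or no timeout at all, can cause stalls or even a complete lack of response.
2. The application code calls System.exit(0). Find the places where the application uses System.exit and remove them, or add -Djava.security.manager to the startup parameters.
3. Investigate with the jstack command described in the earlier sections.
4. Check the logs, for example the log files in the logs directory under the TongWeb installation directory (such as server.log).
5. Check whether the application contains duplicate classes or conflicting JARs and code.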
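For the last point, a quick first pass is to look for the same library shipped in more than one version. The sketch below is a heuristic based on file names only (it strips a trailing `-<version>` and reports names that occur more than once); the function name `duplicate_jar_names` and the path in the usage comment are illustrative.

```shell
#!/bin/bash
# List library names that appear more than once (in different versions) in a
# directory of JAR files such as an application's WEB-INF/lib.
# duplicate_jar_names is a hypothetical helper name used only for this example.
duplicate_jar_names() {
    local dir=$1 jar
    for jar in "$dir"/*.jar; do
        [ -e "$jar" ] || continue          # directory without any JAR
        basename "$jar" .jar
    done |
        sed -E 's/-[0-9][0-9A-Za-z._-]*$//' |   # drop a trailing -<version>
        sort | uniq -d
}

# Usage (the path is a placeholder):
#   duplicate_jar_names /path/to/app/WEB-INF/lib
```

It will not catch the same class packaged under differently named JARs, so treat an empty result as a hint rather than proof.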
5. Find and kill zombie processes (for reference only)
Reference: "[Linux] How to kill defunct (zombie) processes"
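As a starting point, zombie processes can be listed together with their parent pid. A zombie has already exited, so signalling it has no effect; it disappears once its parent reaps it or the parent itself exits. A minimal sketch that filters ps output (the function name `zombies_from_ps` is illustrative):

```shell
#!/bin/bash
# Read `ps -eo pid,ppid,stat,comm` output on stdin and print the zombie
# processes (state starting with Z) together with their parent pid.
# zombies_from_ps is a hypothetical helper name used only for this example.
zombies_from_ps() {
    awk 'NR > 1 && $3 ~ /^Z/ { print "zombie pid=" $1 " parent pid=" $2 " cmd=" $4 }'
}

# Usage on a live system:
#   ps -eo pid,ppid,stat,comm | zombies_from_ps
```

The parent pid is the process to look at next, since it is the one that has not collected the exit status of its child.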