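JadePeng's Tech Notebook

Tutorial: How to Integrate Superset into Your Own Application

Posted on 2019-09-17 | Updated on 2021-07-12 | Categories: Blog, jqpeng
Word count: 4.7k | Reading time ≈ 4 min

Author: jqpeng
Original link: Tutorial: How to Integrate Superset into Your Own Application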

Superset is an Apache incubator project, positioned as a modern, near-commercial-grade BI system.

superset

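Superset is a data analysis and visualization platform open-sourced by Airbnb, the well-known short-term home rental company; its earlier names were Caravel and Panoramix. Its main features are self-service analysis, custom dashboards, visualization (and export) of analysis results, and user/role permission control. It also integrates a SQL editor for writing and running SQL queries.

With Superset you can build attractive statistical charts.

Preview

Installing Superset

Here we simply use Docker: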

git clone https://github.com/apache/incubator-superset/
cd incubator-superset/contrib/docker
# prefix with SUPERSET_LOAD_EXAMPLES=yes to load examples:
docker-compose run --rm superset ./docker-init.sh
# you can run this command every time you need to start superset now:
docker-compose up

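Once the build finishes, open http://localhost:8088.

To integrate Superset into your own application, the first problem to solve is authentication.

Analyzing Superset authentication

Superset is built on Flask-AppBuilder (FAB), and its security layer is based on flask_appbuilder.security. Reading through the code, we find the entry point in superset/__init__.py: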

custom_sm = app.config.get('CUSTOM_SECURITY_MANAGER') or SupersetSecurityManager
if not issubclass(custom_sm, SupersetSecurityManager):
    raise Exception(
        """Your CUSTOM_SECURITY_MANAGER must now extend SupersetSecurityManager,
         not FAB's security manager.
         See [4565] in UPDATING.md""")

appbuilder = AppBuilder(
    app,
    db.session,
    base_template='superset/base.html',
    indexview=MyIndexView,
    security_manager_class=custom_sm,
    update_perms=get_update_perms_flag(),
)

security_manager = appbuilder.sm

By default it uses SupersetSecurityManager, which extends FAB's SecurityManager:

class SupersetSecurityManager(SecurityManager):

    def get_schema_perm(self, database, schema):
        if schema:
            return '[{}].[{}]'.format(database, schema)

    def can_access(self, permission_name, view_name):
        """Protecting from has_access failing from missing perms/view"""
        user = g.user
        if user.is_anonymous:
            return self.is_item_public(permission_name, view_name)
        return self._has_view_access(user, permission_name, view_name)

    ...

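Looking further at SecurityManager and its parent classes, we find that login is handled by auth_view. The default auth type is AUTH_DB, which maps to AuthDBView:

# FAB's BaseSecurityManager (excerpt): the view class used for AUTH_DB logins
authdbview = AuthDBView

# ... later, when the views are registered:
if self.auth_type == AUTH_DB:
    self.user_view = self.userdbmodelview
    self.auth_view = self.authdbview()

# FAB's AppBuilder (excerpt): the login URL points at auth_view's endpoint
@property
def get_url_for_login(self):
    return url_for('%s.%s' % (self.sm.auth_view.endpoint, 'login'))

Now look at AuthDBView: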

class AuthDBView(AuthView):
    login_template = 'appbuilder/general/security/login_db.html'

    @expose('/login/', methods=['GET', 'POST'])
    def login(self):
        if g.user is not None and g.user.is_authenticated:
            return redirect(self.appbuilder.get_url_for_index)
        form = LoginForm_db()
        if form.validate_on_submit():
            user = self.appbuilder.sm.auth_user_db(form.username.data, form.password.data)
            if not user:
                flash(as_unicode(self.invalid_login_message), 'warning')
                return redirect(self.appbuilder.get_url_for_login)
            login_user(user, remember=False)
            return redirect(self.appbuilder.get_url_for_index)
        return self.render_template(self.login_template,
                               title=self.title,
                               form=form,
                               appbuilder=self.appbuilder)

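It exposes a '/login/' endpoint: it reads the username and password from the HTTP POST, calls auth_user_db to verify them, and if verification succeeds calls login_user to create the logged-in session.

So all we need is a custom AuthDBView that authenticates against our own application instead.

Using JWT to authenticate with Superset

Define a CustomAuthDBView that extends AuthDBView. At login, a JWT token can be passed in through a cookie or a URL parameter; if the token verifies, the user is logged in automatically.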

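import jwt
from flask import flash, redirect, request
from flask_appbuilder import expose
from flask_appbuilder.security.views import AuthDBView
from flask_login import login_user

# Shared secret; it must match the key the host application signs its tokens with
JWT_SECRET = 'secret'


class CustomAuthDBView(AuthDBView):
    login_template = 'appbuilder/general/security/login_db.html'

    @expose('/login/', methods=['GET', 'POST'])
    def login(self):
        # Take the token from the URL parameter first, then from the access_token cookie
        token = request.args.get('token') or request.cookies.get('access_token')
        if token is None:
            flash('Unable to auto login', 'warning')
            return super(CustomAuthDBView, self).login()
        try:
            # A shared string secret pairs with an HMAC algorithm (HS256);
            # RS256 would require verifying with the issuer's RSA public key instead
            jwt_payload = jwt.decode(token, JWT_SECRET, algorithms=['HS256'])
        except jwt.InvalidTokenError:
            flash('Invalid or expired token', 'warning')
            return super(CustomAuthDBView, self).login()
        user_name = jwt_payload.get("user_name")
        if not user_name:
            return super(CustomAuthDBView, self).login()
        user = self.appbuilder.sm.find_user(username=user_name)
        if not user:
            # Unknown users are created on the fly, here with the Admin role
            role_admin = self.appbuilder.sm.find_role('Admin')
            user = self.appbuilder.sm.add_user(user_name, user_name, 'aimind',
                                               user_name + "@aimind.com", role_admin,
                                               password="aimind" + user_name)
        if not user:
            return super(CustomAuthDBView, self).login()
        login_user(user, remember=False)
        # Consider checking 'redirect' against an allow-list to avoid an open redirect
        redirect_url = request.args.get('redirect') or self.appbuilder.get_url_for_index
        return redirect(redirect_url)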

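If the user does not exist yet, it is created automatically through self.appbuilder.sm.add_user.

Then plug this CustomAuthDBView into a custom security manager:

from superset.security import SupersetSecurityManager

class CustomSecurityManager(SupersetSecurityManager):
    # Replace the default DB login view with the JWT-aware view defined above
    authdbview = CustomAuthDBView

Finally, put both classes in one module that Superset can import (aimind_security.py here), and register CustomSecurityManager by adding the following to superset_config.py: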

from aimind_security import CustomSecurityManager
CUSTOM_SECURITY_MANAGER = CustomSecurityManager

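Integrating Superset into your application

The integration itself is then simple: visiting 'SUPER_SET_URL/login/?token=jwt_token' logs the user in, and Superset can be embedded seamlessly in an iframe.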
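The jwt_token itself has to be minted by your own application. Below is a minimal sketch in Java using only JDK classes, assuming the Superset side decodes with algorithms=['HS256'] and the same shared secret, and reads the user_name claim as CustomAuthDBView does. The class name SupersetTokenSketch, the exp claim and the supersetBaseUrl parameter are illustrative choices, not part of Superset. The token is assembled by hand only to show its structure; in a real application a maintained JWT library is the safer choice.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical host-side helper (not part of Superset): mints an HS256 JWT
// carrying the "user_name" claim that the custom login view reads.
class SupersetTokenSketch {

    private static final Base64.Encoder B64URL = Base64.getUrlEncoder().withoutPadding();

    // HMAC-SHA256 of the data, keyed with the shared secret
    static byte[] hmacSha256(String secret, String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // base64url(header) + "." + base64url(payload) + "." + base64url(signature)
    // Assumption: userName contains no characters that need JSON escaping.
    static String mint(String userName, long expEpochSeconds, String secret) {
        String header = "{\"alg\":\"HS256\",\"typ\":\"JWT\"}";
        String payload = "{\"user_name\":\"" + userName + "\",\"exp\":" + expEpochSeconds + "}";
        String signingInput = B64URL.encodeToString(header.getBytes(StandardCharsets.UTF_8))
                + "." + B64URL.encodeToString(payload.getBytes(StandardCharsets.UTF_8));
        return signingInput + "." + B64URL.encodeToString(hmacSha256(secret, signingInput));
    }

    // What the verifying side does: recompute the signature and compare in constant time
    static boolean verifySignature(String token, String secret) {
        String[] parts = token.split("\\.");
        if (parts.length != 3) {
            return false;
        }
        byte[] expected = hmacSha256(secret, parts[0] + "." + parts[1]);
        byte[] actual = Base64.getUrlDecoder().decode(parts[2]);
        return MessageDigest.isEqual(expected, actual);
    }

    // supersetBaseUrl stands in for SUPER_SET_URL; base64url tokens need no extra URL encoding
    static String loginUrl(String supersetBaseUrl, String token) {
        return supersetBaseUrl + "/login/?token=" + token;
    }
}
```

The resulting URL can be used as the src of an iframe. PyJWT checks the exp claim by default, so a short expiry limits how long a leaked URL stays usable.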

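Knowledge Graph Reasoning in Practice (3): Custom Builtins in Jena

Posted on 2019-09-12 | Updated on 2021-07-12 | Categories: Blog, jqpeng
Word count: 12k | Reading time ≈ 11 min

Author: jqpeng
Original link: Knowledge Graph Reasoning in Practice (3): Custom Builtins in Jena

Part 2 introduced Jena's general purpose rule engine and how to use it. This part continues by looking at how to define a custom builtin.

Introduction to builtins

First, a recap of what a builtin is. Jena calls them builtin primitives; you can think of them as built-in functions or instructions that return true or false to test whether a rule matches. Jena ships with the following primitives: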

| Builtin | Operations |
|---|---|
| isLiteral(?x), notLiteral(?x), isFunctor(?x), notFunctor(?x), isBNode(?x), notBNode(?x) | Test whether the single argument is or is not a literal, a functor-valued literal or a blank-node, respectively. |
| bound(?x...), unbound(?x...) | Test if all of the arguments are bound (not bound) variables. |
| equal(?x,?y), notEqual(?x,?y) | Test if x=y (or x != y). The equality test is semantic equality so that, for example, the xsd:int 1 and the xsd:decimal 1 would test equal. |
| lessThan(?x, ?y), greaterThan(?x, ?y), le(?x, ?y), ge(?x, ?y) | Test if x is <, >, <= or >= y. Only passes if both x and y are numbers or time instants (can be integer or floating point or XSDDateTime). |
| sum(?a, ?b, ?c), addOne(?a, ?c), difference(?a, ?b, ?c), min(?a, ?b, ?c), max(?a, ?b, ?c), product(?a, ?b, ?c), quotient(?a, ?b, ?c) | Sets c to be (a+b), (a+1), (a-b), min(a,b), max(a,b), (a*b), (a/b). Note that these do not run backwards: if in sum a and c are bound and b is unbound, the test will fail rather than bind b to (c-a). This could be fixed. |
| strConcat(?a1, .. ?an, ?t), uriConcat(?a1, .. ?an, ?t) | Concatenates the lexical form of all the arguments except the last, then binds the last argument to a plain literal (strConcat) or a URI node (uriConcat) with that lexical form. In both cases if an argument node is a URI node the URI will be used as the lexical form. |
| regex(?t, ?p), regex(?t, ?p, ?m1, .. ?mn) | Matches the lexical form of a literal (?t) against a regular expression pattern given by another literal (?p). If the match succeeds, and if there are any additional arguments, it will bind the first n capture groups to the arguments ?m1 to ?mn. The regular expression pattern syntax is that provided by java.util.regex. Note that the capture groups are numbered from 1 and the first capture group will be bound to ?m1; the implicit capture group 0, which corresponds to the entire matched string, is ignored. So for example regex('foo bar', '(.*) (.*)', ?m1, ?m2) will bind m1 to "foo" and m2 to "bar". |
| now(?x) | Binds ?x to an xsd:dateTime value corresponding to the current time. |
| makeTemp(?x) | Binds ?x to a newly created blank node. |
| makeInstance(?x, ?p, ?v), makeInstance(?x, ?p, ?t, ?v) | Binds ?v to be a blank node which is asserted as the value of the ?p property on resource ?x and optionally has type ?t. Multiple calls with the same arguments will return the same blank node each time, thus allowing this call to be used in backward rules. |
| makeSkolem(?x, ?v1, ... ?vn) | Binds ?x to be a blank node. The blank node is generated based on the values of the remaining ?vi arguments, so the same combination of arguments will generate the same bNode. |
| noValue(?x, ?p), noValue(?x, ?p, ?v) | True if there is no known triple (x, p, *) or (x, p, v) in the model or the explicit forward deductions so far. |
| remove(n, ...), drop(n, ...) | Remove the statement (triple) which caused the n'th body term of this (forward-only) rule to match. Remove will propagate the change to other consequent rules including the firing rule (which must thus be guarded by some other clauses). In particular, if the removed statement (triple) appears in the body of a rule that has already fired, the consequences of such rule are retracted from the deduced model. Drop will silently remove the triple(s) from the graph but not fire any rules as a consequence. These are clearly non-monotonic operations and, in particular, the behaviour of a rule set in which different rules both drop and create the same triple(s) is undefined. |
| isDType(?l, ?t), notDType(?l, ?t) | Tests if literal ?l is (or is not) an instance of the datatype defined by resource ?t. |
| print(?x, ...) | Print (to standard out) a representation of each argument. This is useful for debugging rather than serious IO work. |
| listContains(?l, ?x), listNotContains(?l, ?x) | Passes if ?l is a list which contains (does not contain) the element ?x; both arguments must be ground, and it cannot be used as a generator. |
| listEntry(?list, ?index, ?val) | Binds ?val to the ?index'th entry in the RDF list ?list. If there is no such entry the variable will be unbound and the call will fail. Only usable in rule bodies. |
| listLength(?l, ?len) | Binds ?len to the length of the list ?l. |
| listEqual(?la, ?lb), listNotEqual(?la, ?lb) | listEqual tests if the two arguments are both lists and contain the same elements. The equality test is semantic equality on literals (sameValueAs) but will not take into account owl:sameAs aliases. listNotEqual is the negation of this (passes if listEqual fails). |
| listMapAsObject(?s, ?p, ?l), listMapAsSubject(?l, ?p, ?o) | These can only be used as actions in the head of a rule. They deduce a set of triples derived from the list argument ?l: listMapAsObject asserts triples (?s ?p ?x) for each ?x in the list ?l, listMapAsSubject asserts triples (?x ?p ?o). |
| table(?p), tableAll() | Declare that all goals involving property ?p (or all goals) should be tabled by the backward engine. |
| hide(p) | Declares that statements involving the predicate p should be hidden. Queries to the model will not report such statements. This is useful to enable non-monotonic forward rules to define flag predicates which are only used for inference control and do not "pollute" the inference results. |

Custom builtins

Defining one is simple: implement the Builtin interface, then register it with BuiltinRegistry.theRegistry.register.

The Builtin interface is defined as follows:

public interface Builtin {

    /**
     * Return a convenient name for this builtin, normally this will be the name of the 
     * functor that will be used to invoke it and will often be the final component of the
     * URI.
     */
    public String getName();
    
    /**
     * Return the full URI which identifies this built in.
     */
    public String getURI();
    
    /**
     * Return the expected number of arguments for this functor or 0 if the number is flexible.
     */
    public int getArgLength();
    
    /**
     * This method is invoked when the builtin is called in a rule body.
     * @param args the array of argument values for the builtin, this is an array 
     * of Nodes, some of which may be Node_RuleVariables.
     * @param length the length of the argument list, may be less than the length of the args array
     * for some rule engines
     * @param context an execution context giving access to other relevant data
     * @return return true if the buildin predicate is deemed to have succeeded in
     * the current environment
     */
    public boolean bodyCall(Node[] args, int length, RuleContext context);
    
    /**
     * This method is invoked when the builtin is called in a rule head.
     * Such a use is only valid in a forward rule.
     * @param args the array of argument values for the builtin, this is an array 
     * of Nodes.
     * @param length the length of the argument list, may be less than the length of the args array
     * for some rule engines
     * @param context an execution context giving access to other relevant data
     */
    public void headAction(Node[] args, int length, RuleContext context);
    
    /**
     * Returns false if this builtin has side effects when run in a body clause,
     * other than the binding of environment variables.
     */
    public boolean isSafe();
    
    /**
     * Returns false if this builtin is non-monotonic. This includes non-monotonic checks like noValue
     * and non-monotonic actions like remove/drop. A non-monotonic call in a head is assumed to 
     * be an action and makes the overall rule and ruleset non-monotonic. 
     * Most JenaRules are monotonic deductive closure rules in which this should be false.
     */
    public boolean isMonotonic();
}

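In practice we rarely implement this interface directly; instead we extend the default implementation BaseBuiltin. Usually it is enough to override getName to provide the instruction name and implement bodyCall to provide the function logic: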

    @Override
    public String getName() {
        return "semsim";
    }

For example, let's define a custom instruction that computes the semantic similarity between two pieces of text:

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import org.apache.jena.graph.Node;
import org.apache.jena.reasoner.rulesys.RuleContext;
import org.apache.jena.reasoner.rulesys.builtins.BaseBuiltin;

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SemanticSimilarityBuiltin extends BaseBuiltin {
    /**
     * Return a convenient name for this builtin, normally this will be the name of the
     * functor that will be used to invoke it and will often be the final component of the
     * URI.
     */
    @Override
    public String getName() {
        return "semsim";
    }

    // semsim(?text1, ?text2, threshold): exactly three arguments
    @Override
    public int getArgLength() {
        return 3;
    }


    /**
     * This method is invoked when the builtin is called in a rule body.
     *
     * @param args    the array of argument values for the builtin, this is an array
     *                of Nodes, some of which may be Node_RuleVariables.
     * @param context an execution context giving access to other relevant data
     * @return return true if the buildin predicate is deemed to have succeeded in
     * the current environment
     */
    @Override
    public boolean bodyCall(Node[] args, int length, RuleContext context) {
        checkArgs(length, context);
        Node n1 = getArg(0, args, context);
        Node n2 = getArg(1, args, context);
        Node score = getArg(2, args, context);

        // the threshold must be a literal with a value
        if (!score.isLiteral() || score.getLiteralValue() == null) {
            return false;
        }
        double threshold = Double.parseDouble(score.getLiteralValue().toString());

        if (n1.isLiteral() && n2.isLiteral()) {
            String v1 = n1.getLiteralValue().toString();
            String v2 = n2.getLiteralValue().toString();

            // call the external service to compute the similarity; the texts are URL-encoded
            String requestUrl;
            try {
                requestUrl = "http://API-URL:5101/similarity/cosine?s1=" + URLEncoder.encode(v1, "UTF-8")
                        + "&s2=" + URLEncoder.encode(v2, "UTF-8");
            } catch (UnsupportedEncodingException e) {
                throw new IllegalStateException(e);
            }
            // HttpClientUtil is the author's own HTTP helper, not a Jena class
            String result = HttpClientUtil.doGet(requestUrl);
            JSONObject json = JSON.parseObject(result);
            Double similarity = json.getDouble("similarity");
            // the rule matches only if the score reaches the threshold
            return similarity != null && similarity >= threshold;
        }
        return false;
    }
}

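getArgLength together with checkArgs(length, context) restricts the number of arguments: a call must use exactly that many.
getArg(idx, args, context) fetches the argument values to evaluate, with bound variables resolved.
The similarity check above calls an external service that computes the cosine score between the semantic vectors of the two texts; if the score reaches the threshold, the rule is considered matched.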
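The scoring behind semsim can be sketched locally. Assuming the service returns the cosine similarity of two embedding vectors (the vectors in the test are toy values, and CosineThresholdSketch is a hypothetical name), the match test is a cosine computation followed by a >= comparison with the threshold argument:

```java
// Local stand-in for the external similarity service: cosine similarity of two
// embedding vectors, plus the threshold test that semsim applies.
class CosineThresholdSketch {

    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0; // treat zero vectors as dissimilar
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Same decision as bodyCall: match only if the score reaches the threshold
    static boolean matches(double[] v1, double[] v2, double threshold) {
        return cosine(v1, v2) >= threshold;
    }
}
```

For instance, the vectors (1, 1) and (1, 0) have cosine 1/√2 ≈ 0.707 and pass a 0.6 threshold, while orthogonal vectors score 0 and fail.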

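Testing

Let's test the semsim instruction defined above, again using the example from part 2.

We add two new properties, 主要业务 (main business) and 竞争对手 (competitor), and define that if the main businesses of two companies are semantically similar, the two companies are competitors.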

        Property 主要业务 = myMod.createProperty(finance + "主要业务");
        Property 竞争对手 = myMod.createProperty(finance + "竞争对手");

        // add the triples
        myMod.add(万达集团, 主要业务, "房地产，文娱");  // "real estate, culture and entertainment"
        myMod.add(融创中国, 主要业务, "房地产");        // "real estate"

Then define the rule:

[ruleCompetitor: (?c1 :主要业务 ?b1) (?c2 :主要业务 ?b2) notEqual(?c1,?c2) semsim(?b1,?b2,0.6)  -> (?c1 :竞争对手 ?c2)]

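The rule reads: company ?c1's main business is ?b1, company ?c2's main business is ?b2, and ?c1 and ?c2 are different companies; if the similarity of ?b1 and ?b2 is at least 0.6, then ?c1 and ?c2 are competitors.

Complete test code:

        // register the custom builtin
        BuiltinRegistry.theRegistry.register(new SemanticSimilarityBuiltin());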

        Model myMod = ModelFactory.createDefaultModel();
        String finance = "http://www.example.org/kse/finance#";
        Resource 孙宏斌 = myMod.createResource(finance + "孙宏斌");
        Resource 融创中国 = myMod.createResource(finance + "融创中国");
        Resource 乐视网 = myMod.createResource(finance + "乐视网");
        Property 执掌 = myMod.createProperty(finance + "执掌");
        Resource 贾跃亭 = myMod.createResource(finance + "贾跃亭");
        Resource 地产公司 = myMod.createResource(finance + "地产公司");
        Resource 公司 = myMod.createResource(finance + "公司");
        Resource 法人实体 = myMod.createResource(finance + "法人实体");
        Resource 人 = myMod.createResource(finance + "人");
        Property 主要收入 = myMod.createProperty(finance + "主要收入");
        Resource 地产事业 = myMod.createResource(finance + "地产事业");
        Resource 王健林 = myMod.createResource(finance + "王健林");
        Resource 万达集团 = myMod.createResource(finance + "万达集团");
        Property 主要资产 = myMod.createProperty(finance + "主要资产");


        Property 股东 = myMod.createProperty(finance + "股东");
        Property 关联交易 = myMod.createProperty(finance + "关联交易");
        Property 收购 = myMod.createProperty(finance + "收购");

        Property 主要业务 = myMod.createProperty(finance + "主要业务");
        Property 竞争对手 = myMod.createProperty(finance + "竞争对手");

        // add the triples
        myMod.add(孙宏斌, 执掌, 融创中国);
        myMod.add(贾跃亭, 执掌, 乐视网);
        myMod.add(王健林, 执掌, 万达集团);
        myMod.add(乐视网, RDF.type, 公司);
        myMod.add(万达集团, RDF.type, 公司);
        myMod.add(融创中国, RDF.type, 地产公司);
        myMod.add(地产公司, RDFS.subClassOf, 公司);
        myMod.add(公司, RDFS.subClassOf, 法人实体);
        myMod.add(孙宏斌, RDF.type, 人);
        myMod.add(贾跃亭, RDF.type, 人);
        myMod.add(王健林, RDF.type, 人);
        myMod.add(万达集团, 主要资产, 地产事业);
        myMod.add(万达集团, 主要业务, "房地产，文娱");
        myMod.add(融创中国, 主要收入, 地产事业);
        myMod.add(融创中国, 主要业务, "房地产");
        myMod.add(孙宏斌, 股东, 乐视网);
        myMod.add(孙宏斌, 收购, 万达集团);

        PrintUtil.registerPrefix("", finance);

        // print the current model
        StmtIterator i = myMod.listStatements(null, null, (RDFNode) null);
        while (i.hasNext()) {
            System.out.println(" - " + PrintUtil.print(i.nextStatement()));
        }


        GenericRuleReasoner reasoner = (GenericRuleReasoner) GenericRuleReasonerFactory.theInstance().create(null);
        reasoner.setRules(Rule.parseRules(
            "[ruleHoldShare: (?p :执掌 ?c) -> (?p :股东 ?c)] \n"
                + "[ruleHoldShare2: (?p :收购 ?c) -> (?p :股东 ?c)] \n"
                + "[ruleConnTrans: (?p :股东 ?c) (?p :股东 ?c2) -> (?c :关联交易 ?c2)] \n"
                + "[ruleCompetitor: (?c1 :主要业务 ?b1) (?c2 :主要业务 ?b2) notEqual(?c1,?c2) semsim(?b1,?b2,0.6)  -> (?c1 :竞争对手 ?c2)] \n"
                + "-> tableAll()."));
        reasoner.setMode(GenericRuleReasoner.HYBRID);

        InfGraph infgraph = reasoner.bind(myMod.getGraph());
        infgraph.setDerivationLogging(true);

        System.out.println("After inference...\n");

        Iterator<Triple> tripleIterator = infgraph.find(null, null, null);
        while (tripleIterator.hasNext()) {
            System.out.println(" - " + PrintUtil.print(tripleIterator.next()));
        }

Result:

 - (:万达集团 :关联交易 :乐视网)
 - (:万达集团 :关联交易 :融创中国)
 - (:万达集团 :竞争对手 :融创中国)
 - (:万达集团 :关联交易 :万达集团)
 - (:孙宏斌 :股东 :万达集团)
 - (:孙宏斌 :股东 :融创中国)
 - (:融创中国 :关联交易 :万达集团)
 - (:融创中国 :竞争对手 :万达集团)
 - (:融创中国 :关联交易 :乐视网)
 - (:融创中国 :关联交易 :融创中国)
 - (:乐视网 :关联交易 :万达集团)
 - (:乐视网 :关联交易 :融创中国)
 - (:乐视网 :关联交易 :乐视网)
 - (:贾跃亭 :股东 :乐视网)
 - (:王健林 :股东 :万达集团)
 - (:公司 rdfs:subClassOf :法人实体)
 - (:万达集团 :主要业务 '房地产，文娱')
 - (:万达集团 :主要资产 :地产事业)
 - (:万达集团 rdf:type :公司)
 - (:地产公司 rdfs:subClassOf :公司)
 - (:融创中国 :主要业务 '房地产')
 - (:融创中国 :主要收入 :地产事业)
 - (:融创中国 rdf:type :地产公司)
 - (:孙宏斌 :收购 :万达集团)
 - (:孙宏斌 :股东 :乐视网)
 - (:孙宏斌 rdf:type :人)
 - (:孙宏斌 :执掌 :融创中国)
 - (:乐视网 rdf:type :公司)
 - (:贾跃亭 rdf:type :人)
 - (:贾跃亭 :执掌 :乐视网)
 - (:王健林 rdf:type :人)
 - (:王健林 :执掌 :万达集团)

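As needed, you can add more builtins, for example one that runs JavaScript or one that makes HTTP requests.

Conda Docker image

Posted on 2019-09-10 | Updated on 2021-07-12 | Categories: Blog, jqpeng
Word count: 1.7k | Reading time ≈ 2 min

Author: jqpeng
Original link: Conda Docker image

Previously, our Python environment was built on Ubuntu with pip installing the dependencies, but some library versions were not available that way. For example, one project needed faiss, and pip only offered the latest 1.5.3 release, which uses newer CPU instructions and crashes on older servers:

Illegal instruction (core dumped) - in new version of FAISS #885

The GitHub issue suggests installing an older version.

Unfortunately, the following command does not work, because pip has no 1.5.1 release:

pip install faiss-cpu==1.5.1

So we turned to conda.

First, download the Anaconda installer (the latest release at the time of writing):

wget https://repo.anaconda.com/archive/Anaconda3-2019.07-Linux-x86_64.sh

Then build a conda base image, again on top of ubuntu:16.04. The Dockerfile: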

FROM ubuntu:16.04

# Switch to the USTC mirror first, so that the package install below actually uses it
RUN sed -i 's/archive.ubuntu.com/mirrors.ustc.edu.cn/g' /etc/apt/sources.list && \
    apt-get update && apt-get install -y --no-install-recommends \
      bzip2 \
      g++ \
      git \
      graphviz \
      libgl1-mesa-glx \
      libhdf5-dev \
      openmpi-bin \
      wget && \
    rm -rf /var/lib/apt/lists/*

# The installer was downloaded on the host with the wget command above
COPY ./Anaconda3-2019.07-Linux-x86_64.sh ./anaconda.sh

ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
ENV PATH /opt/conda/bin:$PATH

# Silent install into /opt/conda, enable "conda activate" in bash, then trim static libs, source maps and caches
RUN /bin/bash ./anaconda.sh -b -p /opt/conda && \
    rm ./anaconda.sh && \
    ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
    echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
    echo "conda activate base" >> ~/.bashrc && \
    find /opt/conda/ -follow -type f -name '*.a' -delete && \
    find /opt/conda/ -follow -type f -name '*.js.map' -delete && \
    /opt/conda/bin/conda clean -afy

CMD [ "/bin/bash" ]

Build it:

docker build -t conda3:1.0 .

From then on, conda3:1.0 can serve as the base image for whatever images we need. For example, to install faiss-cpu 1.5.1:

FROM conda3:1.0

RUN conda install pytorch -y
RUN conda install faiss-cpu=1.5.1 -c pytorch -y


CMD [ "/bin/bash" ]

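Build it:

docker build -t conda-faiss:1.0 .

Knowledge Graph Reasoning in Practice (2): Rule-Based Reasoning with Jena

Posted on 2019-09-06 | Updated on 2021-07-12 | Categories: Blog, jqpeng
Word count: 7.7k | Reading time ≈ 7 min

Author: jqpeng
Original link: Knowledge Graph Reasoning in Practice (2): Rule-Based Reasoning with Jena

This part introduces rule-based reasoning with Jena's rule engine and walks through two coding examples.

Rule engine overview

Jena includes a general purpose rule-based reasoner, which is used to implement the RDFS and OWL reasoners but can also be used on its own.

The reasoner supports rule-based inference over RDF graphs and provides forward chaining, backward chaining and a hybrid execution mode. Internally it has two engines: a forward-chaining RETE engine and a tabled datalog engine. Both are configured and selected through GenericRuleReasoner, and to use a GenericRuleReasoner you need a rule set that defines its behavior.

Rule syntax and structure

A rule is defined by a Rule object, which consists of a list of body terms (premises), a list of head terms (conclusions), an optional name and an optional direction.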

An informal description of the simplified text rule syntax is:

Rule      :=   bare-rule .
          or   [ bare-rule ]
          or   [ ruleName : bare-rule ]

bare-rule :=   term, ... term -> hterm, ... hterm    // forward rule
          or   bhterm <- term, ... term              // backward rule

hterm     :=   term
          or   [ bare-rule ]

term      :=   (node, node, node)           // triple pattern
          or   (node, node, functor)        // extended triple pattern
          or   builtin(node, ... node)      // invoke procedural primitive

bhterm    :=   (node, node, node)           // triple pattern

functor   :=   functorName(node, ... node)  // structured literal

node      :=   uri-ref                      // e.g. http://foo.com/eg
          or   prefix:localname             // e.g. rdf:type
          or   <uri-ref>                    // e.g. <myscheme:myuri>
          or   ?varname                     // variable
          or   'a literal'                  // a plain string literal
          or   'lex'^^typeURI               // a typed literal, xsd:* type names supported
          or   number                       // e.g. 42 or 25.5

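The comma separator is optional.

The distinction between forward and backward rule syntax only matters for the hybrid execution strategy; see below.

A functor is an extended triple element used to create and access structured literal values; functorName can be any simple identifier.

To keep rules readable, URI references can be written in qname syntax, using the prefixes registered with the PrintUtil object.

Some example rules: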

[allID: (?C rdf:type owl:Restriction), (?C owl:onProperty ?P),
     (?C owl:allValuesFrom ?D)  -> (?C owl:equivalentClass all(?P, ?D)) ]

[all2: (?C rdfs:subClassOf all(?P, ?D)) -> print('Rule for ', ?C)
    [all1b: (?Y rdf:type ?D) <- (?X ?P ?Y), (?X rdf:type ?C) ] ]

[max1: (?A rdf:type max(?P, 1)), (?A ?P ?B), (?A ?P ?C)
      -> (?B owl:sameAs ?C) ]

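Rule allID shows how a functor can collect the components of an OWL restriction into a single data structure, which can then trigger further rules.
Rule all2 is a forward rule that creates a new backward rule and also calls the print primitive.
Rule max1 shows how to use numbers.

A rule file can be loaded and parsed with:

List rules = Rule.rulesFromURL("file:myfile.rules");

or

BufferedReader br = /* open reader */ ;
List rules = Rule.parseRules( Rule.rulesParserFromReader(br) );

or

String ruleSrc = /* list of rules in line */ ;
List rules = Rule.parseRules( ruleSrc );

In the first two cases (reading from a URL or a BufferedReader), the rule file is preprocessed by a simple processor that strips comments and supports a few extra macro commands:

# ...: a comment line.
// ...: a comment line.
@prefix pre: <http://domain/url#>.: defines a prefix pre that can be used in the rules file.
@include <urlToRuleFile>.: includes the rules defined in the given file; this lets a rule file include Jena's predefined RDFS and OWL rules.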
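To make these preprocessing steps concrete, here is an illustrative sketch, not Jena's actual preprocessor: it drops comment lines, records @prefix and @include directives, and keeps everything else as rule text. The class RuleFilePreprocessSketch is hypothetical, and the parsing assumes the directives are well-formed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only, NOT Jena's preprocessor: classifies rule-file lines into
// comments (dropped), @prefix / @include directives (recorded) and rule text.
class RuleFilePreprocessSketch {

    final Map<String, String> prefixes = new LinkedHashMap<>();
    final List<String> includes = new ArrayList<>();
    final List<String> ruleLines = new ArrayList<>();

    void process(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#") || line.startsWith("//")) {
                continue;                                   // blank or comment line
            }
            if (line.startsWith("@prefix")) {               // @prefix pre: <http://domain/url#>.
                int colon = line.indexOf(':');
                String name = line.substring("@prefix".length(), colon).trim();
                String uri = line.substring(line.indexOf('<') + 1, line.indexOf('>'));
                prefixes.put(name, uri);
            } else if (line.startsWith("@include")) {       // @include <urlToRuleFile>.
                includes.add(line.substring(line.indexOf('<') + 1, line.indexOf('>')));
            } else {
                ruleLines.add(line);                        // left for the rule parser
            }
        }
    }
}
```

Fed the complete example that follows (with a leading comment line), it records the prefix pre, one include target RDFS and a single rule line.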

Complete example:

 @prefix pre: <http://jena.hpl.hp.com/prefix#>. 
 @include <RDFS>. 
 [rule1: (?f pre:father ?a) (?u pre:brother ?f) -> (?u pre:uncle ?a)]

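Rule reasoning demo 1: comedians

For example, in a movie knowledge graph, if an actor has acted in a movie whose genre is comedy (喜剧), we can infer that the actor is a comedian.

Inference rule: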

[ruleComedian: (?p :hasActedIn ?m) (?m :hasGenre ?g) (?g :genreName '喜剧') -> (?p rdf:type :Comedian)]
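Before turning to Jena, it can help to see what this rule body asks for. The following plain-Java sketch (no Jena; ComedianJoinSketch is a hypothetical name) evaluates the three body patterns as a nested-loop join over {subject, predicate, object} triples. The genre literal is written as "comedy" instead of '喜剧' to keep the source ASCII. Jena's RETE and tabled engines evaluate this far more efficiently, but the bindings are the same.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java sketch (no Jena): evaluates the body of ruleComedian as a nested-loop
// join over {subject, predicate, object} triples; returns every ?p the head would type as :Comedian.
class ComedianJoinSketch {

    static Set<String> comedians(List<String[]> triples) {
        Set<String> result = new LinkedHashSet<>();       // a set, so repeated derivations collapse
        for (String[] acted : triples) {                  // (?p :hasActedIn ?m)
            if (!acted[1].equals("hasActedIn")) {
                continue;
            }
            for (String[] hasGenre : triples) {           // (?m :hasGenre ?g), ?m bound above
                if (!hasGenre[1].equals("hasGenre") || !hasGenre[0].equals(acted[2])) {
                    continue;
                }
                for (String[] name : triples) {           // (?g :genreName 'comedy'), ?g bound above
                    if (name[1].equals("genreName") && name[0].equals(hasGenre[2])
                            && name[2].equals("comedy")) {
                        result.add(acted[0]);             // head: (?p rdf:type :Comedian)
                    }
                }
            }
        }
        return result;
    }
}
```

With the three triples of the demo below, the only binding for ?p is person.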

Let's implement it in code:

        String prefix = "http://www.test.com/kg/#";
        Graph data = Factory.createGraphMem();

        // define the nodes
        Node movie = NodeFactory.createURI(prefix + "movie");
        Node hasActedIn = NodeFactory.createURI(prefix + "hasActedIn");
        Node hasGenre = NodeFactory.createURI(prefix + "hasGenre");
        Node genreName = NodeFactory.createURI(prefix + "genreName");
        Node genre = NodeFactory.createURI(prefix + "genre");
        Node person = NodeFactory.createURI(prefix + "person");
        Node Comedian = NodeFactory.createURI(prefix + "Comedian");

        // add the triples
        data.add(new Triple(genre, genreName, NodeFactory.createLiteral("喜剧")));
        data.add(new Triple(movie, hasGenre, genre));
        data.add(new Triple(person, hasActedIn, movie));

        // create the reasoner
        GenericRuleReasoner reasoner = (GenericRuleReasoner) GenericRuleReasonerFactory.theInstance().create(null);
        // register the ":" prefix so the rule text can use qnames, then set the rules
        PrintUtil.registerPrefix("", prefix);
        reasoner.setRules(Rule.parseRules(
                "[ruleComedian: (?p :hasActedIn ?m) (?m :hasGenre ?g) (?g :genreName '喜剧') -> (?p rdf:type :Comedian)] \n"
                        + "-> tableAll()."));
        reasoner.setMode(GenericRuleReasoner.HYBRID); // hybrid (forward + backward) reasoning

        InfGraph infgraph = reasoner.bind(data);
        infgraph.setDerivationLogging(true);
        // run the inference and query everything about person
        Iterator<Triple> tripleIterator = infgraph.find(person, null, null);

        while (tripleIterator.hasNext()) {
            System.out.println(PrintUtil.print(tripleIterator.next()));
        }

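Output:

(:person rdf:type :Comedian)
(:person :hasActedIn :movie)

As you can see, person has been inferred to be a Comedian.

Rule reasoning demo 2: related-party transactions

Let's go back to the financial knowledge graph from the previous article.

Prof. Chen Huajun's slides include this reasoning task:

Whoever controls a company must be a shareholder of that company;
If someone is a shareholder of two companies at the same time, those two companies must have related-party transactions;

The slides implement it with Drools (see the slides for details). Here we implement it with Jena, which achieves the same effect.

First, build the graph. For readability we use Chinese variable names:

        Model myMod = ModelFactory.createDefaultModel();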
        String finance = "http://www.example.org/kse/finance#";
        Resource 孙宏斌 = myMod.createResource(finance + "孙宏斌");
        Resource 融创中国 = myMod.createResource(finance + "融创中国");
        Resource 乐视网 = myMod.createResource(finance + "乐视网");
        Property 执掌 = myMod.createProperty(finance + "执掌");
        Resource 贾跃亭 = myMod.createResource(finance + "贾跃亭");
        Resource 地产公司 = myMod.createResource(finance + "地产公司");
        Resource 公司 = myMod.createResource(finance + "公司");
        Resource 法人实体 = myMod.createResource(finance + "法人实体");
        Resource 人 = myMod.createResource(finance + "人");
        Property 主要收入 = myMod.createProperty(finance + "主要收入");
        Resource 地产事业 = myMod.createResource(finance + "地产事业");
        Resource 王健林 = myMod.createResource(finance + "王健林");
        Resource 万达集团 = myMod.createResource(finance + "万达集团");
        Property 主要资产 = myMod.createProperty(finance + "主要资产");


        Property 股东 = myMod.createProperty(finance + "股东");
        Property 关联交易 = myMod.createProperty(finance + "关联交易");
        Property 收购 = myMod.createProperty(finance + "收购");

        // add the triples
        myMod.add(孙宏斌, 执掌, 融创中国);
        myMod.add(贾跃亭, 执掌, 乐视网);
        myMod.add(王健林, 执掌, 万达集团);
        myMod.add(乐视网, RDF.type, 公司);
        myMod.add(万达集团, RDF.type, 公司);
        myMod.add(融创中国, RDF.type, 地产公司);
        myMod.add(地产公司, RDFS.subClassOf, 公司);
        myMod.add(公司, RDFS.subClassOf, 法人实体);
        myMod.add(孙宏斌, RDF.type, 人);
        myMod.add(贾跃亭, RDF.type, 人);
        myMod.add(王健林, RDF.type, 人);
        myMod.add(万达集团,主要资产,地产事业);
        myMod.add(融创中国,主要收入,地产事业);
        myMod.add(孙宏斌, 股东, 乐视网);
        myMod.add(孙宏斌, 收购, 万达集团);

        PrintUtil.registerPrefix("", finance);

        // print the current model
        StmtIterator i = myMod.listStatements(null,null,(RDFNode)null);
        while (i.hasNext()) {
            System.out.println(" - " + PrintUtil.print(i.nextStatement()));
        }

Printed out, the graph contains the following triples:

 - (:公司 rdfs:subClassOf :法人实体)
 - (:万达集团 :主要资产 :地产事业)
 - (:万达集团 rdf:type :公司)
 - (:地产公司 rdfs:subClassOf :公司)
 - (:融创中国 :主要收入 :地产事业)
 - (:融创中国 rdf:type :地产公司)
 - (:孙宏斌 :股东 :乐视网)
 - (:孙宏斌 rdf:type :人)
 - (:孙宏斌 :执掌 :融创中国)
 - (:乐视网 rdf:type :公司)
 - (:贾跃亭 rdf:type :人)
 - (:贾跃亭 :执掌 :乐视网)
 - (:王健林 rdf:type :人)
 - (:王健林 :执掌 :万达集团)

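Now define the inference rules:

Whoever controls a company must be a shareholder of that company;
Whoever acquires a company is a shareholder of that company;
If someone is a shareholder of two companies at the same time, those two companies must have related-party transactions;

Expressed as Jena rules:

[ruleHoldShare: (?p :执掌 ?c) -> (?p :股东 ?c)] 
[ruleHoldShare2: (?p :收购 ?c) -> (?p :股东 ?c)] 
[ruleConnTrans: (?p :股东 ?c) (?p :股东 ?c2) -> (?c :关联交易 ?c2)]

Run the inference: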

        GenericRuleReasoner reasoner = (GenericRuleReasoner) GenericRuleReasonerFactory.theInstance().create(null);
        reasoner.setRules(Rule.parseRules(
                "[ruleHoldShare: (?p :执掌 ?c) -> (?p :股东 ?c)] \n"
                        + "[ruleHoldShare2: (?p :收购 ?c) -> (?p :股东 ?c)] \n"
                        + "[ruleConnTrans: (?p :股东 ?c) (?p :股东 ?c2) -> (?c :关联交易 ?c2)] \n"
                        + "-> tableAll()."));
        reasoner.setMode(GenericRuleReasoner.HYBRID);

        InfGraph infgraph = reasoner.bind(myMod.getGraph());
        infgraph.setDerivationLogging(true);

        System.out.println("After inference...\n");

        Iterator<Triple> tripleIterator = infgraph.find(null, null, null);
        while (tripleIterator.hasNext()) {
            System.out.println(" - " + PrintUtil.print(tripleIterator.next()));
        }

Output:

After inference...

 - (:万达集团 :关联交易 :乐视网)
 - (:万达集团 :关联交易 :融创中国)
 - (:万达集团 :关联交易 :万达集团)
 - (:孙宏斌 :股东 :万达集团)
 - (:孙宏斌 :股东 :融创中国)
 - (:融创中国 :关联交易 :万达集团)
 - (:融创中国 :关联交易 :乐视网)
 - (:融创中国 :关联交易 :融创中国)
 - (:乐视网 :关联交易 :万达集团)
 - (:乐视网 :关联交易 :融创中国)
 - (:乐视网 :关联交易 :乐视网)
 - (:贾跃亭 :股东 :乐视网)
 - (:王健林 :股东 :万达集团)
 - (:公司 rdfs:subClassOf :法人实体)
 - (:万达集团 :主要资产 :地产事业)
 - (:万达集团 rdf:type :公司)
 - (:地产公司 rdfs:subClassOf :公司)
 - (:融创中国 :主要收入 :地产事业)
 - (:融创中国 rdf:type :地产公司)
 - (:孙宏斌 :收购 :万达集团)
 - (:孙宏斌 :股东 :乐视网)
 - (:孙宏斌 rdf:type :人)
 - (:孙宏斌 :执掌 :融创中国)
 - (:乐视网 rdf:type :公司)
 - (:贾跃亭 rdf:type :人)
 - (:贾跃亭 :执掌 :乐视网)
 - (:王健林 rdf:type :人)
 - (:王健林 :执掌 :万达集团)

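We can see that after inference 孙宏斌 (Sun Hongbin) is a shareholder of all three companies, and every pair of the three companies has related-party transactions (including each company with itself, since ruleConnTrans does not require ?c and ?c2 to differ).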
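As a cross-check, the same closure can be computed by a naive forward-chaining loop that fires every rule until nothing new is derived. This is a plain-Java sketch, not how Jena's RETE engine works internally; the names are romanized (controls = 执掌, acquires = 收购, shareholder = 股东, relatedTrade = 关联交易), only the asserted triples the three rules can use are included, and RelatedTradeSketch is a hypothetical name. It yields the same 5 :股东 and 9 :关联交易 triples as the output above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Naive forward chaining (no Jena) for the three demo-2 rules, with romanized names.
// Triples are encoded as "subject|predicate|object".
class RelatedTradeSketch {

    static String t(String s, String p, String o) {
        return s + "|" + p + "|" + o;
    }

    // Only the asserted triples that the three rules can use
    static Set<String> demoFacts() {
        return new HashSet<>(Arrays.asList(
                t("SunHongbin", "controls", "Sunac"),
                t("JiaYueting", "controls", "LeEco"),
                t("WangJianlin", "controls", "Wanda"),
                t("SunHongbin", "shareholder", "LeEco"),
                t("SunHongbin", "acquires", "Wanda")));
    }

    static Set<String> closure(Set<String> facts) {
        Set<String> kb = new HashSet<>(facts);
        boolean changed = true;
        while (changed) {                              // fire all rules until nothing new is derived
            List<String> fresh = new ArrayList<>();
            for (String f : kb) {
                String[] a = f.split("\\|");
                // ruleHoldShare / ruleHoldShare2: controls or acquires -> shareholder
                if (a[1].equals("controls") || a[1].equals("acquires")) {
                    fresh.add(t(a[0], "shareholder", a[2]));
                }
                // ruleConnTrans: (?p shareholder ?c) (?p shareholder ?c2) -> (?c relatedTrade ?c2)
                if (a[1].equals("shareholder")) {
                    for (String g : kb) {
                        String[] b = g.split("\\|");
                        if (b[1].equals("shareholder") && b[0].equals(a[0])) {
                            fresh.add(t(a[2], "relatedTrade", b[2]));
                        }
                    }
                }
            }
            changed = kb.addAll(fresh);                // new facts are added only after the scan
        }
        return kb;
    }

    static long count(Set<String> kb, String predicate) {
        long n = 0;
        for (String f : kb) {
            if (f.split("\\|")[1].equals(predicate)) {
                n++;
            }
        }
        return n;
    }
}
```

Usage: closure(demoFacts()) reaches its fixpoint after three passes; the second pass is where the shareholder triples derived in the first pass produce the remaining relatedTrade pairs.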

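Knowledge Graph Reasoning in Practice (1)

Posted on 2019-09-05 | Updated on 2021-07-12 | Categories: Blog, jqpeng
Word count: 5.5k | Reading time ≈ 5 min

Author: jqpeng
Original link: Knowledge Graph Reasoning in Practice (1)

For work I needed to build graph reasoning features into our system, so I studied the slides of Prof. Chen Huajun's (Zhejiang University) Introduction to Knowledge Graphs course. These are my study notes.

Main approaches to knowledge graph reasoning

• Reasoning based on description logic (e.g. DL-based)
• Reasoning based on graph structure and statistical rule mining (e.g. PRA, AMIE)
• Reasoning based on knowledge graph representation learning (e.g. TransE)
• Probabilistic logic-based methods (e.g. Statistical Relational Learning)

Symbolic logic-based reasoning: ontology reasoning

In traditional symbolic logic reasoning, the technique most relevant to knowledge graphs is ontology reasoning based on description logic.
Description logic is mainly used to model and reason about the ontology of things, i.e. to describe and infer concept taxonomies and the relations between concepts.
Main methods:
- Tableaux-based methods and their improvements: FaCT++, Racer, Pellet, HermiT, etc.
- Methods based on translation to Datalog, such as KAON and RDFox
- Production rule-based algorithms (such as Rete): Jena, Sesame, OWLIM, etc.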

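Reasoning based on graph structure and statistical rule mining

Main methods:
• Path Ranking Algorithm (PRA)
• Association rule mining (AMIE)

Relation reasoning based on knowledge graph representation learning

Represent both entities and relations as vectors.
Predict whether a triple holds by computing with vectors instead of traversing and searching the graph; because the vector representations already encode the entities' semantic information, the computation has some reasoning ability.
Applications include link prediction and path-based multi-hop queries.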
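TransE, listed earlier, makes this concrete: it models a relation as a translation in vector space, so a triple (h, r, t) is plausible when h + r lies close to t. A minimal sketch of the scoring and of link prediction by nearest candidate, using the L2 distance (TransE can also use L1) and toy vectors instead of trained embeddings; TransESketch is a hypothetical name:

```java
// Sketch of TransE scoring: distance ||h + r - t|| (L2); smaller means more plausible.
// The vectors passed in are toy values, not trained embeddings.
class TransESketch {

    static double score(double[] h, double[] r, double[] t) {
        double sum = 0;
        for (int i = 0; i < h.length; i++) {
            double d = h[i] + r[i] - t[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Link prediction for (h, r, ?): index of the candidate tail with the smallest distance
    static int bestTail(double[] h, double[] r, double[][] candidates) {
        int best = -1;
        double bestScore = Double.POSITIVE_INFINITY;
        for (int i = 0; i < candidates.length; i++) {
            double s = score(h, r, candidates[i]);
            if (s < bestScore) {
                bestScore = s;
                best = i;
            }
        }
        return best;
    }
}
```

With h = (0, 0) and r = (1, 0), the candidate (1, 0) has distance 0 and is ranked first among (0, 1), (1, 0) and (2, 2).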

基于概率逻辑的⽅法——Statistical Relational Learning

概率逻辑学习有时也叫Relational Machine Learning (RML)，关注关系的不确定性和复杂性。
通常使用Bayesian networks or Markov networks

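Reasoning based on symbolic logic

Ontology concept reasoning

Knowledge graphs use RDF, short for Resource Description Framework, as the resource description language.

However, RDF is limited in how well it can express hierarchies, which led to RDFS. Building on RDF, RDFS adds the vocabulary Class, subClassOf, type, Property, subPropertyOf, Domain, and Range, which expresses such relationships better.

With RDFS, some simple reasoning is already possible.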
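To make the two most common RDFS rules concrete, here is a minimal forward-chaining sketch in plain Java (no Jena). It applies rdfs11 (subClassOf is transitive) and rdfs9 (an instance of a class is also an instance of its superclasses) until no new triple appears. The string triples and the ASCII names :Sunac, :RealEstateCompany, :Company, and :LegalEntity are illustrative stand-ins for 融创中国, 地产公司, 公司, and 法人实体 used later in this post; a real RDFS reasoner implements many more rules.

```java
import java.util.*;

/** Forward-chaining sketch of two RDFS entailment rules over plain string triples. */
public class RdfsSketch {
    record Triple(String s, String p, String o) {}

    static Set<Triple> closure(Set<Triple> input) {
        Set<Triple> kb = new HashSet<>(input);
        boolean changed = true;
        while (changed) {                                   // repeat until no new triple (fixpoint)
            List<Triple> derived = new ArrayList<>();
            for (Triple t1 : kb) {
                for (Triple t2 : kb) {
                    if (!t2.p().equals("rdfs:subClassOf") || !t1.o().equals(t2.s())) {
                        continue;                           // both rules need (t1.o subClassOf X)
                    }
                    if (t1.p().equals("rdfs:subClassOf")) {
                        // rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
                        derived.add(new Triple(t1.s(), "rdfs:subClassOf", t2.o()));
                    } else if (t1.p().equals("rdf:type")) {
                        // rdfs9: (x type A), (A subClassOf B) => (x type B)
                        derived.add(new Triple(t1.s(), "rdf:type", t2.o()));
                    }
                }
            }
            changed = kb.addAll(derived);
        }
        return kb;
    }
}
```

Running closure on (:Sunac rdf:type :RealEstateCompany), (:RealEstateCompany rdfs:subClassOf :Company), and (:Company rdfs:subClassOf :LegalEntity) adds (:RealEstateCompany rdfs:subClassOf :LegalEntity), (:Sunac rdf:type :Company), and (:Sunac rdf:type :LegalEntity). The nested loop is quadratic per round, which is fine for a sketch; real engines index triples by predicate.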

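OWL further extends RDFS with more complex class constructs and constraints.

For this reason, OWL is also called an ontology language:

- OWL is the most standardized and rigorous knowledge graph language, with the strongest expressive power
- It is based on RDF syntax, which gives the documents it describes a structural basis for semantic understanding
- It promotes the use of shared vocabularies and defines a rich set of semantic terms
- It supports logical reasoning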

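The description logic system behind OWL:

A description logic system consists of four basic components:
- 1) The most basic elements: concepts, relations, and individuals (instances)
- 2) The TBox (terminology): a set of axioms about concept terms, i.e., general knowledge
    - Knowledge that describes concepts and relations, called axioms
- 3) The ABox (assertions): a set of assertions about individuals, i.e., information about specific individuals
    - The ABox contains extensional knowledge (also called assertions) that describes specific individuals in the domain of discourse
- 4) Reasoning mechanisms over the TBox and the ABox

Different description logic systems differ in expressive power and reasoning mechanisms because they make different choices for these four components.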

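Correspondence between description logic and OWL:

Reasoning means deriving new knowledge or conclusions by various methods, where the derived knowledge and conclusions must be consistent with the semantics.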

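OWL ontology reasoning

Satisfiability
- Ontology satisfiability: check whether an ontology is satisfiable, i.e., whether it has a model.
- Concept satisfiability: check whether a concept is satisfiable, i.e., whether there is a model in which the concept's interpretation is not the empty set.

Classification: reasoning over the TBox to compute new subsumption relations between concepts.

Materialization: computing the set of all instances that belong to a given concept or relation.

Example: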

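A typical reasoning algorithm is Tableaux, which suits checking the satisfiability of an ontology concept as well as instance checking. The basic idea is to build an ABox through a series of expansion rules in order to test satisfiability, or to test whether an individual belongs to a concept; the idea is similar to refutation by resolution in first-order logic.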
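The tableau idea can be sketched in a few dozen lines of Java for the basic description logic ALC. This is a minimal illustration under simplifying assumptions, not how FaCT++ or Pellet are implemented: the TBox is empty, concepts are already in negation normal form (NOT appears only in front of atomic concepts), and each recursive call works on the concept set (the ABox assertions) of one individual. A clash (A and NOT A on the same individual) closes a branch, the OR rule branches, and the EXISTS rule creates a successor individual.

```java
import java.util.*;

/** Tableau sketch for ALC concept satisfiability (empty TBox, input in negation normal form). */
public class AlcTableau {
    interface Concept {}
    record Atom(String name) implements Concept {}
    record Not(Atom atom) implements Concept {}              // NNF: negation only on atoms
    record And(Concept left, Concept right) implements Concept {}
    record Or(Concept left, Concept right) implements Concept {}
    record Exists(String role, Concept filler) implements Concept {}
    record Forall(String role, Concept filler) implements Concept {}

    /** label = the concepts asserted for one individual, i.e. its ABox assertions. */
    static boolean satisfiable(Set<Concept> label) {
        Set<Concept> l = new HashSet<>(label);
        // AND rule: (C AND D) in the label => add C and D, until nothing changes
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Concept c : new ArrayList<>(l)) {
                if (c instanceof And a) {
                    changed |= l.add(a.left());
                    changed |= l.add(a.right());
                }
            }
        }
        // Clash: A and NOT A on the same individual closes this branch
        for (Concept c : l) {
            if (c instanceof Not n && l.contains(n.atom())) {
                return false;
            }
        }
        // OR rule: branch on the first disjunction with neither side present yet
        for (Concept c : l) {
            if (c instanceof Or o && !l.contains(o.left()) && !l.contains(o.right())) {
                Set<Concept> withLeft = new HashSet<>(l);
                withLeft.add(o.left());
                if (satisfiable(withLeft)) {
                    return true;
                }
                Set<Concept> withRight = new HashSet<>(l);
                withRight.add(o.right());
                return satisfiable(withRight);
            }
        }
        // EXISTS rule: one R-successor per (EXISTS R.C), also holding every (FORALL R.D) filler
        for (Concept c : l) {
            if (c instanceof Exists e) {
                Set<Concept> successor = new HashSet<>();
                successor.add(e.filler());
                for (Concept d : l) {
                    if (d instanceof Forall f && f.role().equals(e.role())) {
                        successor.add(f.filler());
                    }
                }
                if (!satisfiable(successor)) {
                    return false;
                }
            }
        }
        return true;                                         // complete and clash-free: a model exists
    }
}
```

For example, (EXISTS R.A) AND (FORALL R.(NOT A)) is reported unsatisfiable because the R-successor must be both A and NOT A. With a non-empty TBox, a real tableau reasoner also needs blocking to guarantee termination.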

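Methods based on rewriting into logic programs

Limitations of ontology reasoning:

(1) Only reasoning over predefined ontology axioms is supported (no flexible reasoning over custom vocabularies)
(2) Users cannot define their own reasoning procedures

Hence rule-based reasoning is introduced:

(1) Rules can be tailored to a specific scenario, enabling user-defined reasoning procedures
(2) The Datalog language can combine ontology reasoning and rule reasoning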

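Datalog syntax:

Atom
- p(t1,t2,…,tn)
- p is the predicate, n is its arity, and each ti is a term
- e.g., has_child(X,Y)
Rule
- H :- B1,B2,…,Bm
- has_child(X, Y) :- has_son(X, Y)
Fact
- F(c1,c2,…,cn) :-
- A rule with no body and no variables (constants are written in lowercase, because capitalized terms are variables)
- e.g., has_child(alice,bob) :-

A Datalog program is a set of rules:

has_child(X, Y) :- has_son(X, Y).
has_child(alice, bob) :-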

Datalog inference example:
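Since the example figure did not survive, here is a minimal sketch of naive bottom-up Datalog evaluation in plain Java: apply every rule to the current facts, add what is derived, and repeat until nothing new appears (the least fixpoint). For brevity it supports only binary predicates and uses the usual convention that capitalized terms are variables and lowercase terms are constants. The has_descendant rules and the facts about alice, bob, and carl are hypothetical examples, not taken from the course.

```java
import java.util.*;

/** Naive bottom-up Datalog evaluation (binary predicates only, to keep the sketch short). */
public class DatalogSketch {
    record Atom(String pred, String a, String b) {}   // pred(a, b); capitalized terms are variables
    record Rule(Atom head, List<Atom> body) {}        // head :- body1, ..., bodyN

    static boolean isVar(String term) {
        return Character.isUpperCase(term.charAt(0));
    }

    /** Apply all rules repeatedly until no new fact is derived (the least fixpoint). */
    static Set<Atom> evaluate(List<Rule> rules, Set<Atom> facts) {
        Set<Atom> db = new HashSet<>(facts);
        boolean changed = true;
        while (changed) {
            Set<Atom> derived = new HashSet<>();
            for (Rule rule : rules) {
                for (Map<String, String> binding : match(rule.body(), 0, new HashMap<>(), db)) {
                    derived.add(substitute(rule.head(), binding));
                }
            }
            changed = db.addAll(derived);
        }
        return db;
    }

    /** All bindings that satisfy body atoms i..end against db (a nested-loop join). */
    static List<Map<String, String>> match(List<Atom> body, int i, Map<String, String> binding, Set<Atom> db) {
        if (i == body.size()) {
            return List.of(binding);
        }
        Atom pattern = body.get(i);
        List<Map<String, String>> results = new ArrayList<>();
        for (Atom fact : db) {
            if (!fact.pred().equals(pattern.pred())) {
                continue;
            }
            Map<String, String> extended = new HashMap<>(binding);
            if (unify(pattern.a(), fact.a(), extended) && unify(pattern.b(), fact.b(), extended)) {
                results.addAll(match(body, i + 1, extended, db));
            }
        }
        return results;
    }

    /** A constant must equal the value; a variable gets bound or must agree with its earlier binding. */
    static boolean unify(String term, String value, Map<String, String> binding) {
        if (!isVar(term)) {
            return term.equals(value);
        }
        String previous = binding.putIfAbsent(term, value);
        return previous == null || previous.equals(value);
    }

    static Atom substitute(Atom head, Map<String, String> binding) {
        return new Atom(head.pred(),
                binding.getOrDefault(head.a(), head.a()),
                binding.getOrDefault(head.b(), head.b()));
    }
}
```

With the rules has_child(X, Y) :- has_son(X, Y), has_descendant(X, Y) :- has_child(X, Y), and has_descendant(X, Z) :- has_child(X, Y), has_descendant(Y, Z), plus the facts has_son(alice, bob) and has_son(bob, carl), evaluation first derives has_child for both pairs, then has_descendant(alice, bob) and has_descendant(bob, carl), and in a later round, through the recursive rule, has_descendant(alice, carl).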

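Methods based on production rules

A production system is a forward-reasoning system that executes rules according to a given mechanism to reach certain goals. It is similar to first-order logic but differs from it, and it can be used for automated planning and expert systems.

Components of a production system:

Fact set (Working Memory, WM)
Production/rule set (Production Memory, PM)
Inference engine

Production representation:

IF conditions THEN actions

conditions is a set of conditions, also called the LHS (Left Hand Side)
actions is a sequence of actions, also called the RHS (Right Hand Side)

The LHS is a set of conditions joined by AND; when all of them are satisfied, the rule is triggered.
Each condition has the form (type attr1: spec1 attr2: spec2), where a spec can be:

an atom: (person name: alice)
a variable: (person name: x)
an expression: (person age: [n+4])
a Boolean test: (person age: {>10})
a conjunction, disjunction, or negation of constraints

The RHS is a sequence of actions that are executed in order. The action types are ADD pattern, Remove i, and Modify i, which can be read as create, delete, and update operations on working memory elements (WMEs).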

Production example:

IF (Student name: x)
THEN ADD (Person name: x)

This can also be written as:

(Student name: x) ⇒ ADD (Person name: x)

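Inference engine

➤ Controls the execution of the system:

Pattern matching: match the condition part of each rule against the facts in the fact set; a rule whose entire LHS is satisfied is triggered and added to the agenda
Conflict resolution: choose one of the triggered rules according to some strategy
Action execution: execute the RHS of the chosen rule, which modifies the WM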
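The match / resolve / act cycle can be sketched in plain Java as follows. It is a deliberately tiny model, not Rete or any real engine: each hypothetical rule has a single condition of the form (Type name: x) and a single ADD action, conflicts are resolved by rule priority, and refraction (never firing the same rule on the same fact twice) guarantees termination.

```java
import java.util.*;

/** Minimal forward-chaining production system: match, resolve conflicts, act. */
public class ProductionSketch {
    record Fact(String type, String name) {}                     // e.g. (Student name: alice)
    // Hypothetical rule shape: IF (condType name: x) THEN ADD (addType name: x)
    record Rule(String id, int priority, String condType, String addType) {}
    record Activation(Rule rule, Fact fact) {}                    // one rule instantiation on the agenda

    static Set<Fact> run(List<Rule> rules, Set<Fact> initialFacts) {
        Set<Fact> workingMemory = new LinkedHashSet<>(initialFacts);
        Set<Activation> fired = new HashSet<>();                  // refraction: fire each instantiation once
        while (true) {
            // 1. Pattern matching: every rule whose LHS matches a fact goes onto the agenda
            List<Activation> agenda = new ArrayList<>();
            for (Rule rule : rules) {
                for (Fact fact : workingMemory) {
                    Activation activation = new Activation(rule, fact);
                    if (fact.type().equals(rule.condType()) && !fired.contains(activation)) {
                        agenda.add(activation);
                    }
                }
            }
            if (agenda.isEmpty()) {
                return workingMemory;                             // nothing left to fire: stop
            }
            // 2. Conflict resolution: pick the activation whose rule has the highest priority
            Activation chosen = Collections.max(agenda,
                    Comparator.comparingInt((Activation a) -> a.rule().priority()));
            // 3. Act: execute the RHS, here a single ADD into working memory
            fired.add(chosen);
            workingMemory.add(new Fact(chosen.rule().addType(), chosen.fact().name()));
        }
    }
}
```

With the rule (Student name: x) ⇒ ADD (Person name: x) and the fact (Student name: alice), one cycle adds (Person name: alice) and the next cycle finds an empty agenda. Rete speeds up step 1 by not rescanning all facts on every cycle.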

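Execution flow of a production system

Pattern matching: the RETE algorithm

Organizes the LHS of all productions into a discrimination network
Trades space for time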

Inductive reasoning: graph-based methods

PRA

➤ Uses the paths connecting two entities as features to predict which relation may hold between them

• A generic relational learning framework

Path Ranking Algorithm (PRA)
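PRA's core feature is the probability that a random walk starting at the source entity and following a fixed sequence of relations (a relation path) ends at the target entity; a per-relation classifier (logistic regression in the original PRA work) is then trained on these path features. A minimal sketch that computes one such feature, assuming a hypothetical edge list and a uniform choice among matching out-edges at each step:

```java
import java.util.*;

/** Sketch of the path-constrained random-walk feature used by PRA. */
public class PraSketch {
    record Edge(String head, String relation, String tail) {}

    /**
     * Distribution over end entities of a walk that starts at source and follows the relations
     * in path, choosing uniformly among the matching out-edges at each step.
     */
    static Map<String, Double> walk(List<Edge> graph, String source, List<String> path) {
        Map<String, Double> dist = new HashMap<>();
        dist.put(source, 1.0);
        for (String relation : path) {
            Map<String, Double> next = new HashMap<>();
            for (Map.Entry<String, Double> current : dist.entrySet()) {
                List<String> targets = new ArrayList<>();
                for (Edge edge : graph) {
                    if (edge.head().equals(current.getKey()) && edge.relation().equals(relation)) {
                        targets.add(edge.tail());
                    }
                }
                for (String t : targets) {
                    // split this entity's probability mass evenly over its matching edges
                    next.merge(t, current.getValue() / targets.size(), Double::sum);
                }
            }
            dist = next;                                    // walks with no matching edge die out
        }
        return dist;
    }
}
```

The feature value for a candidate pair (s, t) under path P is walk(graph, s, P).getOrDefault(t, 0.0). For example, with edges A-r->B, A-r->C, B-s->D, C-s->D, and C-s->E, the path [r, s] from A reaches D with probability 0.5 + 0.25 = 0.75 and E with 0.25.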

TransE

Knowledge graph embedding model: TransE

TransE (Translating Embeddings for Modeling Multi-relational Data, NIPS 2013)

Objective function:

Loss function:
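The objective and loss formulas were images in the original slides and are missing here. For reference, this is the formulation from the TransE paper (Bordes et al., NIPS 2013): a triple (h, r, t) is plausible when the tail embedding lies close to the head embedding translated by the relation embedding, and training minimizes a margin-based ranking loss against corrupted triples S', built by replacing either the head or the tail of a training triple with a random entity from the entity set E. γ > 0 is the margin and [x]_+ = max(0, x); the paper also constrains entity embeddings to unit L2 norm.

```latex
d(h, r, t) = \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert_{p}, \qquad p \in \{1, 2\}

\mathcal{L} = \sum_{(h, r, t) \in S} \; \sum_{(h', r, t') \in S'_{(h, r, t)}}
  \bigl[\, \gamma + d(h, r, t) - d(h', r, t') \,\bigr]_{+}

S'_{(h, r, t)} = \{ (h', r, t) \mid h' \in E \} \cup \{ (h, r, t') \mid t' \in E \}
```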

Knowledge graph embedding models: prediction tasks

Triple testing: (h, r, t)
Tail entity prediction: (h, r, ?)
Head entity prediction: (?, r, t)
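Once embeddings are trained, tail prediction (h, r, ?) ranks every candidate entity t by d(h, r, t); head prediction works the same way over candidate heads. A minimal sketch using the L1 distance, assuming embeddings are already available as plain double arrays (training is not shown, and any entity names or vectors are hypothetical):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Sketch of TransE-style link prediction with already-trained embeddings. */
public class TransESketch {
    /** d(h, r, t) = ||h + r - t||_1; a smaller value means a more plausible triple. */
    static double distance(double[] h, double[] r, double[] t) {
        double sum = 0.0;
        for (int i = 0; i < h.length; i++) {
            sum += Math.abs(h[i] + r[i] - t[i]);
        }
        return sum;
    }

    /** Answer (h, r, ?) by ranking every entity as a candidate tail, most plausible first. */
    static List<String> predictTail(Map<String, double[]> entities, double[] h, double[] r) {
        List<String> candidates = new ArrayList<>(entities.keySet());
        candidates.sort(Comparator.comparingDouble((String e) -> distance(h, r, entities.get(e))));
        return candidates;
    }
}
```

Evaluation usually reports the rank of the true tail (mean rank, Hits@10), often after filtering out other triples already known to hold.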

PRA vs. TransE

Deductive reasoning with Jena

Building the model

Enough talk, here is the code:

// Jena 3.x package names
import org.apache.jena.rdf.model.*;

Model myMod = ModelFactory.createDefaultModel();
String finance = "http://www.example.org/kse/finance#";

// Entities
Resource shb = myMod.createResource(finance + "孙宏斌");
Resource rczg = myMod.createResource(finance + "融创中国");

// Relation
Property control = myMod.createProperty(finance + "执掌");

// Add the triple (孙宏斌, 执掌, 融创中国)
myMod.add(shb, control, rczg);

The graph shown in the figure above contains the following triples:

finance:孙宏斌 finance:control finance:融创中国
finance:贾跃亭 finance:control finance:乐视网
finance:融创中国 rdf:type finance:地产公司
finance:地产公司 rdfs:subClassOf finance:公司
finance:公司 rdfs:subClassOf finance:法人实体
finance:孙宏斌 rdf:type finance:公司
finance:孙宏斌 rdf:type finance:人
finance:人 owl:disjointWith finance:公司

They can be added one by one in the same way (code omitted).

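Adding a reasoner

Jena performs inference through an InfModel, which is built on top of a Model; createRDFSModel effectively layers an RDFS reasoner over the original Model.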

InfModel inf_rdfs = ModelFactory.createRDFSModel(myMod);

• Subclass (hypernym/hyponym) reasoning

listStatements returns the triples matching a pattern, so checking whether any match exists gives a yes/no answer. subClassOf belongs to the RDFS vocabulary, hence RDFS.subClassOf.

public static void subClassOf(Model m, Resource s, Resource o) {
    // Any (s, rdfs:subClassOf, o) statement, asserted or inferred, means "yes"
    StmtIterator i = m.listStatements(s, RDFS.subClassOf, o);
    if (i.hasNext()) {
        System.out.println(" yes! ");
    }
    i.close();
}

subClassOf(inf_rdfs, myMod.getResource(finance + "地产公司"), myMod.getResource(finance + "法人实体"));

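• Class reasoning: the OWL reasoner can infer the complete set of classes an individual belongs to, filling in all of its types, so a query can print every class directly.

First build the OWL reasoner:

Reasoner reasoner = ReasonerRegistry.getOWLReasoner();
InfModel inf_owl = ModelFactory.createInfModel(reasoner, myMod);

Then run the class inference: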

public static void printStatements(Model m, Resource s, Property p, Resource o) {
    // null acts as a wildcard in listStatements
    for (StmtIterator i = m.listStatements(s, p, o); i.hasNext(); ) {
        Statement stmt = i.nextStatement();
        System.out.println(" - " + PrintUtil.print(stmt));
    }
}
printStatements(inf_owl, rczg, RDF.type, null);

• Inconsistency detection: another common use of Jena's reasoners is checking data for inconsistencies.

// To check a model stored in a file instead: Model data = FileManager.get().loadModel(fname);
Reasoner reasoner = ReasonerRegistry.getOWLReasoner();
InfModel inf_owl = ModelFactory.createInfModel(reasoner, myMod);
ValidityReport validity = inf_owl.validate();
if (validity.isValid()) {
    System.out.println("No inconsistencies");
} else {
    System.out.println("Inconsistencies found:");
    for (Iterator<ValidityReport.Report> i = validity.getReports(); i.hasNext(); ) {
        System.out.println(" - " + i.next());
    }
}
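What validate() should catch in this example can be sketched without Jena: close each individual's types upward along subClassOf, then report any individual that belongs to both classes of an owl:disjointWith pair. Here finance:孙宏斌 is typed as both 人 and 公司, which are declared disjoint. This plain-Java sketch uses ASCII stand-ins (:SunHongbin, :Person, :Company, and so on) and covers only this one kind of inconsistency; Jena's OWL reasoner checks more.

```java
import java.util.*;

/** Sketch of an owl:disjointWith consistency check over explicit type and subclass maps. */
public class DisjointCheck {
    /**
     * types: individual -> asserted classes; superClasses: class -> direct superclasses;
     * disjoint: pairs of disjoint classes. Returns one message per detected conflict.
     */
    static List<String> check(Map<String, Set<String>> types,
                              Map<String, Set<String>> superClasses,
                              List<List<String>> disjoint) {
        List<String> conflicts = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : types.entrySet()) {
            // Close the individual's types upward along subClassOf (the rdfs9 rule)
            Set<String> all = new HashSet<>();
            Deque<String> todo = new ArrayDeque<>(entry.getValue());
            while (!todo.isEmpty()) {
                String cls = todo.pop();
                if (all.add(cls)) {
                    todo.addAll(superClasses.getOrDefault(cls, Set.of()));
                }
            }
            // An individual in both classes of a disjoint pair makes the data inconsistent
            for (List<String> pair : disjoint) {
                if (all.contains(pair.get(0)) && all.contains(pair.get(1))) {
                    conflicts.add(entry.getKey() + " is both " + pair.get(0) + " and " + pair.get(1));
                }
            }
        }
        return conflicts;
    }
}
```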

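Docker swarm: getting the container information of a service

Posted on 2019-09-03 | Updated on 2021-07-12 | Categories: Blog, jqpeng
Word count: 22k | Reading time ≈ 20 min

Author: jqpeng
Original post: Docker swarm: getting the container information of a service

A service can be created with docker service create, for example: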

docker service create --name mysql mysql:latest

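Once the service is created, how do we get information about its containers, for example the container of the mysql service we just created? The docker service ps command is the starting point.

Command line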

~# docker service ps mysql
ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE        ERROR               PORTS
lvskmv1lkhz6        mysql.1             mysql:latest        docker86-9          Running             Running 3 days ago

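Unfortunately, the output contains no container ID. The ID column (lvskmv1lkhz6) is the ID of the task, not of the service or the container, and docker inspect on that task ID returns the task details: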

~# docker inspect lvskmv1lkhz6
[
    {
        "ID": "lvskmv1lkhz6bvynfuxa0jqgn",
        "Version": {
            "Index": 21
        },
        "CreatedAt": "2019-08-30T08:04:18.382831966Z",
        "UpdatedAt": "2019-08-30T08:09:43.613636037Z",
        "Labels": {},
        "Spec": {
            "ContainerSpec": {
                "Image": "mysql:latest@sha256:01cf53f2538aa805bda591d83f107c394adca8d31f98eacd3654e282dada3193",
                "Env": [
                    "MYSQL_ROOT_PASSWORD=aimind@mysql2019\""
                ],
                "Isolation": "default"
            },
            "Resources": {
                "Limits": {},
                "Reservations": {}
            },
            "RestartPolicy": {
                "Condition": "any",
                "Delay": 5000000000,
                "MaxAttempts": 0,
                "Window": 0
            },
            "Placement": {},
            "ForceUpdate": 0
        },
        "ServiceID": "uporil7xf4rwffa0rhg1j5htw",
        "Slot": 1,
        "NodeID": "sixp62dhqe702b69pm6v8m9rh",
        "Status": {
            "Timestamp": "2019-08-30T08:09:43.554514932Z",
            "State": "running",
            "Message": "started",
            "ContainerStatus": {
                "ContainerID": "2cf128f77797f08419f50a057973388f15753efb16134ed05370ded495d0ac08",
                "PID": 14884,
                "ExitCode": 0
            },
            "PortStatus": {}
        },
        "DesiredState": "running",
        "NetworksAttachments": [
            {
                "Network": {
                    "ID": "emypqxzjggws7uicersyz6uag",
                    "Version": {
                        "Index": 12
                    },
                    "CreatedAt": "2019-08-30T08:02:57.254494392Z",
                    "UpdatedAt": "2019-08-30T08:02:57.271216394Z",
                    "Spec": {
                        "Name": "aimind-overlay",
                        "Labels": {},
                        "DriverConfiguration": {
                            "Name": "overlay"
                        },
                        "IPAMOptions": {
                            "Driver": {
                                "Name": "default"
                            }
                        },
                        "Scope": "swarm"
                    },
                    "DriverState": {
                        "Name": "overlay",
                        "Options": {
                            "com.docker.network.driver.overlay.vxlanid_list": "4097"
                        }
                    },
                    "IPAMOptions": {
                        "Driver": {
                            "Name": "default"
                        },
                        "Configs": [
                            {
                                "Subnet": "10.0.0.0/24",
                                "Gateway": "10.0.0.1"
                            }
                        ]
                    }
                },
                "Addresses": [
                    "10.0.0.4/24"
                ]
            }
        ]
    }
]

In the returned JSON, NodeID is the ID of the node running the task, Status.ContainerStatus holds the container's status, and .Status.ContainerStatus.ContainerID is the container ID, here 2cf128f77797f08419f50a057973388f15753efb16134ed05370ded495d0ac08.
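The same lookup can be scripted. A minimal sketch that calls the docker CLI from Java, so no extra library is needed: list the task IDs of a service with docker service ps -q, then read .Status.ContainerStatus.ContainerID from each task with a Go template. The class and method names are hypothetical. Caveats: docker service ps must run against a swarm manager, the filter keeps only tasks whose desired state is running, and container-level commands such as docker inspect <containerID> or docker stats only work against the Docker daemon on the node that runs the container (the NodeID above).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Sketch: find the containers behind a swarm service by calling the docker CLI. */
public class ServiceContainers {
    /** Task IDs only (-q), full length (--no-trunc), limited to tasks meant to be running. */
    static List<String> taskIdsCommand(String service) {
        return List.of("docker", "service", "ps", "-q", "--no-trunc",
                "--filter", "desired-state=running", service);
    }

    /** Reads the same field as in the JSON above: .Status.ContainerStatus.ContainerID */
    static List<String> containerIdCommand(String taskId) {
        return List.of("docker", "inspect", "--format",
                "{{.Status.ContainerStatus.ContainerID}}", taskId);
    }

    /** Runs a command and returns its non-blank output lines; fails on a non-zero exit code. */
    static List<String> run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).redirectErrorStream(true).start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            for (String line; (line = reader.readLine()) != null; ) {
                if (!line.isBlank()) {
                    lines.add(line.trim());
                }
            }
        }
        if (process.waitFor() != 0) {
            throw new IOException("command failed: " + command + " " + lines);
        }
        return lines;
    }

    static List<String> containerIds(String service) throws IOException, InterruptedException {
        List<String> ids = new ArrayList<>();
        for (String taskId : run(taskIdsCommand(service))) {
            ids.addAll(run(containerIdCommand(taskId)));
        }
        return ids;
    }
}
```

On a manager node, ServiceContainers.containerIds("mysql") would return the full container ID shown above.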

With the container ID we can get the container details, as well as its resource statistics:

docker inspect 2cf128f77797f08419f50a057973388f15753efb16134ed05370ded495d0ac08
[
    {
        "Id": "2cf128f77797f08419f50a057973388f15753efb16134ed05370ded495d0ac08",
        "Created": "2019-08-30T08:09:41.827551223Z",
        "Path": "docker-entrypoint.sh",
        "Args": [
            "mysqld"
        ],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 14884,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2019-08-30T08:09:43.402630785Z",
            "FinishedAt": "0001-01-01T00:00:00Z"
        },
        "Image": "sha256:62a9f311b99c24c0fde0a772abc6030bc48e5acc7d7416b8eeb72d3da1b4eb6c",
        "ResolvConfPath": "/data/docker/containers/2cf128f77797f08419f50a057973388f15753efb16134ed05370ded495d0ac08/resolv.conf",
        "HostnamePath": "/data/docker/containers/2cf128f77797f08419f50a057973388f15753efb16134ed05370ded495d0ac08/hostname",
        "HostsPath": "/data/docker/containers/2cf128f77797f08419f50a057973388f15753efb16134ed05370ded495d0ac08/hosts",
        "LogPath": "/data/docker/containers/2cf128f77797f08419f50a057973388f15753efb16134ed05370ded495d0ac08/2cf128f77797f08419f50a057973388f15753efb16134ed05370ded495d0ac08-json.log",
        "Name": "/mysql.1.lvskmv1lkhz6bvynfuxa0jqgn",
        "RestartCount": 0,
        "Driver": "overlay2",
        "Platform": "linux",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "docker-default",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": null,
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {
                    "max-file": "3",
                    "max-size": "10m"
                }
            },
            "NetworkMode": "default",
            "PortBindings": {},
            "RestartPolicy": {
                "Name": "",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "CapAdd": null,
            "CapDrop": null,
            "Dns": null,
            "DnsOptions": null,
            "DnsSearch": null,
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "shareable",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": null,
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "default",
            "CpuShares": 0,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": null,
            "DeviceCgroupRules": null,
            "DiskQuota": 0,
            "KernelMemory": 0,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": null,
            "OomKillDisable": false,
            "PidsLimit": 0,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0,
            "MaskedPaths": [
                "/proc/acpi",
                "/proc/kcore",
                "/proc/keys",
                "/proc/latency_stats",
                "/proc/timer_list",
                "/proc/timer_stats",
                "/proc/sched_debug",
                "/proc/scsi",
                "/sys/firmware"
            ],
            "ReadonlyPaths": [
                "/proc/asound",
                "/proc/bus",
                "/proc/fs",
                "/proc/irq",
                "/proc/sys",
                "/proc/sysrq-trigger"
            ]
        },
        "GraphDriver": {
            "Data": {
                "LowerDir": "/data/docker/overlay2/f0184a2c979eef7a135726a49f5651e16b568ecfd47606e20e504e28ea311f25-init/diff:/data/docker/overlay2/644c4c905af78d3320559b9f388631151dcf5c19ab8f2c91999d4d59c8409784/diff:/data/docker/overlay2/7ed834798bd5eeef1b75d012a27bb01cd8a0a5e71048db72a8743980481bb74b/diff:/data/docker/overlay2/56e3eac1c86a9ae29b3251025824f93b78e43151a36eb973407feb1075d8db1c/diff:/data/docker/overlay2/40161cfa334a118eaa09c04dc7d864d00e3544f77e6979584298478f68566bc5/diff:/data/docker/overlay2/e884a3df3e827368a468a4afc8850de4fa6336a78ca9a922406237e3ab75a97e/diff:/data/docker/overlay2/a04e8776674f902eaa0e15467ad0678f03baf2a1b8a568b034ad4b4c1ddb1a23/diff:/data/docker/overlay2/7745739e901232d6b702b599844157583d02a34fa4aca10c888e0e9c44075433/diff:/data/docker/overlay2/f423b8f55475ec902cea1ea5c54897ed6a24da3cc0acd64a79e022e887d83e77/diff:/data/docker/overlay2/231e63e7fbb5084facc93c89ed23d366d915f9a2edd4f85735df5d45bc87cafa/diff:/data/docker/overlay2/c11047327e6f47e49d1abee4df8acbaba51ac6b92e59801ac613331c5bad3bc1/diff:/data/docker/overlay2/f893602043c1b5ad9d2839ec0ab8f17da7e0eaf073788f6c3d35138dfe6c06b8/diff:/data/docker/overlay2/3443517fc9e882df67d9730a9aa7530dc3c541b6872aaf05290c5e7ec588e0fb/diff",
                "MergedDir": "/data/docker/overlay2/f0184a2c979eef7a135726a49f5651e16b568ecfd47606e20e504e28ea311f25/merged",
                "UpperDir": "/data/docker/overlay2/f0184a2c979eef7a135726a49f5651e16b568ecfd47606e20e504e28ea311f25/diff",
                "WorkDir": "/data/docker/overlay2/f0184a2c979eef7a135726a49f5651e16b568ecfd47606e20e504e28ea311f25/work"
            },
            "Name": "overlay2"
        },
        "Mounts": [
            {
                "Type": "volume",
                "Name": "c2128d05001b8fec1712807f381e2c72d42ce8a83ae97f6b038f51c0d48446f1",
                "Source": "/data/docker/volumes/c2128d05001b8fec1712807f381e2c72d42ce8a83ae97f6b038f51c0d48446f1/_data",
                "Destination": "/var/lib/mysql",
                "Driver": "local",
                "Mode": "",
                "RW": true,
                "Propagation": ""
            }
        ],
        "Config": {
            "Hostname": "2cf128f77797",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "ExposedPorts": {
                "3306/tcp": {},
                "33060/tcp": {}
            },
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "MYSQL_ROOT_PASSWORD=aimind@mysql2019\"",
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "GOSU_VERSION=1.7",
                "MYSQL_MAJOR=8.0",
                "MYSQL_VERSION=8.0.17-1debian9"
            ],
            "Cmd": [
                "mysqld"
            ],
            "ArgsEscaped": true,
            "Image": "mysql:latest@sha256:01cf53f2538aa805bda591d83f107c394adca8d31f98eacd3654e282dada3193",
            "Volumes": {
                "/var/lib/mysql": {}
            },
            "WorkingDir": "",
            "Entrypoint": [
                "docker-entrypoint.sh"
            ],
            "OnBuild": null,
            "Labels": {
                "com.docker.swarm.node.id": "sixp62dhqe702b69pm6v8m9rh",
                "com.docker.swarm.service.id": "uporil7xf4rwffa0rhg1j5htw",
                "com.docker.swarm.service.name": "mysql",
                "com.docker.swarm.task": "",
                "com.docker.swarm.task.id": "lvskmv1lkhz6bvynfuxa0jqgn",
                "com.docker.swarm.task.name": "mysql.1.lvskmv1lkhz6bvynfuxa0jqgn"
            }
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "459ab4b83580513da251182d08dc217d0079613d10952df00ffcca6e2537958b",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {
                "3306/tcp": null,
                "33060/tcp": null
            },
            "SandboxKey": "/var/run/docker/netns/459ab4b83580",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {
                "aimind-overlay": {
                    "IPAMConfig": {
                        "IPv4Address": "10.0.0.4"
                    },
                    "Links": null,
                    "Aliases": [
                        "2cf128f77797"
                    ],
                    "NetworkID": "emypqxzjggws7uicersyz6uag",
                    "EndpointID": "56a78b2527a6dcf83fd3dc2794c514aaa325457d9c8a21bd236d3ea3c22c8fa9",
                    "Gateway": "",
                    "IPAddress": "10.0.0.4",
                    "IPPrefixLen": 24,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:0a:00:00:04",
                    "DriverOpts": null
                }
            }
        }
    }
]

Then docker stats shows its resource usage:

~#docker stats 2cf128f77797f08419f50a057973388f15753efb16134ed05370ded495d0ac08 --all --no-stream 
CONTAINER ID        NAME                                CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
2cf128f77797        mysql.1.lvskmv1lkhz6bvynfuxa0jqgn   0.33%               374.4MiB / 188.8GiB   0.19%               230kB / 0B          8.19kB / 1.26GB     38

Programmatic access

Besides the command line, the same data is available through the Docker API; see docker-java, a Java client for the Docker API.

Getting the container ID

System.out.println(client.listTasksCmd().withNameFilter("mysql").exec());

Result:

[class Task {
    ID: lvskmv1lkhz6bvynfuxa0jqgn
    version: 21
    createdAt: 2019-08-30T08:04:18.382831966Z
    updatedAt: 2019-08-30T08:09:43.613636037Z
    name: null
    labels: {}
    spec: TaskSpec[containerSpec=ContainerSpec[image=mysql:latest@sha256:01cf53f2538aa805bda591d83f107c394adca8d31f98eacd3654e282dada3193,labels=<null>,command=<null>,args=<null>,env=[MYSQL_ROOT_PASSWORD=aimind@mysql2019"],dir=<null>,user=<null>,groups=<null>,tty=<null>,mounts=<null>,duration=<null>,stopGracePeriod=<null>,dnsConfig=<null>,openStdin=<null>,readOnly=<null>,hosts=<null>,hostname=<null>,secrets=<null>,healthCheck=<null>,stopSignal=<null>,privileges=<null>,configs=<null>],resources=ResourceRequirements[limits=ResourceSpecs[memoryBytes=<null>,nanoCPUs=<null>],reservations=ResourceSpecs[memoryBytes=<null>,nanoCPUs=<null>]],restartPolicy=ServiceRestartPolicy[condition=ANY,delay=5000000000,maxAttempts=0,window=0],placement=ServicePlacement[constraints=<null>,platforms=<null>],logDriver=<null>,forceUpdate=0,networks=<null>,runtime=<null>]
    serviceId: uporil7xf4rwffa0rhg1j5htw
    slot: 1
    nodeId: sixp62dhqe702b69pm6v8m9rh
    assignedGenericResources: null
    status: TaskStatus[timestamp=2019-08-30T08:09:43.554514932Z,state=running,message=started,err=<null>,containerStatus=TaskStatusContainerStatus[containerID=2cf128f77797f08419f50a057973388f15753efb16134ed05370ded495d0ac08,pid=14884,exitCode=0]]
    desiredState: running
}]

The containerID 2cf128f77797f08419f50a057973388f15753efb16134ed05370ded495d0ac08 matches the command-line output.

Then get the container details:

 System.out.println(client.inspectContainerCmd("2cf128f77797f08419f50a057973388f15753efb16134ed05370ded495d0ac08").exec());

Get the container statistics:

        System.out.println(client.statsCmd("2cf128f77797f08419f50a057973388f15753efb16134ed05370ded495d0ac08").exec(new InvocationBuilder.AsyncResultCallback<>()).awaitResult());

The corresponding output:

InspectContainerResponse[args={mysqld},config=com.github.dockerjava.api.model.ContainerConfig@3e15bb06[attachStderr=false,attachStdin=false,attachStdout=false,cmd={mysqld},domainName=,entrypoint={docker-entrypoint.sh},env={MYSQL_ROOT_PASSWORD=aimind@mysql2019",PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin,GOSU_VERSION=1.7,MYSQL_MAJOR=8.0,MYSQL_VERSION=8.0.17-1debian9},exposedPorts=com.github.dockerjava.api.model.ExposedPorts@6778aea6,hostName=2cf128f77797,image=mysql:latest@sha256:01cf53f2538aa805bda591d83f107c394adca8d31f98eacd3654e282dada3193,labels={com.docker.swarm.node.id=sixp62dhqe702b69pm6v8m9rh, com.docker.swarm.service.id=uporil7xf4rwffa0rhg1j5htw, com.docker.swarm.service.name=mysql, com.docker.swarm.task=, com.docker.swarm.task.id=lvskmv1lkhz6bvynfuxa0jqgn, com.docker.swarm.task.name=mysql.1.lvskmv1lkhz6bvynfuxa0jqgn},macAddress=<null>,networkDisabled=<null>,onBuild=<null>,stdinOpen=false,portSpecs=<null>,stdInOnce=false,tty=false,user=,volumes={/var/lib/mysql={}},workingDir=,healthCheck=<null>],created=2019-08-30T08:09:41.827551223Z,driver=overlay2,execDriver=<null>,hostConfig=com.github.dockerjava.api.model.HostConfig@5853495b[binds=<null>,blkioWeight=0,blkioWeightDevice=<null>,blkioDeviceReadBps=<null>,blkioDeviceWriteBps=<null>,blkioDeviceReadIOps=<null>,blkioDeviceWriteIOps=<null>,memorySwappiness=<null>,nanoCPUs=<null>,capAdd=<null>,capDrop=<null>,containerIDFile=,cpuPeriod=0,cpuRealtimePeriod=0,cpuRealtimeRuntime=0,cpuShares=0,cpuQuota=0,cpusetCpus=,cpusetMems=,devices=<null>,deviceCgroupRules=<null>,diskQuota=0,dns=<null>,dnsOptions=<null>,dnsSearch=<null>,extraHosts=<null>,groupAdd=<null>,ipcMode=shareable,cgroup=,links=<null>,logConfig=com.github.dockerjava.api.model.LogConfig@524a2ffb,lxcConf=<null>,memory=0,memorySwap=0,memoryReservation=0,kernelMemory=0,networkMode=default,oomKillDisable=false,init=<null>,autoRemove=false,oomScoreAdj=0,portBindings={},privileged=false,publishAllPorts=false,readonlyRootfs=false,restartPolic
y=no,ulimits=<null>,cpuCount=0,cpuPercent=0,ioMaximumIOps=0,ioMaximumBandwidth=0,volumesFrom=<null>,mounts=<null>,pidMode=,isolation=default,securityOpts=<null>,storageOpt=<null>,cgroupParent=,volumeDriver=,shmSize=67108864,pidsLimit=0,runtime=runc,tmpFs=<null>,utSMode=,usernsMode=,sysctls=<null>,consoleSize=[0, 0]],hostnamePath=/data/docker/containers/2cf128f77797f08419f50a057973388f15753efb16134ed05370ded495d0ac08/hostname,hostsPath=/data/docker/containers/2cf128f77797f08419f50a057973388f15753efb16134ed05370ded495d0ac08/hosts,logPath=/data/docker/containers/2cf128f77797f08419f50a057973388f15753efb16134ed05370ded495d0ac08/2cf128f77797f08419f50a057973388f15753efb16134ed05370ded495d0ac08-json.log,id=2cf128f77797f08419f50a057973388f15753efb16134ed05370ded495d0ac08,sizeRootFs=<null>,imageId=sha256:62a9f311b99c24c0fde0a772abc6030bc48e5acc7d7416b8eeb72d3da1b4eb6c,mountLabel=,name=/mysql.1.lvskmv1lkhz6bvynfuxa0jqgn,restartCount=0,networkSettings=com.github.dockerjava.api.model.NetworkSettings@7173ae5b[bridge=,sandboxId=459ab4b83580513da251182d08dc217d0079613d10952df00ffcca6e2537958b,hairpinMode=false,linkLocalIPv6Address=,linkLocalIPv6PrefixLen=0,ports={3306/tcp=null, 
33060/tcp=null},sandboxKey=/var/run/docker/netns/459ab4b83580,secondaryIPAddresses=<null>,secondaryIPv6Addresses=<null>,endpointID=,gateway=,portMapping=<null>,globalIPv6Address=,globalIPv6PrefixLen=0,ipAddress=,ipPrefixLen=0,ipV6Gateway=,macAddress=,networks={aimind-overlay=com.github.dockerjava.api.model.ContainerNetwork@53a9fcfd[ipamConfig=com.github.dockerjava.api.model.ContainerNetwork$Ipam@21f459fc,links=<null>,aliases=[2cf128f77797],networkID=emypqxzjggws7uicersyz6uag,endpointId=56a78b2527a6dcf83fd3dc2794c514aaa325457d9c8a21bd236d3ea3c22c8fa9,gateway=,ipAddress=10.0.0.4,ipPrefixLen=24,ipV6Gateway=,globalIPv6Address=,globalIPv6PrefixLen=0,macAddress=02:42:0a:00:00:04]}],path=docker-entrypoint.sh,processLabel=,resolvConfPath=/data/docker/containers/2cf128f77797f08419f50a057973388f15753efb16134ed05370ded495d0ac08/resolv.conf,execIds=<null>,state=com.github.dockerjava.api.command.InspectContainerResponse$ContainerState@4d192aef[status=running,running=true,paused=false,restarting=false,oomKilled=false,dead=false,pid=14884,exitCode=0,error=,startedAt=2019-08-30T08:09:43.402630785Z,finishedAt=0001-01-01T00:00:00Z,health=<null>],volumes=<null>,volumesRW=<null>,node=<null>,mounts=[com.github.dockerjava.api.command.InspectContainerResponse$Mount@1416cf9f[name=c2128d05001b8fec1712807f381e2c72d42ce8a83ae97f6b038f51c0d48446f1,source=/data/docker/volumes/c2128d05001b8fec1712807f381e2c72d42ce8a83ae97f6b038f51c0d48446f1/_data,destination=/var/lib/mysql,driver=local,mode=,rw=true]],graphDriver=com.github.dockerjava.api.command.GraphDriver@84487f4[name=overlay2,data=com.github.dockerjava.api.command.GraphData@bfc14b9[rootDir=<null>,deviceId=<null>,deviceName=<null>,deviceSize=<null>,dir=<null>]],platform=linux]
com.github.dockerjava.api.model.Statistics@55a88417[read=2019-09-02T12:20:14.534216408Z,networks={eth0=com.github.dockerjava.api.model.StatisticNetworksConfig@18acfe88[rxBytes=0,rxDropped=0,rxErrors=0,rxPackets=0,txBytes=0,txDropped=0,txErrors=0,txPackets=0], eth1=com.github.dockerjava.api.model.StatisticNetworksConfig@8a2a6a[rxBytes=197752,rxDropped=0,rxErrors=0,rxPackets=836,txBytes=0,txDropped=0,txErrors=0,txPackets=0]},network=<null>,memoryStats=com.github.dockerjava.api.model.MemoryStatsConfig@772861aa,blkioStats=BlkioStatsConfig[ioServiceBytesRecursive=[BlkioStatEntry[major=8,minor=0,op=Read,value=8192], BlkioStatEntry[major=8,minor=0,op=Write,value=1259921408], BlkioStatEntry[major=8,minor=0,op=Sync,value=1258987520], BlkioStatEntry[major=8,minor=0,op=Async,value=942080], BlkioStatEntry[major=8,minor=0,op=Total,value=1259929600]],ioServicedRecursive=[BlkioStatEntry[major=8,minor=0,op=Read,value=2], BlkioStatEntry[major=8,minor=0,op=Write,value=4066], BlkioStatEntry[major=8,minor=0,op=Sync,value=4009], BlkioStatEntry[major=8,minor=0,op=Async,value=59], BlkioStatEntry[major=8,minor=0,op=Total,value=4068]],ioQueueRecursive=[],ioServiceTimeRecursive=[],ioWaitTimeRecursive=[],ioMergedRecursive=[],ioTimeRecursive=[],sectorsRecursive=[]],cpuStats=com.github.dockerjava.api.model.CpuStatsConfig@4cb40e3b,preCpuStats=com.github.dockerjava.api.model.CpuStatsConfig@41b1f51e,pidsStats=com.github.dockerjava.api.model.PidsStatsConfig@3a543f31]

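XNginx upgrade notes

Posted on 2019-08-26 | Updated on 2021-07-12 | Categories: Blog, jqpeng
Word count: 4.7k | Reading time ≈ 4 min

Author: jqpeng
Original post: XNginx upgrade notes

As mentioned in an earlier post, XNginx is a visual management tool for nginx clusters. It ran stably after development finished, until a proxy configuration problem on one site recently required changing the nginx configuration file template, so I took the opportunity to upgrade the system.

Upgrading the frontend to the latest ng-alain

As it turned out, upgrading was painful. Frontend projects are a long story: if you can leave them alone, do!

The main changes were:

- simple-table became st
- desc is gone, replaced by sv
- the action of page-header and similar components now has to be specified explicitly

Looking things up in the docs took more than an hour in total; frontend tooling really moves fast...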

Adding a default option to vhosts

Configurations like the one below are common, where default marks the server block as the default one for its port:

server {
    listen 80 default;
    client_max_body_size 10240M;

    location / {
        proxy_pass http://proxy234648622.k8s_server;
        proxy_set_header HOST $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

So this time I added a default option to vhosts, so that the generated configuration file can include default.
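XNginx's template code is not shown in the post, so the following is only a hypothetical sketch of how a vhost's default flag could be rendered into the listen directive. It emits default_server, the parameter name in the current nginx documentation; default, as used in the configuration above, is the older name for the same parameter.

```java
/** Hypothetical sketch: render a vhost's listen directive, honoring the new default option. */
public class ListenDirective {
    static String render(int port, boolean ssl, boolean isDefault) {
        StringBuilder line = new StringBuilder("listen ").append(port);
        if (ssl) {
            line.append(" ssl");
        }
        if (isDefault) {
            line.append(" default_server");   // marks this server block as the default for the port
        }
        return line.append(';').toString();
    }
}
```

For example, render(80, false, true) yields listen 80 default_server; and render(443, true, false) yields listen 443 ssl;.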

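The generated configuration file:

Importing certificates in the SSL configuration

Previously, configuring SSL meant opening the certificate file by hand and copying its contents into a text box. With this frontend upgrade I added an import button: after the user picks a file, the certificate file is read directly.

The implementation is simple: nz-upload selects the file, and nzBeforeUpload intercepts it so the file can be read in the browser.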

 <div nz-col [nzSpan]="2" *ngIf="dto.enableSSL">
        <nz-upload [nzShowUploadList]="false" [nzBeforeUpload]="readCertificate"><button nz-icon="upload" nz-button
            nzType="default" nzSize="small">Import</button> </nz-upload>
      </div>

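Read the file with FileReader, and remember to return false, which tells nz-upload not to upload the file itself.

  readCertificate = (file: File) => {
    const reader = new FileReader();
    this.dto.sslCertificate.commonName = file.name;
    // Register the handler before starting the asynchronous read
    reader.onload = () => {
      this.dto.sslCertificate.content = reader.result as string;
    };
    reader.readAsText(file);
    // false: keep the file in the browser instead of letting nz-upload post it
    return false;
  }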

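Importing an existing configuration

This upgrade also adds an import button to vhost management for importing existing configuration.

The configuration files and their related resources must be packaged as a zip file, which is uploaded to the backend and parsed there. The endpoint:

@PostMapping("/importConfig/{groupId}")
@Timed
public String uploadConfFile(@RequestParam("file") MultipartFile file, @PathVariable String groupId) {
    if (file.isEmpty()) {
        return "Please select a file to upload";
    }

    // Browsers label zip files differently (on Windows often application/x-zip-compressed),
    // so accept both types and guard against a missing Content-Type
    String contentType = file.getContentType();
    if (contentType == null
            || !(contentType.equalsIgnoreCase("application/zip")
                 || contentType.equalsIgnoreCase("application/x-zip-compressed"))) {
        return "only .zip files are supported";
    }

    File upFile = new File(new File(TEMP_FILE_PATH), System.currentTimeMillis() + file.getOriginalFilename());
    try {
        if (upFile.exists()) {
            upFile.delete();
        }
        file.transferTo(upFile);
    } catch (IllegalStateException | IOException ex) {
        return "upload error!";
    }

    try {
        nginxConfigService.parseFromZipFile(upFile, groupId);
    } catch (IOException e) {
        return "upload error!";
    }
    return "success";
}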

The parsing code is straightforward: unzip the archive, locate nginx.conf, then call the parser described in the earlier post to parse the directives.

    public void parseConfig(String configDir, String groupId) {

        // Locate nginx.conf
        String nginxFile = searchForFile(new File(configDir), "nginx.conf");
        if (nginxFile.length() == 0) {
            throw new RuntimeException("can't find nginx.conf, please make sure nginx.conf exists!");
        }

        // Save upstreams and virtual hosts from the parsed directives
        List<Directive> directives = NginxConfParser.newBuilder().withConfigurationFile(nginxFile).parse();
        directives.forEach(directive -> {
            if (directive instanceof ProxyDirective) {
                saveUpStream((ProxyDirective) directive);
            } else if (directive instanceof VirtualHostDirective) {
                saveVHost((VirtualHostDirective) directive, groupId);
            }
        });

    }

    public void parseFromZipFile(File file, String groupId) throws IOException {
        // Extract next to the uploaded file, into "<name>.extract"
        String tempDir = Paths.get(file.getPath()).getParent().toString() + File.separator + file.getName() + ".extract";
        UnZipFile.unZipFiles(file, tempDir);
        parseConfig(tempDir, groupId);
    }
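UnZipFile.unZipFiles and searchForFile are not shown in the post. Below is a hypothetical stand-in for both using only java.util.zip and java.nio.file. It rejects entries whose normalized path would escape the target directory (the so-called zip slip problem, which matters because the archive is user-uploaded), and returns an empty string when nginx.conf is missing, matching the length() == 0 check in parseConfig. ZipInputStream decodes entry names as UTF-8 by default; archives created with another filename encoding need the ZipInputStream(InputStream, Charset) constructor.

```java
import java.io.*;
import java.nio.file.*;
import java.util.stream.Stream;
import java.util.zip.*;

/** Hypothetical stand-ins for UnZipFile.unZipFiles and searchForFile used above. */
public class ZipConfigHelpers {
    /** Extract every entry of zip into destDir, refusing entries that escape it ("zip slip"). */
    static void unzip(File zip, String destDir) throws IOException {
        Path root = Paths.get(destDir).toAbsolutePath().normalize();
        Files.createDirectories(root);
        try (ZipInputStream in = new ZipInputStream(new FileInputStream(zip))) {
            for (ZipEntry entry; (entry = in.getNextEntry()) != null; ) {
                Path target = root.resolve(entry.getName()).normalize();
                if (!target.startsWith(root)) {
                    throw new IOException("Illegal zip entry: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    // Copies only the current entry: the stream reports its end as EOF
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    /** Path of the first regular file named fileName under dir, or "" if there is none. */
    static String searchForFile(File dir, String fileName) {
        try (Stream<Path> paths = Files.walk(dir.toPath())) {
            return paths.filter(p -> Files.isRegularFile(p))
                    .filter(p -> p.getFileName().toString().equals(fileName))
                    .map(Path::toString)
                    .findFirst()
                    .orElse("");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```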

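Merging the frontend and backend projects

Previously the frontend and backend were deployed separately. That is fine for a large enough project, but xnginx is fairly simple and deploying the two separately cost time and effort, so this time I merged them.

How to merge:

- Create a webapp directory under backend and move the web code into it
- Copy the web project's configuration files to the parent directory

Then update the paths in angular.json, tsconfig.json, and any other files that contain paths: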

 "xnginx": {
      "projectType": "application",
      "root": "",
      "sourceRoot": "webapp/src",
      "prefix": "app",
      "schematics": {
        "@schematics/angular:component": {
          "styleext": "less"
        }
      },

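Finally, change the build configuration in angular.json so that the build output goes to 'target/classes/static'; the Java project then picks up the frontend assets when it is packaged: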

  "build": {
          "builder": "@angular-devkit/build-angular:browser",
          "options": {
            "outputPath": "target/classes/static",
            "index": "webapp/src/index.html",
            "main": "webapp/src/main.ts",
            "tsConfig": "tsconfig.app.json",
            "polyfills": "webapp/src/polyfills.ts",
            "assets": [
              "webapp/src/assets",
              "webapp/src/favicon.ico"
            ],
            "styles": [
              "webapp/src/styles.less"
            ],
            "scripts": [
              "node_modules/@antv/g2/build/g2.js",
              "node_modules/@antv/data-set/dist/data-set.min.js",
              "node_modules/@antv/g2-plugin-slider/dist/g2-plugin-slider.min.js",
              "node_modules/ajv/dist/ajv.bundle.js",
              "node_modules/qrious/dist/qrious.min.js"
            ]
          },

Notes:

Build the front end first: npm run build
Then build the back end: mvn package -DskipTests

SPARQL query quick start

Posted 2019-08-23, updated 2021-07-12, filed under Blog, jqpeng
Word count: 4.5k, reading time ≈ 4 min

Author: jqpeng
Original link: SPARQL query quick start

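Introduction

SPARQL, a recursive acronym for SPARQL Protocol and RDF Query Language, is designed specifically for accessing and manipulating RDF data and is one of the core technologies of the Semantic Web. It was standardized by the W3C RDF Data Access Working Group (RDAWG) and became a W3C Recommendation on January 15, 2008.

We can load the extracted RDF triples into Apache Jena Fuseki and query them there with SPARQL.

Simple query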

SQL:

SELECT title FROM book WHERE id = 'book1'

sparql:

SELECT ?title WHERE { <http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> ?title . }

Query result:

title
"SPARQL Tutorial"

Matching multiple fields

RDF data

@prefix foaf:  <http://xmlns.com/foaf/0.1/> .

_:a  foaf:name   "Johnny Lee Outlaw" .
_:a  foaf:mbox   <mailto:jlow@example.com> .
_:b  foaf:name   "Peter Goodguy" .
_:b  foaf:mbox   <mailto:peter@example.org> .
_:c  foaf:mbox   <mailto:carol@example.org> .

sparql:

PREFIX foaf:   <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE
  { ?x foaf:name ?name .
    ?x foaf:mbox ?mbox }

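SQL (roughly equivalent, treating foaf as a table; _:c has no name and is therefore not returned):

SELECT name, mbox
FROM foaf
WHERE name IS NOT NULL AND mbox IS NOT NULL

Query result:

name	mbox
"Johnny Lee Outlaw"	<mailto:jlow@example.com>
"Peter Goodguy"	<mailto:peter@example.org>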
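The two triple patterns share ?x, so the query behaves like a join on the subject. A minimal in-memory sketch of that behavior, assuming triples are stored as String[] {subject, predicate, object} with prefixed names (a hypothetical representation, not Jena's API):

```java
import java.util.ArrayList;
import java.util.List;

class BgpJoinSketch {

    /** Joins (?x foaf:name ?name) with (?x foaf:mbox ?mbox) on the shared subject ?x. */
    static List<String[]> nameAndMbox(List<String[]> triples) {
        List<String[]> rows = new ArrayList<>();
        for (String[] n : triples) {
            if (!n[1].equals("foaf:name")) {
                continue;
            }
            for (String[] m : triples) {
                // both patterns must bind ?x to the same subject
                if (m[1].equals("foaf:mbox") && m[0].equals(n[0])) {
                    rows.add(new String[] {n[2], m[2]});
                }
            }
        }
        return rows;
    }

    /** The RDF data from the example above. */
    static List<String[]> sampleData() {
        return List.of(
            new String[] {"_:a", "foaf:name", "Johnny Lee Outlaw"},
            new String[] {"_:a", "foaf:mbox", "mailto:jlow@example.com"},
            new String[] {"_:b", "foaf:name", "Peter Goodguy"},
            new String[] {"_:b", "foaf:mbox", "mailto:peter@example.org"},
            new String[] {"_:c", "foaf:mbox", "mailto:carol@example.org"});
    }
}
```

Called on sampleData(), it returns the two rows shown above; _:c drops out because it has no foaf:name triple to join with.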

Matching literal values

String literals must be quoted. Double quotes are used here; SPARQL also accepts single quotes.

sparql:

SELECT ?v WHERE { ?v ?p "cat" }

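SQL (assuming the data is stored in one table ns with columns s, p, o, one row per triple):

SELECT s FROM ns WHERE o = 'cat'

For numeric values:

sparql:

SELECT ?v WHERE { ?v ?p 42 }

SQL:

SELECT s FROM ns WHERE o = 42

SPARQL also lets you specify the datatype of the literal to match: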

SELECT ?v WHERE { ?v ?p "abc"^^<http://example.org/datatype#specialDatatype> }

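Filtering

Fuzzy matching

The regex function matches a string against a regular expression, and FILTER keeps only the solutions for which it is true: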

PREFIX  dc:  <http://purl.org/dc/elements/1.1/>
SELECT  ?title
WHERE   { ?x dc:title ?title
          FILTER regex(?title, "web", "i" ) 
        }

SQL:

SELECT * from table where title like '%web%'
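regex(?title, "web", "i") is an unanchored, case-insensitive regular-expression match, so it corresponds to LIKE '%web%' only when the database compares strings case-insensitively. The same filter in plain Java, as a sketch (RegexFilterSketch and filterTitles are hypothetical names):

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class RegexFilterSketch {

    /** Keeps titles containing a match of regex, ignoring case, like FILTER regex(?title, regex, "i"). */
    static List<String> filterTitles(List<String> titles, String regex) {
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        // find() is unanchored, so "web" may match anywhere in the title
        return titles.stream()
                     .filter(t -> p.matcher(t).find())
                     .collect(Collectors.toList());
    }
}
```

For example, filterTitles(List.of("Semantic Web Primer", "SPARQL Tutorial", "cobweb"), "web") keeps "Semantic Web Primer" and "cobweb".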

Numeric comparison

PREFIX  dc:  <http://purl.org/dc/elements/1.1/>
PREFIX  ns:  <http://example.org/ns#>
SELECT  ?title ?price
WHERE   { ?x ns:price ?price .
          FILTER (?price < 30.5)
          ?x dc:title ?title . }

SQL:

SELECT title,price from table where price <30.5

OPTIONAL (optional values)

RDF data: the user Bob has no mbox, while the user Alice has two:

@prefix foaf:       <http://xmlns.com/foaf/0.1/> .
@prefix rdf:        <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

_:a  rdf:type        foaf:Person .
_:a  foaf:name       "Alice" .
_:a  foaf:mbox       <mailto:alice@example.com> .
_:a  foaf:mbox       <mailto:alice@work.example> .

_:b  rdf:type        foaf:Person .
_:b  foaf:name       "Bob" .

With an ordinary pattern, Bob would not appear in the results because he has no mbox. Marking the mbox pattern as OPTIONAL lets Bob appear as well:

sparql:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE  { ?x foaf:name  ?name .
         OPTIONAL { ?x  foaf:mbox  ?mbox }
       }

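Query result:

name	mbox
"Alice"	<mailto:alice@example.com>
"Alice"	<mailto:alice@work.example>
"Bob"

As you can see, Bob's mbox is unbound (empty).

In a relational database, assume two tables:

User {id, name}
Mbox {id, uid, name} (uid is a foreign key to User.id)

The corresponding SQL: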

SELECT user.name AS name,mbox.name AS mboxName
FROM User user
LEFT OUTER JOIN Mbox mbox ON mbox.uid=user.id
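The OPTIONAL / LEFT OUTER JOIN correspondence can be sketched in a few lines of Java over in-memory triples (String[] {subject, predicate, object}, a hypothetical representation): every name produces at least one row, and a missing mbox becomes null, just like an unbound ?mbox.

```java
import java.util.ArrayList;
import java.util.List;

class OptionalJoinSketch {

    /** Each row is {name, mbox}; mbox is null when the OPTIONAL part has no match. */
    static List<String[]> namesWithOptionalMbox(List<String[]> triples) {
        List<String[]> rows = new ArrayList<>();
        for (String[] n : triples) {
            if (!n[1].equals("foaf:name")) {
                continue;
            }
            boolean matched = false;
            for (String[] m : triples) {
                if (m[1].equals("foaf:mbox") && m[0].equals(n[0])) {
                    rows.add(new String[] {n[2], m[2]});
                    matched = true;
                }
            }
            if (!matched) {
                rows.add(new String[] {n[2], null}); // keep the left side, like LEFT OUTER JOIN
            }
        }
        return rows;
    }
}
```

On the Alice/Bob data above it yields three rows, the last one being {"Bob", null}.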

OPTIONAL + FILTER

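OPTIONAL can be combined with FILTER. Because the FILTER sits inside the OPTIONAL block, it only decides whether ?price is bound: every title is still returned, and ?price is left empty for books whose price is 30 or more.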

PREFIX  dc:  <http://purl.org/dc/elements/1.1/>
PREFIX  ns:  <http://example.org/ns#>
SELECT  ?title ?price
WHERE   { ?x dc:title ?title .
          OPTIONAL { ?x ns:price ?price . FILTER (?price < 30) }
        }
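The effect of placing FILTER inside OPTIONAL can be sketched in plain Java, assuming titles and prices are held in two maps keyed by a book id (OptionalFilterSketch and its parameters are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class OptionalFilterSketch {

    /**
     * titles: book id -> title; prices: book id -> price (a book may have none).
     * Returns {title, price} for every book; price is null unless it exists and is below 30.
     */
    static List<Object[]> titlesWithCheapPrice(Map<String, String> titles, Map<String, Double> prices) {
        List<Object[]> rows = new ArrayList<>();
        for (Map.Entry<String, String> e : titles.entrySet()) {
            Double price = prices.get(e.getKey());
            // FILTER inside OPTIONAL: a failing price is left unbound, the title row is kept
            Double bound = (price != null && price < 30) ? price : null;
            rows.add(new Object[] {e.getValue(), bound});
        }
        return rows;
    }
}
```

Moving the FILTER outside the OPTIONAL block would instead drop every row whose ?price is unbound or not below 30.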

UNION

Data:

@prefix dc10:  <http://purl.org/dc/elements/1.0/> .
@prefix dc11:  <http://purl.org/dc/elements/1.1/> .

_:a  dc10:title     "SPARQL Query Language Tutorial" .
_:a  dc10:creator   "Alice" .

_:b  dc11:title     "SPARQL Protocol Tutorial" .
_:b  dc11:creator   "Bob" .

_:c  dc10:title     "SPARQL" .
_:c  dc11:title     "SPARQL (updated)" .

Query:

PREFIX dc10:  <http://purl.org/dc/elements/1.0/>
PREFIX dc11:  <http://purl.org/dc/elements/1.1/>

SELECT ?title
WHERE  { { ?book dc10:title  ?title } UNION { ?book dc11:title  ?title } }

Query result:

title
"SPARQL Protocol Tutorial"
"SPARQL"
"SPARQL (updated)"
"SPARQL Query Language Tutorial"

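UNION appends the solutions of the second pattern to those of the first; without ORDER BY the order of the results is not defined, so an engine may return them in any order, as above. _:c has both a dc10:title and a dc11:title, so it contributes one row per branch. A minimal sketch over in-memory triples (String[] {subject, predicate, object}, a hypothetical representation):

```java
import java.util.ArrayList;
import java.util.List;

class UnionSketch {

    /** Solutions of { ?book dc10:title ?title } followed by those of { ?book dc11:title ?title }. */
    static List<String> titlesUnion(List<String[]> triples) {
        List<String> out = new ArrayList<>();
        for (String[] t : triples) {
            if (t[1].equals("dc10:title")) {
                out.add(t[2]);
            }
        }
        for (String[] t : triples) {
            if (t[1].equals("dc11:title")) {
                out.add(t[2]);
            }
        }
        return out; // UNION keeps duplicates; SELECT DISTINCT would remove them
    }
}
```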
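Sorting

As in SQL, ORDER BY sorts the results. The example below sorts by ?name in ascending order and, for equal names, by ?emp in descending order. The empty prefix used by :empId must be declared:

PREFIX :        <http://example.org/ns#>
PREFIX foaf:    <http://xmlns.com/foaf/0.1/>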

SELECT ?name
WHERE { ?x foaf:name ?name ; :empId ?emp }
ORDER BY ?name DESC(?emp)
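ORDER BY ?name DESC(?emp) sorts by ?name ascending and breaks ties with ?emp descending; ?emp does not have to appear in SELECT to be used for ordering. The equivalent comparator in Java, with a hypothetical Employee class standing in for the query solutions:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class OrderBySketch {

    /** Hypothetical stand-in for one solution (?name, ?emp). */
    static final class Employee {
        final String name;
        final int empId;

        Employee(String name, int empId) {
            this.name = name;
            this.empId = empId;
        }
    }

    /** ORDER BY ?name DESC(?emp): name ascending, then empId descending for equal names. */
    static List<Employee> sort(List<Employee> in) {
        List<Employee> out = new ArrayList<>(in);
        out.sort(Comparator.comparing((Employee e) -> e.name)
                           .thenComparing(Comparator.comparingInt((Employee e) -> e.empId).reversed()));
        return out;
    }
}
```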

Removing duplicates

As in SQL, DISTINCT removes duplicate results:

PREFIX foaf:    <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?name WHERE { ?x foaf:name ?name }

Checking existence

ASK tests whether the query pattern has any solution and returns true or false:

PREFIX foaf:    <http://xmlns.com/foaf/0.1/>
ASK  { ?x foaf:name  "Alice" ;
          foaf:mbox  <mailto:alice@work.example> }

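Securing Docker with TLS

Posted 2019-08-08, updated 2021-07-12, filed under Blog, jqpeng
Word count: 4.2k, reading time ≈ 4 min

Author: jqpeng
Original link: Securing Docker with TLS

I had previously exposed Docker's Remote API on port 2375. The company's security department then required access to be authenticated, so I went through the official documentation: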

Protect the Docker daemon socket

Enabling TLS

On the Docker host, generate the CA private and public keys:

$ openssl genrsa -aes256 -out ca-key.pem 4096
Generating RSA private key, 4096 bit long modulus
............................................................................................................................................................................................++
........++
e is 65537 (0x10001)
Enter pass phrase for ca-key.pem:
Verifying - Enter pass phrase for ca-key.pem:

$ openssl req -new -x509 -days 365 -key ca-key.pem -sha256 -out ca.pem
Enter pass phrase for ca-key.pem:
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:
State or Province Name (full name) [Some-State]:Queensland
Locality Name (eg, city) []:Brisbane
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Docker Inc
Organizational Unit Name (eg, section) []:Sales
Common Name (e.g. server FQDN or YOUR name) []:$HOST
Email Address []:Sven@home.org.au

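With the CA in place, create a server key and a certificate signing request (CSR). $HOST should be the DNS name you use to reach the Docker host: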

$ openssl genrsa -out server-key.pem 4096
Generating RSA private key, 4096 bit long modulus
.....................................................................++
.................................................................................................++
e is 65537 (0x10001)

$ openssl req -subj "/CN=$HOST" -sha256 -new -key server-key.pem -out server.csr

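Next, sign the public key with the CA. Because TLS connections can be made by IP address as well as by DNS name, the IP addresses must be listed when the certificate is created. Replace 10.10.10.20 below with the IP address of your Docker host:

$ echo subjectAltName = DNS:$HOST,IP:10.10.10.20,IP:127.0.0.1 >> extfile.cnf

$ echo extendedKeyUsage = serverAuth >> extfile.cnf

Sign the server certificate: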

$ openssl x509 -req -days 365 -sha256 -in server.csr -CA ca.pem -CAkey ca-key.pem \
  -CAcreateserial -out server-cert.pem -extfile extfile.cnf
Signature ok
subject=/CN=your.host.com
Getting CA Private Key
Enter pass phrase for ca-key.pem:

Create a client key and certificate signing request:

$ openssl genrsa -out key.pem 4096
Generating RSA private key, 4096 bit long modulus
.........................................................++
................++
e is 65537 (0x10001)

$ openssl req -subj '/CN=client' -new -key key.pem -out client.csr

Create an extensions config file that makes the key suitable for client authentication:

$ echo extendedKeyUsage = clientAuth > extfile-client.cnf

Sign the client certificate:

$ openssl x509 -req -days 365 -sha256 -in client.csr -CA ca.pem -CAkey ca-key.pem \
  -CAcreateserial -out cert.pem -extfile extfile-client.cnf
Signature ok
subject=/CN=client
Getting CA Private Key
Enter pass phrase for ca-key.pem:

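Stop the Docker service, then edit the Docker service unit file so that dockerd starts with TLS verification and the server certificate: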

[Unit]
Description=Docker Application Container Engine
Documentation=http://docs.docker.io

[Service]
Environment="PATH=/opt/kube/bin:/bin:/sbin:/usr/bin:/usr/sbin"
ExecStart=/opt/kube/bin/dockerd  --tlsverify --tlscacert=/root/docker/ca.pem --tlscert=/root/docker/server-cert.pem --tlskey=/root/docker/server-key.pem -H unix:///var/run/docker.sock -H tcp://0.0.0.0:2375
ExecStartPost=/sbin/iptables -I FORWARD -s 0.0.0.0/0 -j ACCEPT
ExecReload=/bin/kill -s HUP $MAINPID
Restart=on-failure
RestartSec=5
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
Delegate=yes
KillMode=process

[Install]
WantedBy=multi-user.target

Then reload systemd and restart the service:

systemctl daemon-reload
systemctl restart docker.service 



After the restart, check the service status:

systemctl status docker.service
● docker.service - Docker Application Container Engine
   Loaded: loaded (/etc/systemd/system/docker.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2019-08-08 19:22:26 CST; 1 min ago

The new configuration has taken effect.

Connecting with the certificates:

Copy ca.pem, cert.pem and key.pem to the client, then connect with:

docker --tlsverify --tlscacert=ca.pem --tlscert=cert.pem --tlskey=key.pem -H=$HOST:2375 version

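Enabling TLS in docker-java

The project calls Docker through docker-java, the Java client for Docker. To support TLS, the TLS settings must be added when the client is created.

First copy ca.pem, cert.pem and key.pem to a local directory, for example E:\docker\.

Then set withDockerTlsVerify to true on the DefaultDockerClientConfig builder and point the cert path at that directory: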

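import com.github.dockerjava.api.DockerClient;
import com.github.dockerjava.core.DefaultDockerClientConfig;
import com.github.dockerjava.core.DockerClientBuilder;

private DockerClient createDockerClient(String server) {
    DefaultDockerClientConfig.Builder builder =
            DefaultDockerClientConfig.createDefaultConfigBuilder()
                .withDockerHost("tcp://" + server + ":2375")
                .withApiVersion("1.30");
    if (containerConfiguration.getDockerTlsVerify()) {
        // The cert directory holds ca.pem, cert.pem and key.pem copied from the server
        builder = builder.withDockerTlsVerify(true).withDockerCertPath("E:\\docker\\");
    }
    return DockerClientBuilder.getInstance(builder.build()).build();
}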

Done.
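A missing or misnamed file in the cert directory tends to surface only as a TLS handshake failure, so a pre-flight check can save some debugging. A minimal sketch, assuming the directory should hold the three files copied above under their original names (ca.pem, cert.pem, key.pem, the names the docker CLI expects in DOCKER_CERT_PATH); CertPathCheck and missingCertFiles are hypothetical names:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class CertPathCheck {

    // File names copied to the client in the steps above
    static final List<String> REQUIRED = List.of("ca.pem", "cert.pem", "key.pem");

    /** Returns the required PEM files that are missing (or are not regular files) in certDir. */
    static List<String> missingCertFiles(String certDir) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            Path p = Paths.get(certDir, name);
            if (!Files.isRegularFile(p)) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

For example, call it before withDockerCertPath and throw an exception that lists the missing files if the result is not empty.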

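Optimizing MongoDB CRUD on large data volumes

Posted 2019-05-28, updated 2021-07-12, filed under Blog, jqpeng
Word count: 3.5k, reading time ≈ 3 min

Author: jqpeng
Original link: Optimizing MongoDB CRUD on large data volumes

1. Optimizing batch saves

Instead of querying and saving documents one by one, use bulkWrite with ReplaceOneModel and enable upsert: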

public void batchSave(List<?> spoTriples, KgInstance kgInstance) {
    if (spoTriples.isEmpty()) {
        return; // bulkWrite rejects an empty list of requests
    }
    MongoConverter converter = mongoTemplate.getConverter();
    List<ReplaceOneModel<Document>> bulkOperationList = spoTriples.stream()
            .map(thing -> {
                Document dbDoc = new Document();
                converter.write(thing, dbDoc);
                // Replace the document with the same _id, or insert it if none exists (upsert)
                return new ReplaceOneModel<Document>(
                        Filters.eq(UNDERSCORE_ID, dbDoc.get(UNDERSCORE_ID)),
                        dbDoc,
                        new UpdateOptions().upsert(true));
            })
            .collect(Collectors.toList());
    mongoTemplate.getCollection(getCollection(kgInstance)).bulkWrite(bulkOperationList);
}
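ReplaceOneModel with upsert(true) replaces the document whose _id matches the filter, or inserts it when nothing matches, and bulkWrite sends all the models in a single call (the driver may still split a very large list into several batches). The semantics can be sketched without a database, using a Map keyed by _id as a hypothetical stand-in for the collection; this is not the driver API:

```java
import java.util.List;
import java.util.Map;

class UpsertSketch {

    /** Counters loosely mirroring "modified" vs "upserted" in a bulk write result. */
    static final class Result {
        int replaced;
        int inserted;
    }

    /** Hypothetical stand-in for bulkWrite(ReplaceOneModel + upsert); collection maps _id to document. */
    static Result replaceAllWithUpsert(Map<String, Map<String, Object>> collection,
                                       List<Map<String, Object>> docs) {
        Result result = new Result();
        for (Map<String, Object> doc : docs) {
            String id = (String) doc.get("_id");
            // put() returns the previous value: non-null means an existing document was replaced
            if (collection.put(id, doc) != null) {
                result.replaced++;
            } else {
                result.inserted++;
            }
        }
        return result;
    }
}
```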

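2. Optimizing pagination

Make sure the fields that are frequently queried are indexed.

For queries on several keys, create a compound index.

2.1 Avoid unnecessary counts

A query that uses an index is not slow, but returning a Page<?> also requires the total count, and counting a very large collection takes a long time. Ask yourself: do you really need the exact number?

When you search for a keyword on Google or Baidu, you only get a limited number of results, so we do not need to give an exact figure either. Set a threshold such as 10,000: when the total exceeds it, return 10,000 and let the front end display "more than 10,000 results".

The principle is simple: skip MAX_PAGE_COUNT records and check whether any data remains. If it does, the total is larger than MAX_PAGE_COUNT and we return MAX_PAGE_COUNT; otherwise we compute the real count.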

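private static final int MAX_PAGE_COUNT = 10000;

/**
 * Returns the total count, but stops counting once it exceeds the threshold.
 *
 * @param mongoTemplate  template used to run the queries
 * @param query          the list query; its skip/limit are restored before returning
 * @param collectionName collection to query
 * @return the exact count, or MAX_PAGE_COUNT if there are more results than that
 */
private long count(MongoTemplate mongoTemplate, Query query, String collectionName) {
    long originalSkip = query.getSkip();
    int originalLimit = query.getLimit();
    try {
        // Probe for a result at offset MAX_PAGE_COUNT; if one exists, the total exceeds the threshold
        query.skip(MAX_PAGE_COUNT).limit(1);
        if (!mongoTemplate.find(query, Thing.class, collectionName).isEmpty()) {
            return MAX_PAGE_COUNT;
        }
        // Clear skip/limit so that count() is not restricted by the probe
        query.skip(0).limit(0);
        return mongoTemplate.count(query, collectionName);
    } finally {
        query.skip(originalSkip).limit(originalLimit); // leave the caller's paging untouched
    }
}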

On the front end, a total equal to MAX_PAGE_COUNT is then displayed as "more than 10,000".
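The capped count can be tried out without MongoDB. In this JDK-only sketch a List stands in for the query result and findPage plays the role of the skip/limit probe (CappedCountSketch and findPage are hypothetical names); MAX_PAGE_COUNT is the threshold from the post:

```java
import java.util.List;

class CappedCountSketch {

    static final int MAX_PAGE_COUNT = 10000;

    /** Stand-in for a query with skip/limit: returns at most limit rows after skipping skip rows. */
    static <T> List<T> findPage(List<T> all, int skip, int limit) {
        int from = Math.min(skip, all.size());
        int to = Math.min(from + limit, all.size());
        return all.subList(from, to);
    }

    /** Exact count up to the threshold; MAX_PAGE_COUNT means "at least that many". */
    static long count(List<?> all) {
        // Probe: is there a row at offset MAX_PAGE_COUNT, i.e. a 10,001st result?
        if (!findPage(all, MAX_PAGE_COUNT, 1).isEmpty()) {
            return MAX_PAGE_COUNT;
        }
        // Below the threshold the real count is cheap enough to compute
        return all.size();
    }
}
```

One side effect of returning exactly MAX_PAGE_COUNT: a result set with exactly 10,000 matches looks the same as one with a million. Returning MAX_PAGE_COUNT + 1 as the "more than" marker, or a separate flag, would be one way to tell them apart.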

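2.2 Avoid large skips

Pagination has to skip over the earlier records, and skipping takes time. A small trick avoids the skip altogether.

Suppose a list is sorted by last-modified time in descending order, with 100 records per page, and we want to show page 100.
The usual approach skips 99 * 100 records, which is very expensive. Look at it from another angle: because the data is ordered, every record on page 100 has a last-modified time smaller than the smallest last-modified time on page 99. Adding that condition to the query lets us simply take the first 100 matching records.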
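This skip-free approach is usually called keyset (or seek) pagination. A minimal sketch follows; it assumes each record has a lastModified timestamp and an id, and it adds id as a tie-breaker, which the post does not mention: with lastModified alone, records that share a timestamp at a page boundary could be skipped or repeated. In MongoDB the condition would be "lastModified < t, or lastModified == t and _id < id", sorted by {lastModified: -1, _id: -1} and backed by a compound index on those two fields. The linear scan below only illustrates which rows qualify; with the index, the server seeks straight to the first one. KeysetPagingSketch and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class KeysetPagingSketch {

    static final class Item {
        final long lastModified;
        final String id;

        Item(long lastModified, String id) {
            this.lastModified = lastModified;
            this.id = id;
        }
    }

    /** List order: lastModified descending, then id descending as a tie-breaker. */
    static final Comparator<Item> ORDER =
        Comparator.comparingLong((Item i) -> i.lastModified).thenComparing(i -> i.id).reversed();

    /**
     * Returns the page after last (the final item of the previous page), or the first page
     * when last is null. Only rows strictly after last in ORDER qualify, so nothing is skipped.
     */
    static List<Item> nextPage(List<Item> sorted, Item last, int pageSize) {
        List<Item> page = new ArrayList<>();
        for (Item it : sorted) {
            if (last != null && ORDER.compare(it, last) <= 0) {
                continue; // at or before the cursor
            }
            page.add(it);
            if (page.size() == pageSize) {
                break;
            }
        }
        return page;
    }
}
```

To fetch page n + 1, pass the last item of page n as last; the first page uses null.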

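3. Optimizing full exports

3.1 Drop unneeded fields

Specify only the fields you actually need in the query. This reduces the amount of data transferred and speeds up the query.
For example: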

        Query query = new Query();
        query.fields().include("_id").include("name").include("hot").include("alias");

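3.2 Use a stream instead of findAll or paged queries

There are two common mistakes in full exports. The first is calling findAll directly: with a large data set this easily causes an OutOfMemoryError on the server, and even without an OOM it puts a heavy load on the server and affects neighboring services. findAll also loads everything into memory at once, so it is slow overall, because processing can only start after all the data is in memory.

The second mistake is paging through the data and processing each page in turn. Paged queries do reduce the load on the server and are a workable approach. However, as described in the pagination section above, later pages have to skip over all the earlier data, which is wasted work. A somewhat better option is to turn the skip into a condition as described earlier; that is efficient enough but not recommended, because it adds redundant code.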

            Page<Thing> dataList = entityDao.findAllByPage(kgDataStoreService.getKgCollectionByKgInstance(kg), page);
            Map<String, Individual> thingId2Resource = new ConcurrentHashMap<>();

            appendThingsToModel(model, concept2OntClass, hot, alias, dataList, thingId2Resource);

            while (dataList.hasNext()) {
                page = PageRequest.of(page.getPageNumber() + 1, page.getPageSize());
                dataList = entityDao.findAllByPage(kgDataStoreService.getKgCollectionByKgInstance(kg), page);
                appendThingsToModel(model, concept2OntClass, hot, alias, dataList, thingId2Resource);
            }

The recommended approach is MongoTemplate's stream method, which returns a CloseableIterator so that each document is processed as soon as it is read:

@Override
public <T> CloseableIterator<T> stream(final Query query, final Class<T> entityType, final String collectionName) {
    return doStream(query, entityType, collectionName, entityType);
}

With this method, the code becomes simpler and more efficient:

try (CloseableIterator<Thing> dataList = kgDataStoreService.getSimpleInfoIterator(kg)) {
    // Entity import; the paged version was:
    // Page<Thing> dataList = entityDao.findAllByPage(kgDataStoreService.getKgCollectionByKgInstance(kg), page);
    Map<String, Individual> thingId2Resource = new ConcurrentHashMap<>();

    appendThingsToModel(model, concept2OntClass, hot, alias, dataList, thingId2Resource);
} // try-with-resources closes the iterator and its underlying cursor
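What matters about CloseableIterator is that only the current document is held in memory and the underlying cursor must be closed afterwards. The same pattern in plain Java, with a hypothetical CountingSource standing in for the database cursor:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Consumer;

class StreamingExportSketch {

    /** Hypothetical cursor: produces rows lazily and records whether it was closed. */
    static final class CountingSource implements Iterator<String>, AutoCloseable {
        private final int total;
        private int position;
        boolean closed;

        CountingSource(int total) {
            this.total = total;
        }

        @Override
        public boolean hasNext() {
            return position < total;
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return "thing-" + position++;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Hands rows to the handler one at a time; try-with-resources closes the cursor even on failure. */
    static long export(CountingSource source, Consumer<String> handler) {
        long processed = 0;
        try (CountingSource cursor = source) {
            while (cursor.hasNext()) {
                handler.accept(cursor.next()); // process the row, then let it go
                processed++;
            }
        }
        return processed;
    }
}
```

Because the handler sees one row at a time, memory use stays flat regardless of collection size, as long as the handler itself does not accumulate rows.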

To be continued...