• Building Atlas 2.2.0 integrated with CDH 6.3.2 (production environment + Kerberos)


    First make sure the environment is clean; if Atlas was installed before, remove any leftovers.

    Make sure the server that will host Atlas has enough memory (at least 16 GB) and the necessary Hadoop roles:

    • HDFS client — retrieves and updates group membership information in the User Group Information (UGI) used by Hadoop; useful for debugging.
    • HBase client — Atlas stores its JanusGraph database in HBase and uses it for the initial import of HBase content, so it needs continuous access to two tables in the HBase service.
    • Hive client — used for the initial import of Hive content.

    Note: I disabled Kerberos on Kafka. I never managed to get Atlas to pass messages through a Kerberized Kafka.

    Prepare the build environment

    Maven 3.8.8 — must be 3.8 or later; 3.6 cannot build this project.

    Java 1.8.0_181 — keep it consistent with your CDH environment.

    Node.js v16.20.2
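A failed compile hours in is painful, so I find it worth sanity-checking the toolchain first. The helper below is my own sketch, not part of Atlas; it only parses the version banners, and the thresholds are the ones stated above (Maven >= 3.8, JDK 1.8.x).

```python
import re
import subprocess


def maven_ok(version_output: str) -> bool:
    """True if `mvn -version` output reports Maven >= 3.8 (3.6 cannot build Atlas 2.2.0)."""
    m = re.search(r"Apache Maven (\d+)\.(\d+)", version_output)
    return bool(m) and (int(m.group(1)), int(m.group(2))) >= (3, 8)


def java_ok(version_output: str) -> bool:
    """True if `java -version` output reports a 1.8.x JDK (matching the CDH environment)."""
    return '"1.8.' in version_output


def _tool_output(cmd) -> str:
    """Run a version command, returning its combined output ('' if the tool is missing)."""
    try:
        r = subprocess.run(cmd, capture_output=True, text=True)
        return r.stdout + r.stderr  # java prints its banner to stderr
    except OSError:
        return ""


if __name__ == "__main__":
    print("maven ok:", maven_ok(_tool_output(["mvn", "-version"])))
    print("java ok:", java_ok(_tool_output(["java", "-version"])))
```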

    Download and extract the source code

    The project website is here: Apache Atlas – Data Governance and Metadata framework for Hadoop

    Find and download Apache Atlas there.

    Modify pom.xml

    In the main pom (the first pom.xml at the top of the source tree), add the Cloudera repository that hosts the Maven artifacts:

    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
        <releases>
            <enabled>true</enabled>
        </releases>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>

    Then update the component versions to match CDH:

    <hadoop.version>3.0.0-cdh6.3.2</hadoop.version>
    <hbase.version>2.1.0-cdh6.3.2</hbase.version>
    <hive.version>2.1.1-cdh6.3.2</hive.version>
    <kafka.scala.binary.version>2.11</kafka.scala.binary.version>
    <kafka.version>2.2.1-cdh6.3.2</kafka.version>
    <solr-test-framework.version>7.4.0-cdh6.3.2</solr-test-framework.version>
    <lucene-solr.version>7.4.0</lucene-solr.version>
    <solr.version>7.4.0-cdh6.3.2</solr.version>
    <sqoop.version>1.4.7-cdh6.3.2</sqoop.version>
    <zookeeper.version>3.4.5-cdh6.3.2</zookeeper.version>

    Then adjust the versions of a few artifacts.

    Change the version of the "atlas-buildtools" artifact from "1.0" to "0.8.1":

    <dependency>
        <groupId>org.apache.atlas</groupId>
        <artifactId>atlas-buildtools</artifactId>
        <version>0.8.1</version>
    </dependency>

    Change jsr.version to 2.0.1:

    <jsr.version>2.0.1</jsr.version>

    Modify several module poms

    From the top-level directory, find the modules that reference jsr311-api:

    grep -rn jsr311-api | grep pom.xml
    addons/impala-bridge/pom.xml:332
    addons/falcon-bridge/pom.xml:178
    addons/hive-bridge/pom.xml:312
    addons/hbase-bridge/pom.xml:345
    addons/storm-bridge/pom.xml:360
    addons/sqoop-bridge/pom.xml:250

    In these poms, change the jsr311-api artifact to javax.ws.rs-api.
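Since the same artifactId swap has to be made in six poms, a throwaway script can do it mechanically. This is my own helper (the original post edits the files by hand); it only rewrites the artifactId text and leaves group ids and versions untouched, so check each pom afterwards.

```python
from pathlib import Path

# The six bridge poms identified by the grep above.
BRIDGE_POMS = [
    "addons/impala-bridge/pom.xml",
    "addons/falcon-bridge/pom.xml",
    "addons/hive-bridge/pom.xml",
    "addons/hbase-bridge/pom.xml",
    "addons/storm-bridge/pom.xml",
    "addons/sqoop-bridge/pom.xml",
]


def swap_artifact(pom_text: str) -> str:
    """Replace the jsr311-api artifactId with javax.ws.rs-api, leaving everything else untouched."""
    return pom_text.replace("<artifactId>jsr311-api</artifactId>",
                            "<artifactId>javax.ws.rs-api</artifactId>")


if __name__ == "__main__":
    # Run from the Atlas source root.
    for rel in BRIDGE_POMS:
        pom = Path(rel)
        if pom.exists():
            pom.write_text(swap_artifact(pom.read_text()))
```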

    Modify other source files

    In the file

    addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java

    go to line 618, comment out

    String catalogName = hiveDB.getCatalogName() != null ? hiveDB.getCatalogName().toLowerCase() : null;

    and add "String catalogName = null;":

    public static String getDatabaseName(Database hiveDB) {
        String dbName      = hiveDB.getName().toLowerCase();
        //String catalogName = hiveDB.getCatalogName() != null ? hiveDB.getCatalogName().toLowerCase() : null;
        String catalogName = null;

        if (StringUtils.isNotEmpty(catalogName) && !StringUtils.equals(catalogName, DEFAULT_METASTORE_CATALOG)) {
            dbName = catalogName + SEP + dbName;
        }

        return dbName;
    }

    In the file

    addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/AtlasHiveHookContext.java

    go to line 83, "this.metastoreHandler = (listenerEvent != null) ? metastoreEvent.getIHMSHandler() : null;",

    comment it out and add "this.metastoreHandler = null;":

    public AtlasHiveHookContext(HiveHook hook, HiveOperation hiveOperation, HookContext hiveContext, HiveHookObjectNamesCache knownObjects,
                                HiveMetastoreHook metastoreHook, ListenerEvent listenerEvent) throws Exception {
        this.hook           = hook;
        this.hiveOperation  = hiveOperation;
        this.hiveContext    = hiveContext;
        this.hive           = hiveContext != null ? Hive.get(hiveContext.getConf()) : null;
        this.knownObjects   = knownObjects;
        this.metastoreHook  = metastoreHook;
        this.metastoreEvent = listenerEvent;
        //this.metastoreHandler = (listenerEvent != null) ? metastoreEvent.getIHMSHandler() : null;
        this.metastoreHandler = null;

        init();
    }

    In the file addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/events/CreateHiveProcess.java

    comment out line 293, which mentions "MATERIALIZED_VIEW":

    private boolean isDdlOperation(AtlasEntity entity) {
        return entity != null && !context.isMetastoreHook()
                && (context.getHiveOperation().equals(HiveOperation.CREATETABLE_AS_SELECT)
                || context.getHiveOperation().equals(HiveOperation.CREATEVIEW)
                || context.getHiveOperation().equals(HiveOperation.ALTERVIEW_AS));
                //|| context.getHiveOperation().equals(HiveOperation.CREATE_MATERIALIZED_VIEW));
    }

    Note that you must add the terminating ";" here, because the line that originally carried it is now commented out.

    In the file addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java

    comment out lines 212 and 217, which mention "MATERIALIZED_VIEW".

    Start the build. It is mostly painless; if it fails, try a few more times — packages sometimes fail to download due to network issues:

    mvn clean -DskipTests package -Pdist -Drat.skip=true

    The distribution ends up in distro/target/apache-atlas-2.2.0-bin.tar.gz

    Do not use the "server" package mentioned in the official docs — it does not contain the hook files.

    Extract it to the installation directory and start the installation.

    Prepare the CDH cluster services for the Atlas deployment

    • Atlas uses HBase to store its JanusGraph database.
    • Solr is used to store and search the audit log.
    • Kafka carries messages from the Atlas hooks (libraries embedded in the Hadoop services) to Atlas itself.

    1.1. Create the necessary tables in HBase

    On the Atlas machine, or any other machine with the "HBase gateway" role, create the necessary tables:

    TABLE1="apache_atlas_entity_audit"
    TABLE2="apache_atlas_janus"
    echo "create '${TABLE1}', 'dt'; grant 'atlas', 'RWXCA', '${TABLE1}'" | hbase shell
    echo "create '${TABLE2}', 's'; grant 'atlas', 'RWXCA', '${TABLE2}'" | hbase shell

    If you add the "atlas" account to the "hbase.superuser" parameter, or to the "test2_hbase_su" IPA group (when Hadoop is connected directly to LDAP), or grant the "atlas" account all HBase permissions, Atlas will create the necessary tables itself and assign their permissions to the atlas account. This can be convenient, for example for research purposes.
    Note, however, that in Atlas 2.2.0 this does not work when 'atlas' is merely added to 'hbase.superuser'.

    Check the created tables

    On the Atlas machine, or any other machine with the "HBase gateway" role, execute:

      echo "list" | hbase shell

      stdout:

      Took 0.0028 seconds
      list
      TABLE
      apache_atlas_entity_audit
      apache_atlas_janus
      2 row(s)
      Took 0.6872 seconds
      ["apache_atlas_entity_audit", "apache_atlas_janus"]

    Symlink the HBase cluster configuration into conf/hbase:

    ln -s /etc/hbase/conf/ /data/apache-atlas-2.2.0/conf/hbase

    Apache Kafka

    Atlas uses Apache Kafka to receive messages about events happening in the Hadoop services. The messages are sent by special Atlas libraries (hooks) embedded in some of the services. Currently Atlas reads messages about events in HBase and Hive, such as creating and deleting tables, adding columns, and so on.

    Create the necessary topics in Kafka

    Apache Atlas needs three topics in Apache Kafka. Create them on a machine where Kafka is installed:

    kafka-topics --zookeeper S0:2181,S1:2181,S2:2181,S3:2181 --create --replication-factor 3 --partitions 3 --topic _HOATLASOK
    kafka-topics --zookeeper S0:2181,S1:2181,S2:2181,S3:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_ENTITIES
    kafka-topics --zookeeper S0:2181,S1:2181,S2:2181,S3:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_HOOK

    If Kafka has Kerberos enabled, disable it — I did not manage to get Atlas working with a Kerberized Kafka.
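After creating them, it is worth confirming that all three topics actually exist. A tiny helper (my own sketch) can diff the output of `kafka-topics --zookeeper ... --list` against the required set:

```python
REQUIRED_TOPICS = {"_HOATLASOK", "ATLAS_ENTITIES", "ATLAS_HOOK"}


def missing_topics(list_output: str) -> set:
    """Given the stdout of `kafka-topics --list` (one topic per line),
    return the Atlas topics that are absent."""
    existing = {line.strip() for line in list_output.splitlines() if line.strip()}
    return REQUIRED_TOPICS - existing
```

Feed it the captured command output; an empty result set means all three topics are present.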

    Configure a Sentry role for Atlas to access the Kafka topics

    On a machine with the "Kafka gateway" and "Sentry gateway" roles, create the "kafka4atlas_role" role in Sentry:

    KROLE="kafka4atlas_role"
    kafka-sentry -cr -r ${KROLE}

    Assign the created role to the atlas group:

    kafka-sentry -arg -r ${KROLE} -g atlas

    Assign consumer privileges:

    TOPIC1="_HOATLASOK"
    TOPIC2="ATLAS_ENTITIES"
    TOPIC3="ATLAS_HOOK"
    kafka-sentry -gpr -r ${KROLE} -p "Host=*->CONSUMERGROUP=*->action=read"
    kafka-sentry -gpr -r ${KROLE} -p "Host=*->CONSUMERGROUP=*->action=describe"
    kafka-sentry -gpr -r ${KROLE} -p "HOST=*->TOPIC=${TOPIC1}->action=read"
    kafka-sentry -gpr -r ${KROLE} -p "HOST=*->TOPIC=${TOPIC2}->action=read"
    kafka-sentry -gpr -r ${KROLE} -p "HOST=*->TOPIC=${TOPIC3}->action=read"
    kafka-sentry -gpr -r ${KROLE} -p "HOST=*->TOPIC=${TOPIC1}->action=describe"
    kafka-sentry -gpr -r ${KROLE} -p "HOST=*->TOPIC=${TOPIC2}->action=describe"
    kafka-sentry -gpr -r ${KROLE} -p "HOST=*->TOPIC=${TOPIC3}->action=describe"

    Assign producer privileges:

    kafka-sentry -gpr -r ${KROLE} -p "HOST=*->TOPIC=${TOPIC1}->action=write"
    kafka-sentry -gpr -r ${KROLE} -p "HOST=*->TOPIC=${TOPIC2}->action=write"
    kafka-sentry -gpr -r ${KROLE} -p "HOST=*->TOPIC=${TOPIC3}->action=write"

    Check the Sentry settings:

    $ kafka-sentry -lr
    ....
    solradm_role
    kafka4atlas_role

    Show the list of groups and their assigned roles:

    $ kafka-sentry -lg
    ...
    atlas = kafka4atlas_role
    test2_solr_admins = solradm_role

    Show the list of privileges:

    $ kafka-sentry -lp -r kafka4atlas_role
    ...
    HOST=*->TOPIC=_HOATLASOK->action=read
    HOST=*->TOPIC=_HOATLASOK->action=describe
    HOST=*->TOPIC=ATLAS_HOOK->action=read
    HOST=*->TOPIC=ATLAS_ENTITIES->action=describe
    HOST=*->TOPIC=ATLAS_HOOK->action=describe
    HOST=*->CONSUMERGROUP=*->action=describe
    HOST=*->TOPIC=_HOATLASOK->action=write
    HOST=*->TOPIC=ATLAS_ENTITIES->action=write
    HOST=*->TOPIC=ATLAS_HOOK->action=write
    HOST=*->TOPIC=ATLAS_ENTITIES->action=read
    HOST=*->CONSUMERGROUP=*->action=read

    Integrate with CDH's Solr

    ① Copy the apache-atlas-2.2.0/conf/solr directory into the Solr installation directory, i.e. /opt/cloudera/parcels/CDH/lib/solr, and rename it to atlas-solr.

    ② Create the collections:

    vi /etc/passwd
    # change the solr user's shell from /sbin/nologin to /bin/bash
    su - solr
    /opt/cloudera/parcels/CDH/lib/solr/bin/solr create -c vertex_index -d /opt/cloudera/parcels/CDH/lib/solr/atlas-solr -shards 3 -replicationFactor 2
    /opt/cloudera/parcels/CDH/lib/solr/bin/solr create -c edge_index -d /opt/cloudera/parcels/CDH/lib/solr/atlas-solr -shards 3 -replicationFactor 2
    /opt/cloudera/parcels/CDH/lib/solr/bin/solr create -c fulltext_index -d /opt/cloudera/parcels/CDH/lib/solr/atlas-solr -shards 3 -replicationFactor 2

    ③ Verify that the collections were created:
    log in to the Solr web console at http://xxxx:8983 and check that they are up.
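The check can also be done without the web console via the Solr Collections API (`/solr/admin/collections?action=LIST`). The sketch below is my own and assumes an endpoint reachable without extra authentication — adjust for your security setup; it just parses the LIST response:

```python
import json

# The three collections Atlas needs, created above.
ATLAS_COLLECTIONS = {"vertex_index", "edge_index", "fulltext_index"}


def missing_collections(list_json: str) -> set:
    """Parse the JSON returned by /solr/admin/collections?action=LIST&wt=json
    and return the Atlas collections that are absent."""
    present = set(json.loads(list_json).get("collections", []))
    return ATLAS_COLLECTIONS - present
```

For example, fetch `http://xxxx:8983/solr/admin/collections?action=LIST&wt=json` (same host as the console above) and pass the body to `missing_collections`; an empty set means all three indexes exist.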

    Create the required Kerberos principals and keytabs.

    Modify atlas-application.properties

    ######### Graph Database Configs #########

    # Graph Database
    #Configures the graph database to use. Defaults to JanusGraph
    #atlas.graphdb.backend=org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase

    # Graph Storage
    # Set atlas.graph.storage.backend to the correct value for your desired storage
    # backend. Possible values:
    #
    # hbase
    # cassandra
    # embeddedcassandra - Should only be set by building Atlas with -Pdist,embedded-cassandra-solr
    # berkeleyje
    #
    # See the configuration documentation for more information about configuring the various storage backends.
    #
    atlas.graph.storage.backend=hbase
    atlas.graph.storage.hbase.table=apache_atlas_janus

    #Hbase
    #For standalone mode , specify localhost
    #for distributed mode, specify zookeeper quorum here
    atlas.graph.storage.hostname=S0:2181,S1:2181,S2:2181
    atlas.graph.storage.hbase.regions-per-server=1
    atlas.graph.storage.lock.wait-time=10000

    #In order to use Cassandra as a backend, comment out the hbase specific properties above, and uncomment the
    #the following properties
    #atlas.graph.storage.clustername=
    #atlas.graph.storage.port=

    # Gremlin Query Optimizer
    #
    # Enables rewriting gremlin queries to maximize performance. This flag is provided as
    # a possible way to work around any defects that are found in the optimizer until they
    # are resolved.
    #atlas.query.gremlinOptimizerEnabled=true

    # Delete handler
    #
    # This allows the default behavior of doing "soft" deletes to be changed.
    #
    # Allowed Values:
    # org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1 - all deletes are "soft" deletes
    # org.apache.atlas.repository.store.graph.v1.HardDeleteHandlerV1 - all deletes are "hard" deletes
    #
    #atlas.DeleteHandlerV1.impl=org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1

    # Entity audit repository
    #
    # This allows the default behavior of logging entity changes to hbase to be changed.
    #
    # Allowed Values:
    # org.apache.atlas.repository.audit.HBaseBasedAuditRepository - log entity changes to hbase
    # org.apache.atlas.repository.audit.CassandraBasedAuditRepository - log entity changes to cassandra
    # org.apache.atlas.repository.audit.NoopEntityAuditRepository - disable the audit repository
    #
    atlas.EntityAuditRepository.impl=org.apache.atlas.repository.audit.HBaseBasedAuditRepository

    # if Cassandra is used as a backend for audit from the above property, uncomment and set the following
    # properties appropriately. If using the embedded cassandra profile, these properties can remain
    # commented out.
    # atlas.EntityAuditRepository.keyspace=atlas_audit
    # atlas.EntityAuditRepository.replicationFactor=1

    # Graph Search Index
    atlas.graph.index.search.backend=solr

    #Solr
    #Solr cloud mode properties
    atlas.graph.index.search.solr.mode=cloud
    atlas.graph.index.search.solr.zookeeper-url=S0:2181/solr,S1:2181/solr,S2:2181/solr
    atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
    atlas.graph.index.search.solr.zookeeper-session-timeout=60000
    atlas.graph.index.search.solr.wait-searcher=true

    #Solr http mode properties
    #atlas.graph.index.search.solr.mode=http
    #atlas.graph.index.search.solr.http-urls=http://localhost:8983/solr

    # ElasticSearch support (Tech Preview)
    # Comment out above solr configuration, and uncomment the following two lines. Additionally, make sure the
    # hostname field is set to a comma delimited set of elasticsearch master nodes, or an ELB that fronts the masters.
    #
    # Elasticsearch does not provide authentication out of the box, but does provide an option with the X-Pack product
    # https://www.elastic.co/products/x-pack/security
    #
    # Alternatively, the JanusGraph documentation provides some tips on how to secure Elasticsearch without additional
    # plugins: https://docs.janusgraph.org/latest/elasticsearch.html
    #atlas.graph.index.search.hostname=localhost
    #atlas.graph.index.search.elasticsearch.client-only=true

    # Solr-specific configuration property
    atlas.graph.index.search.max-result-set-size=150

    ######### Import Configs #########
    #atlas.import.temp.directory=/temp/import

    ######### Notification Configs #########
    atlas.notification.embedded=false
    atlas.kafka.data=${sys:atlas.home}/data/kafka
    atlas.kafka.zookeeper.connect=S0:2181,S1:2181,S2:2181
    atlas.kafka.bootstrap.servers=S0:9092,S1:9092,S2:9092
    atlas.kafka.zookeeper.session.timeout.ms=60000
    atlas.kafka.zookeeper.connection.timeout.ms=60000
    atlas.kafka.zookeeper.sync.time.ms=20
    atlas.kafka.auto.commit.interval.ms=1000
    atlas.kafka.hook.group.id=atlas
    atlas.kafka.enable.auto.commit=true
    atlas.kafka.auto.offset.reset=earliest
    atlas.kafka.session.timeout.ms=30000
    atlas.kafka.offsets.topic.replication.factor=3
    atlas.kafka.poll.timeout.ms=1000
    atlas.notification.create.topics=true
    atlas.notification.replicas=1
    atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES
    atlas.notification.log.failed.messages=true
    atlas.notification.consumer.retry.interval=500
    atlas.notification.hook.retry.interval=1000
    # Enable for Kerberized Kafka clusters
    #atlas.notification.kafka.service.principal=kafka/_HOST@EXAMPLE.COM
    #atlas.notification.kafka.keytab.location=/etc/security/keytabs/kafka.service.keytab

    ## Server port configuration
    atlas.server.http.port=21000
    #atlas.server.https.port=21443

    ######### Security Properties #########
    # SSL config
    atlas.enableTLS=false
    #truststore.file=/path/to/truststore.jks
    #cert.stores.credential.provider.path=jceks://file/path/to/credentialstore.jceks
    #following only required for 2-way SSL
    #keystore.file=/path/to/keystore.jks

    # Authentication config
    atlas.authentication.method=kerberos
    atlas.authentication.keytab=/data/hive.keytab
    atlas.authentication.principal=hive@TEST.COM
    atlas.authentication.method.kerberos=true
    atlas.authentication.method.kerberos.principal=hive@TEST.COM
    atlas.authentication.method.kerberos.keytab=/data/hive.keytab
    atlas.authentication.method.kerberos.name.rules=RULE:[2:$1@$0](hive@TEST.COM)s/.*/hive/
    atlas.authentication.method.kerberos.token.validity=3600
    #atlas.authentication.method.file=true

    #### ldap.type= LDAP or AD
    atlas.authentication.method.ldap.type=none

    #### user credentials file
    atlas.authentication.method.file.filename=${sys:atlas.home}/conf/users-credentials.properties

    ### groups from UGI
    #atlas.authentication.method.ldap.ugi-groups=true

    ######## LDAP properties #########
    #atlas.authentication.method.ldap.url=ldap://<ldap server url>:389
    #atlas.authentication.method.ldap.userDNpattern=uid={0},ou=People,dc=example,dc=com
    #atlas.authentication.method.ldap.groupSearchBase=dc=example,dc=com
    #atlas.authentication.method.ldap.groupSearchFilter=(member=uid={0},ou=Users,dc=example,dc=com)
    #atlas.authentication.method.ldap.groupRoleAttribute=cn
    #atlas.authentication.method.ldap.base.dn=dc=example,dc=com
    #atlas.authentication.method.ldap.bind.dn=cn=Manager,dc=example,dc=com
    #atlas.authentication.method.ldap.bind.password=<password>
    #atlas.authentication.method.ldap.referral=ignore
    #atlas.authentication.method.ldap.user.searchfilter=(uid={0})
    #atlas.authentication.method.ldap.default.role=<default role>

    ######### Active directory properties #######
    #atlas.authentication.method.ldap.ad.domain=example.com
    #atlas.authentication.method.ldap.ad.url=ldap://<AD server url>:389
    #atlas.authentication.method.ldap.ad.base.dn=(sAMAccountName={0})
    #atlas.authentication.method.ldap.ad.bind.dn=CN=team,CN=Users,DC=example,DC=com
    #atlas.authentication.method.ldap.ad.bind.password=<password>
    #atlas.authentication.method.ldap.ad.referral=ignore
    #atlas.authentication.method.ldap.ad.user.searchfilter=(sAMAccountName={0})
    #atlas.authentication.method.ldap.ad.default.role=<default role>

    ######### JAAS Configuration ########
    atlas.jaas.KafkaClient.loginModuleName=com.sun.security.auth.module.Krb5LoginModule
    atlas.jaas.KafkaClient.loginModuleControlFlag=required
    atlas.jaas.KafkaClient.option.useKeyTab=true
    atlas.jaas.KafkaClient.option.storeKey=true
    atlas.jaas.KafkaClient.option.serviceName=kafka
    atlas.jaas.KafkaClient.option.keyTab=/data/atlas.service.keytab
    atlas.jaas.KafkaClient.option.principal=atlas/s1.hadoop.com@TEST.COM
    atlas.jaas.Client.loginModuleName=com.sun.security.auth.module.Krb5LoginModule
    atlas.jaas.Client.loginModuleControlFlag=required
    atlas.jaas.Client.option.useKeyTab=true
    atlas.jaas.Client.option.storeKey=true
    atlas.jaas.Client.option.keyTab=/data/atlas.service.keytab
    atlas.jaas.Client.option.principal=atlas/s1.hadoop.com@TEST.COM
    atlas.jaas.producer-1.loginModuleName=com.sun.security.auth.module.Krb5LoginModule
    atlas.jaas.producer-1.loginModuleControlFlag=required
    atlas.jaas.producer-1.option.useKeyTab=true
    atlas.jaas.producer-1.option.storeKey=true
    atlas.jaas.producer-1.option.keyTab=/data/atlas.service.keytab
    atlas.jaas.producer-1.option.principal=atlas/s1.hadoop.com@TEST.COM
    atlas.jaas.producer-1.option.security.protocol=SASL_PLAINTEXT
    atlas.jaas.producer-1.option.sasl.mechanism=GSSAPI
    atlas.jaas.producer-1.option.kerberos.service.name=kafka

    ######### Server Properties #########
    atlas.rest.address=http://localhost:21000
    # If enabled and set to true, this will run setup steps when the server starts
    #atlas.server.run.setup.on.start=false

    ######### Entity Audit Configs #########
    atlas.audit.hbase.tablename=apache_atlas_entity_audit
    atlas.audit.zookeeper.session.timeout.ms=1000
    atlas.audit.hbase.zookeeper.quorum=S0:2181,S1:2181,S2:2181

    ######### High Availability Configuration ########
    atlas.server.ha.enabled=false
    #### Enabled the configs below as per need if HA is enabled #####
    #atlas.server.ids=id1
    #atlas.server.address.id1=localhost:21000
    #atlas.server.ha.zookeeper.connect=localhost:2181
    #atlas.server.ha.zookeeper.retry.sleeptime.ms=1000
    #atlas.server.ha.zookeeper.num.retries=3
    #atlas.server.ha.zookeeper.session.timeout.ms=20000
    ## if ACLs need to be set on the created nodes, uncomment these lines and set the values ##
    #atlas.server.ha.zookeeper.acl=<scheme>:<id>
    #atlas.server.ha.zookeeper.auth=<scheme>:<authinfo>

    ######### Atlas Authorization #########
    atlas.authorizer.impl=simple
    atlas.authorizer.simple.authz.policy.file=atlas-simple-authz-policy.json

    ######### Type Cache Implementation ########
    # A type cache class which implements
    # org.apache.atlas.typesystem.types.cache.TypeCache.
    # The default implementation is org.apache.atlas.typesystem.types.cache.DefaultTypeCache which is a local in-memory type cache.
    #atlas.TypeCache.impl=

    ######### Performance Configs #########
    #atlas.graph.storage.lock.retries=10
    #atlas.graph.storage.cache.db-cache-time=120000

    ######### CSRF Configs #########
    atlas.rest-csrf.enabled=true
    atlas.rest-csrf.browser-useragents-regex=^Mozilla.*,^Opera.*,^Chrome.*
    atlas.rest-csrf.methods-to-ignore=GET,OPTIONS,HEAD,TRACE
    atlas.rest-csrf.custom-header=X-XSRF-HEADER

    ############ KNOX Configs ################
    #atlas.sso.knox.browser.useragent=Mozilla,Chrome,Opera
    #atlas.sso.knox.enabled=true
    #atlas.sso.knox.providerurl=https://<knox gateway ip>:8443/gateway/knoxsso/api/v1/websso
    #atlas.sso.knox.publicKey=

    ############ Atlas Metric/Stats configs ################
    # Format: atlas.metric.query.<key>.<name>
    atlas.metric.query.cache.ttlInSecs=900
    #atlas.metric.query.general.typeCount=
    #atlas.metric.query.general.typeUnusedCount=
    #atlas.metric.query.general.entityCount=
    #atlas.metric.query.general.tagCount=
    #atlas.metric.query.general.entityDeleted=
    #
    #atlas.metric.query.entity.typeEntities=
    #atlas.metric.query.entity.entityTagged=
    #
    #atlas.metric.query.tags.entityTags=

    ######### Compiled Query Cache Configuration #########
    # The size of the compiled query cache. Older queries will be evicted from the cache
    # when we reach the capacity.
    #atlas.CompiledQueryCache.capacity=1000
    # Allows notifications when items are evicted from the compiled query
    # cache because it has become full. A warning will be issued when
    # the specified number of evictions have occurred. If the eviction
    # warning threshold <= 0, no eviction warnings will be issued.
    #atlas.CompiledQueryCache.evictionWarningThrottle=0

    ######### Full Text Search Configuration #########
    #Set to false to disable full text search.
    #atlas.search.fulltext.enable=true

    ######### Gremlin Search Configuration #########
    #Set to false to disable gremlin search.
    atlas.search.gremlin.enable=false

    ########## Add http headers ###########
    #atlas.headers.Access-Control-Allow-Origin=*
    #atlas.headers.Access-Control-Allow-Methods=GET,OPTIONS,HEAD,PUT,POST
    #atlas.headers.<headerName>=<headerValue>

    ######### UI Configuration ########
    atlas.ui.default.version=v1

    There is a lot to change, so double-check everything carefully — many of the defaults are wrong for this setup. For the keytab you can either create a new one or reuse an existing one; since I was worried about permission issues, I reused the hive account. The corresponding permissions probably also need to be granted in HBase — I have not tested whether that is required.

    In the Kafka settings, "producer-1" is the default group id name; the three extra options ("security.protocol", "sasl.mechanism" and "kerberos.service.name") are the parameters a producer needs when Kafka runs under Kerberos.

    Modify atlas-env.sh

    #!/usr/bin/env bash

    # The java implementation to use. If JAVA_HOME is not found we expect java and jar to be in path
    export JAVA_HOME=/usr/java/default
    export HBASE_CONF_DIR=/etc/hbase/conf

    # any additional java opts you want to set. This will apply to both client and server operations
    #export ATLAS_OPTS=

    # any additional java opts that you want to set for client only
    #export ATLAS_CLIENT_OPTS=

    # java heap size we want to set for the client. Default is 1024MB
    #export ATLAS_CLIENT_HEAP=

    # any additional opts you want to set for atlas service.
    #export ATLAS_SERVER_OPTS=

    # indicative values for large number of metadata entities (equal or more than 10,000s)
    export ATLAS_SERVER_OPTS="-server -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+PrintTenuringDistribution -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=dumps/atlas_server.hprof -Xloggc:logs/gc-worker.log -verbose:gc -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1m -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCTimeStamps -Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=/data/atlas2.2/conf/jaas.conf"

    # java heap size we want to set for the atlas server. Default is 1024MB
    #export ATLAS_SERVER_HEAP=

    # indicative values for large number of metadata entities (equal or more than 10,000s) for JDK 8
    export ATLAS_SERVER_HEAP="-Xms15360m -Xmx15360m -XX:MaxNewSize=5120m -XX:MetaspaceSize=100M -XX:MaxMetaspaceSize=512m"

    # What is considered as atlas home dir. Default is the base location of the installed software
    export ATLAS_HOME_DIR=/opt/atlas2.2

    # Where log files are stored. Default is logs directory under the base install location
    #export ATLAS_LOG_DIR=

    # Where pid files are stored. Default is logs directory under the base install location
    #export ATLAS_PID_DIR=

    # where the atlas titan db data is stored. Default is logs/data directory under the base install location
    #export ATLAS_DATA_DIR=

    # Where do you want to expand the war file. By Default it is in /server/webapp dir under the base install dir.
    #export ATLAS_EXPANDED_WEBAPP_DIR=

    # indicates whether or not a local instance of HBase should be started for Atlas
    export MANAGE_LOCAL_HBASE=false

    # indicates whether or not a local instance of Solr should be started for Atlas
    export MANAGE_LOCAL_SOLR=false

    # indicates whether or not cassandra is the embedded backend for Atlas
    export MANAGE_EMBEDDED_CASSANDRA=false

    # indicates whether or not a local instance of Elasticsearch should be started for Atlas
    export MANAGE_LOCAL_ELASTICSEARCH=false

    The jaas.conf referenced in atlas-env.sh needs to be created; add a jaas.conf with:

    Client {
        com.sun.security.auth.module.Krb5LoginModule required
        useKeyTab=true
        keyTab="/data/atlas.service.keytab"
        storeKey=true
        principal="atlas/s1.hadoop.com@TEST.COM"
        debug=false;
    };
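A stray quote or missing semicolon in jaas.conf tends to produce opaque Kerberos login failures, so I prefer to generate the file. This is my own helper (not part of Atlas); the default keytab path and principal are the ones used throughout this installation — substitute your own:

```python
# Template for the Client login section; note every option line sits inside
# the braces and the final option ends with ';' before the closing '};'.
JAAS_TEMPLATE = """Client {{
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    keyTab="{keytab}"
    storeKey=true
    principal="{principal}"
    debug=false;
}};
"""


def render_jaas(keytab: str = "/data/atlas.service.keytab",
                principal: str = "atlas/s1.hadoop.com@TEST.COM") -> str:
    """Render a Client section for jaas.conf with the given keytab and principal."""
    return JAAS_TEMPLATE.format(keytab=keytab, principal=principal)
```

Write `render_jaas()` to /data/atlas2.2/conf/jaas.conf (the path passed to -Djava.security.auth.login.config above).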

    Integrate with Hive

    First add three configuration items to Hive in CDH:

    Java Configuration Options for HiveServer2:   {{JAVA_GC_ARGS}} -Datlas.conf=/data/apache-atlas-2.2.0/conf/

    HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml:

    Name: hive.exec.post.hooks
    Value: org.apache.atlas.hive.hook.HiveHook

    HiveServer2 Environment Advanced Configuration Snippet (Safety Valve):

    HIVE_AUX_JARS_PATH=/data/apache-atlas-2.2.0/hook/hive/

    Copy atlas-application.properties to /etc/hive/conf. Note that this copy needs some modifications:

    # change to false
    atlas.authentication.method.kerberos=false
    # add the client timeouts
    atlas.client.readTimeoutMSecs=90000
    atlas.client.connectTimeoutMSecs=90000
    # add two performance-related parameters:
    # The number of times the Atlas code tries to acquire a lock (to ensure consistency) when committing a transaction.
    # This should match the amount of concurrency the server is expected to support; e.g. with retries set to 10,
    # up to 100 threads can create types in Atlas concurrently.
    # If set too low (the default is 3), concurrent operations may fail with a PermanentLockingException.
    atlas.graph.storage.lock.retries=10
    # Milliseconds to wait before evicting a cache entry. This should exceed
    # atlas.graph.storage.lock.wait-time x atlas.graph.storage.lock.retries.
    # If set too low (the default is 10000), the Atlas application log will warn about transactions taking too long.
    atlas.graph.storage.cache.db-cache-time=120000

    The two timeout settings are the client read and connect timeouts; the defaults are too short.
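The relationship between the lock and cache settings can be checked numerically: the cache eviction time should exceed the worst-case lock wait, i.e. wait-time × retries. A one-line check under the values used in this installation:

```python
def cache_time_ok(lock_wait_ms: int, lock_retries: int, db_cache_time_ms: int) -> bool:
    """The cache eviction time should exceed the worst-case lock wait (wait-time x retries)."""
    return db_cache_time_ms > lock_wait_ms * lock_retries

# This installation: wait-time=10000, retries=10, db-cache-time=120000
# -> 120000 > 100000, so the values are consistent.
```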

    https://repository.cloudera.com/repository/libs-staging-local/org/apache/hive/hive-contrib/2.1.1-cdh6.3.4/hive-contrib-2.1.1-cdh6.3.4.jar

    Download this jar and put it in /opt/cloudera/parcels/CDH/lib/hive/lib. It is needed when importing Hive data for tables that use a regex-based field delimiter.

    Then you can start Atlas:

    bin/atlas-start.py
    bin/atlas-stop.py

    Startup takes a long time — it includes index creation, data initialization and other operations, and can last several hours, so be patient.
    Tail the Atlas log until it stops updating, then use lsof or netstat to check whether port 21000 is listening; once it is, open ip:21000 in a browser and log in to the Atlas page.

    Do not trust the "Apache Atlas Server started!!!" message or the Atlas process shown by jps: the start script always reports success after a fixed timeout, even when port 21000 is not yet listening and the service is unusable. Atlas is really up only when port 21000 is listening and the login page loads.
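Rather than re-running netstat by hand during a multi-hour startup, the port check can be scripted. This polling helper is my own, not part of Atlas:

```python
import socket
import time


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_atlas(host: str = "localhost", port: int = 21000,
                   poll_secs: float = 30.0, max_wait_secs: float = 4 * 3600) -> bool:
    """Poll until Atlas's HTTP port is listening; startup can take hours, so the budget is generous."""
    deadline = time.monotonic() + max_wait_secs
    while time.monotonic() < deadline:
        if port_open(host, port):
            return True
        time.sleep(poll_secs)
    return False
```

Only once `wait_for_atlas()` returns True is it worth trying the login page.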

    Now we can start using it.

    First configure the Kafka and Hive environment variables in /etc/profile:

    export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
    export HIVE_CONF_DIR=/etc/hive/conf
    export PATH=$HIVE_HOME/bin:$PATH
    export KAFKA_HOME=/opt/cloudera/parcels/CDH/lib/kafka
    export KAFKA_CONF_DIR=/etc/kafka/conf
    export PATH=$KAFKA_HOME/bin:$PATH

    Import Hive data

    # remember to kinit first
    bin/import-hive.sh
    # you can also import a single database
    bin/import-hive.sh -d default
    # or a single table
    bin/import-hive.sh -d default -t test
    # or a batch of databases/tables listed in a file
    bin/import-hive.sh -f 文件名

    During the process you will be prompted for the Atlas username and password; enter admin for both.
    On success a confirmation message is printed.

    How long this takes depends on the amount of existing data in Hive.
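The import can also be verified without the UI by querying Atlas's basic-search REST endpoint with the same admin/admin credentials. A sketch of mine, assuming the v2 REST API path and a locally reachable server:

```python
import base64
from urllib.request import Request


def basic_search_request(type_name: str = "hive_table",
                         base_url: str = "http://localhost:21000",
                         user: str = "admin", password: str = "admin") -> Request:
    """Build an authenticated GET against /api/atlas/v2/search/basic?typeName=..."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = Request(f"{base_url}/api/atlas/v2/search/basic?typeName={type_name}")
    req.add_header("Authorization", f"Basic {token}")
    return req

# urllib.request.urlopen(basic_search_request()) returns JSON whose
# "entities" list should contain the imported hive tables.
```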

    After logging in you land on the Atlas home page (screenshots omitted here).

    Click the small icon in the top-right corner to see the overall data situation.

    From there you can list all hive tables, and click any table to view its details.

    You can clearly see each table's properties, columns, lineage graph and so on.

    We can also use the search bar on the left to filter for the items we are looking for.

    A few things to note:

    1. Atlas derives the dependencies between tables and between columns from the SQL statements Hive executes. For example, after running insert into table_a select * from table_b, Atlas can establish the dependency between table_a and table_b. If no such SQL has been executed, no lineage exists yet — so there is no lineage data right after the initial import.

    2. Real data must actually land between the two tables for lineage to be resolved. For example, if you query rows that do not exist in table a and insert the (empty) result into table b, the lineage between a and b cannot be resolved.

    3. If a table in the SQL is used as a temporary table via with ... as, Atlas likewise cannot resolve the lineage.

    4. Entities in Atlas are independent. So when a business change adds or removes columns: deleting a column entity removes it from the columns attribute of the corresponding table entity, but the table entity's Audits tab does not get a new update record (possibly a bug in the version used here).
    When adding a column, you only need to create the column entity — there is no need to recreate the table; you can observe that the new column appears in the table entity's columns attribute and a new update record appears in its Audits tab.
    When creating an entity that already exists in Atlas: if none of its attributes changed, nothing visible happens and no create/update record is added to the Audits tab; if some attributes did change, the entity is updated, the changed attributes are visible, and a new update record appears in the Audits tab.
     


     

  • Original article: https://blog.csdn.net/h952520296/article/details/133774106