<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>诸葛子房's Blog</title>
        <link>https://zgzf.online/</link>
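        <description>诸葛子房's Blog is a big data blog covering big data, user profiling, Flink, and related topics, dedicated to sharing knowledge about big data and user profiling.</description>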
        <lastBuildDate>Sun, 25 May 2025 14:01:06 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>zh-CN</language>
        <copyright>All rights reserved 2025, 诸葛子房</copyright>
        <item>
            <title><![CDATA[AI Tools Usage Guide]]></title>
            <link>https://zgzf.online/article/d19df8d8-967b-4769-ba73-62f1dfee498b</link>
            <guid>https://zgzf.online/article/d19df8d8-967b-4769-ba73-62f1dfee498b</guid>
            <pubDate>Wed, 14 Feb 2024 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<div id="notion-article" class="mx-auto overflow-hidden "><main class="notion light-mode notion-page notion-block-d19df8d8967b4769ba7362f1dfee498b"><div class="notion-viewport"></div><div class="notion-collection-page-properties"></div><div class="notion-text notion-block-cb12b6128a0145bf840a9fc9fe0da1e1"><b>AI Tools Usage Guide</b></div><div class="notion-blank notion-block-2d8bc8916a35409594e8dc806a5eeb98"> </div><div class="notion-text notion-block-0023af1c45a14b6d9b39fcf969eef4af"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="http://ai-timeline.top/">AI Timeline</a></div><div class="notion-blank notion-block-c9484cc341374c77950120251773fc63"> </div><div class="notion-text notion-block-d89a06dabc934b61ace76c2d99b3ae6d">kimi ai: good at interpreting long texts</div><div class="notion-text notion-block-6d59ee754ed64bb28082e551cfc22b75">https://kimi.moonshot.cn/</div><div class="notion-text notion-block-07c012fc225a49c6a5fda4d629f3c2af"><code class="notion-inline-code">Article link: [url]
1. Metadata: title, author, link;
2. The author's claims and highlights;
3. A layer-by-layer deepening of understanding;
4. Key terms and concepts;
5. Irrelevant information in the article;
6. A summary of the core information;
7. Memorable quotes;
8. Conclusion;</code></div><div class="notion-blank notion-block-cd4c33dd50964c4ba091b8817abb7dc2"> </div></main></div>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[User Profiling Series: Online Service Tuning in Practice]]></title>
            <link>https://zgzf.online/article/8e6decb3-8b74-4db7-b41c-f3dad3b47a0b</link>
            <guid>https://zgzf.online/article/8e6decb3-8b74-4db7-b41c-f3dad3b47a0b</guid>
            <pubDate>Wed, 24 Jan 2024 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<div id="notion-article" class="mx-auto overflow-hidden "><main class="notion light-mode notion-page notion-block-8e6decb38b744db7b41cf3dad3b47a0b"><div class="notion-viewport"></div><div class="notion-collection-page-properties"></div><div class="notion-text notion-block-b4e1696890e044528ab7ff9132138ff4">Earlier articles covered several applications of user profiles. The online-service applications are mainly recommendation and the strategy engine, both of which serve consumer-facing (C-end) online traffic.</div><div class="notion-text notion-block-861f0266365d4051882fa51f0f023567">Recommendation: different users are recommended different content. Personalization requires reading preference data from the profile in order to recommend content the user is interested in.</div><div class="notion-text notion-block-341556f5ea2b44dfb4a40788af0d76c0">Strategy engine: users are routed to different pages or given different strategies according to their attributes. For example, ordinary users cannot access Taobao's luxury-goods entrance, and a Beijing campaign is open only to Beijing users.</div><div class="notion-text notion-block-9000a710929844f383b00f0f2570db2b">So the business requirements on the online profile service are clear: high traffic and latency sensitivity (tens of thousands to hundreds of thousands of QPS, with results returned within milliseconds).</div><div class="notion-text notion-block-0bc8e6c3a86a4a91b670aac8e85c635c">For consumer-facing, high-traffic services like this, the industry generally stores the data in Redis and serves external reads from it.</div><div class="notion-text notion-block-30059eda094b427f94a82698d5c23080">Below are some problems the profile service ran into in production, along with how they were located and handled:</div><div class="notion-text notion-block-43ff8bc4c410430680ca0e687ec9a565">(1) The problem: latency spikes during peak traffic and overly frequent full GC</div><div class="notion-text notion-block-5a58e69ad3254f878dc4305667a0155f">During peak traffic, latency fluctuated frequently. Looking at GC activity showed that GC was too frequent (a full GC roughly every 2 days).</div><div class="notion-text notion-block-640d8bf94d314320aff9201a4d2252df">Machine configuration: 4c 8g</div><div class="notion-text notion-block-4ff6e1a588d0482583e4f5da3867b4c7">JVM parameters: -Xms6g -Xmx6g -XX:NewRatio=2 -XX:+UseParallelGC -XX:ParallelGCThreads=4 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:GCLogFileSize=50M -XX:NumberOfGCLogFiles=10 -XX:+UseGCLogFileRotation -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/online-server/logs -Xloggc:/data/online-server/logs/gc.log</div><div class="notion-text notion-block-779db75b30864bfeb2b5c7fdb4fef63a">Peak QPS per container: 600-800</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-c314cab0800f44228ecfb34adb77e1df"><div 
style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F93c66975-004d-4776-aed5-f6964fff9546%2F76b940b3-079c-4540-ae17-9e65405d1a58%2FUntitled.png?table=block&amp;id=c314cab0-800f-4422-8ecf-b34adb77e1df" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-8ca45d65685843ca8ba8ad38ed14d845">(2) Optimization 1</div><div class="notion-text notion-block-45c1107904e7495293a8ae8a3a6449f9">Machine configuration: 8c 16g</div><div class="notion-text notion-block-295410c48611404785c33314ebb357ce">JVM parameters: -Xms14g -Xmx14g -XX:NewRatio=2 -XX:+UseParallelGC -XX:ParallelGCThreads=4 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:GCLogFileSize=50M -XX:NumberOfGCLogFiles=10 -XX:+UseGCLogFileRotation -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/online-server/logs -Xloggc:/data/online-server/logs/gc.log</div><div class="notion-text notion-block-6ce576b767a14c12b7e1d09a5d44590e">Full GC: once every 4 days</div><div class="notion-text notion-block-112fb10ca4d54ba28163d0d8354e65c0">However, the Survivor space was very small, only 13M. For the cause, see <a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://zhuanlan.zhihu.com/p/148604647">https://zhuanlan.zhihu.com/p/148604647</a></div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-76b6a33be90349eea0928fd970af03ff"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F93c66975-004d-4776-aed5-f6964fff9546%2F944270d3-c120-40b8-b2d1-50dad33b5635%2FUntitled.png?table=block&amp;id=76b6a33b-e903-49ee-a092-8fd970af03ff" alt="notion image" loading="lazy" decoding="async"/></div></figure><figure 
class="notion-asset-wrapper notion-asset-wrapper-image notion-block-aaac1a897f114d58b6736e1c7f5fa9da"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F93c66975-004d-4776-aed5-f6964fff9546%2F26ec7d33-ec7c-47a6-a517-9bb63593a9db%2FUntitled.png?table=block&amp;id=aaac1a89-7f11-4d58-b673-6e1c7f5fa9da" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-069062fd001345e4baac56f9196d2033">(3) Optimization 2</div><div class="notion-text notion-block-b65cc13f4110467a8353d36bd39c62ca">Machine configuration: 8c 16g</div><div class="notion-text notion-block-5aa78d2969384a7e834da50f8f9ce915">JVM parameters: -Xms14g -Xmx14g -XX:NewRatio=2 -XX:+UseConcMarkSweepGC -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:GCLogFileSize=50M -XX:NumberOfGCLogFiles=10 -XX:+UseGCLogFileRotation -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/online-server/logs -Xloggc:/data/online-server/logs/gc.log</div><div class="notion-text notion-block-69a507601b9e401cb6295c57db94b2ac">Survivor: 478M</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-d06b2b18d408444dabe7d81eb092553f"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F93c66975-004d-4776-aed5-f6964fff9546%2Fa9be5b3d-7a93-40e1-80a1-2686147af47e%2FUntitled.png?table=block&amp;id=d06b2b18-d408-444d-abe7-d81eb092553f" alt="notion image" loading="lazy" decoding="async"/></div></figure><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-e82ca0a193fd461795a1d36ed046b843"><div 
style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F93c66975-004d-4776-aed5-f6964fff9546%2Fa2cbbe0b-677e-476f-a8ee-4c13bc6d0b07%2FUntitled.png?table=block&amp;id=e82ca0a1-93fd-4617-95a1-d36ed046b843" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-f8a0c16472dc47bb8cb75bff05f81bba">(4) Optimization 3</div><div class="notion-text notion-block-3b38b2adaec644069472da0c37f5f541">Machine configuration: 8c 16g</div><div class="notion-text notion-block-cafe14eec84d430ab1c415695fb5dbf4">JVM parameters: -Xms14g -Xmx14g -XX:NewRatio=1 -XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=4 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:GCLogFileSize=50M -XX:NumberOfGCLogFiles=10 -XX:+UseGCLogFileRotation -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/online-server/logs -Xloggc:/data/online-server/logs/gc.log</div><div class="notion-text notion-block-36d52d79b78a4911ae24d932ee12ed7d">The young generation was enlarged, mainly because the data handled by the query service contains very large objects that can be reclaimed after brief use and do not stay resident.</div><div class="notion-text notion-block-e8204612f86b4e0eacc5369de24f66d3">After these optimizations, full GC settled at about once a week, but the interface still showed latency spikes.</div><div class="notion-text notion-block-3410fb3161254b88b688b82599ddd016">Step 2: Interface optimization in practice with Arthas</div><div class="notion-text notion-block-6bfac0061f3c4f3ab10a257d687e8aba">The core logic reads data from Redis and, based on permissions, parses out and returns only the tags the caller is authorized to see. Below is the Arthas analysis of the interface's latency and of the parameters passed in when latency was high.</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-ef22a3cb75c64752984affb3cf976af7"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" 
src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F93c66975-004d-4776-aed5-f6964fff9546%2F2faefbcf-c201-4ef2-b642-0e80f4ea2e03%2FUntitled.png?table=block&amp;id=ef22a3cb-75c6-4752-984a-ffb3cf976af7" alt="notion image" loading="lazy" decoding="async"/></div></figure><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-cdeb6f20d21f4e2180bbe6798e5e4009"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F93c66975-004d-4776-aed5-f6964fff9546%2F44439fa0-487c-4219-8703-418f2326d411%2FUntitled.png?table=block&amp;id=cdeb6f20-d21f-4e21-80bb-e6798e5e4009" alt="notion image" loading="lazy" decoding="async"/></div></figure><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-dba0aa54773a4aaba517f5c7e489244a"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F93c66975-004d-4776-aed5-f6964fff9546%2Fbbe13b96-387a-43d5-bd84-fab3968e3ed7%2FUntitled.png?table=block&amp;id=dba0aa54-773a-4aab-a517-f5c7e489244a" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-4c00bb2f64e94855928a98a7ec957287"><b>The ids with high latency turned out to have very large values, so reading them from Redis and parsing them was slow, in some cases taking seconds, even though the result finally returned was not particularly large.</b></div><div class="notion-text notion-block-ddc66e7dd162412c81f0b2adc16afe0b">From a tuning perspective, scaling the machine from 4c 8g to 8c 16g and adjusting the JVM parameters brought full GC down to once a week, but the interface latency fluctuation remained. The main cause is that the values for some ids are large, so reading and parsing them is slow. The final solution should therefore split the value for storage, separating frequently used data from rarely used data, so that an oversized payload is never fetched in one go.</div><div class="notion-text 
notion-block-c718ee7b30b74f22a5f87303cfbb6b1c">This article showed how to locate online-service problems by adjusting JVM parameters and using Arthas to analyze interface latency.
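<br/>The idea of splitting a large value into frequently used and rarely used parts can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the service's actual code: the <code class="notion-inline-code">HOT_TAGS</code> set, the <code class="notion-inline-code">profile:hot:</code> and <code class="notion-inline-code">profile:cold:</code> key names, and both helper functions are hypothetical, and a plain dict stands in for Redis so the example runs without a server.
<pre>
```python
import json

# Hypothetical set of frequently requested tags; the article does not list them.
HOT_TAGS = {"gender", "city", "vip_level"}


def split_profile(user_id, profile):
    """Split one user's tag map into a small hot part and a larger cold part."""
    hot = {k: v for k, v in profile.items() if k in HOT_TAGS}
    cold = {k: v for k, v in profile.items() if k not in HOT_TAGS}
    # Key names are illustrative only.
    return {
        f"profile:hot:{user_id}": json.dumps(hot),
        f"profile:cold:{user_id}": json.dumps(cold),
    }


def read_tags(store, user_id, wanted):
    """Read the hot key first; touch the cold key only if a wanted tag is missing."""
    result = json.loads(store.get(f"profile:hot:{user_id}", "{}"))
    missing = [t for t in wanted if t not in result]
    if missing:
        result.update(json.loads(store.get(f"profile:cold:{user_id}", "{}")))
    return {t: result[t] for t in wanted if t in result}


# A plain dict stands in for Redis so the sketch runs without a server.
store = {}
store.update(split_profile("u1", {
    "gender": "m",
    "city": "Beijing",
    "watch_history": list(range(1000)),
}))
print(read_tags(store, "u1", ["gender", "city"]))
```
</pre>
With this layout, a request that needs only frequently used tags reads and parses the small hot value and never touches the large one. Which tags count as frequently used would have to come from real access statistics.<br/>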
Blog: <a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://zgzf.online/">https://zgzf.online/</a></div></main></div>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[User Profiling Series: Bloom Filters in the Strategy Engine]]></title>
            <link>https://zgzf.online/article/aef94a81-b024-4846-a795-206f073c811a</link>
            <guid>https://zgzf.online/article/aef94a81-b024-4846-a795-206f073c811a</guid>
            <pubDate>Mon, 22 Jan 2024 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<div id="notion-article" class="mx-auto overflow-hidden "><main class="notion light-mode notion-page notion-block-aef94a81b0244846a795206f073c811a"><div class="notion-viewport"></div><div class="notion-collection-page-properties"></div><div class="notion-text notion-block-ae2ac0c962a345d59705021df4711d0f"></div><div class="notion-text notion-block-2e25dae262b24c758c04072a2ffec8de">The article "User Profiling Series: What Are We Talking About When We Talk About User Profiles?"</div><div class="notion-text notion-block-a51e66bf6a3445e8ab4c44a7090d1a1b">introduced the application scenarios of user profiles:</div><div class="notion-text notion-block-9a42a6cd3409442da87c73154ab9e2ce">(1) Personalized recommendation</div><div class="notion-text notion-block-7485afac664240a2aee2e04b10740033">Recommend suitable products or content to users based on their tags</div><div class="notion-text notion-block-141f0a9281a04d188a6992da49723b32">(2) Audience selection for marketing</div><div class="notion-text notion-block-973f1e2379a0475b8d1e4efc9610c52f">See: User Profiling Series: Lookalike for Audience Expansion in Marketing Selection</div><div class="notion-text notion-block-3de8e51b9b0e4c61a548b02d0f976bb1">(3) Strategy engine</div><div class="notion-text notion-block-d2385e20790f41f4bf856632a9f7463c">Users hit different strategies based on their tags. For example, high spenders get an entrance to the luxury-goods channel</div><div class="notion-text notion-block-46c21adc5bc24a59850a194e1d69d713">(4) Algorithm models</div><div class="notion-text notion-block-43160b18771d413d9061124d401cada0">(5) Profile reports</div><div class="notion-text notion-block-1f92b35b5aaa44a8a8923fda64cf31ac"><b>Background:</b></div><div class="notion-text notion-block-3f3ff575c00646208b8819d8bf21e74d">This article covers the application of user profiles in the strategy engine. First, what does a strategy engine do?</div><div class="notion-text notion-block-86752ce9a2cc4f438a6d85afac3a8d62">For example, when a user enters a platform for the first time, a pop-up offers promotions such as new-user coupons. Or when a user's spending is very high and reaches a threshold, the user is considered to have strong purchasing power and can be given access to the luxury-goods entrance. That is, the luxury channel on Taobao is opened only to high-value users who have previously purchased luxury goods.</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-6d5637e9143d4a119c968557c5602337"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" 
src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F93c66975-004d-4776-aed5-f6964fff9546%2F99d004f3-0d6b-47d6-9320-c0fa10e8f542%2FUntitled.png?table=block&amp;id=6d5637e9-143d-4a11-9c96-8557c5602337" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-96fd39c1593a4c2cb9f0f0fec00c554c">The above is a simple strategy engine: it checks whether the user entering the platform meets one or more conditions and then applies the corresponding marketing strategy.</div><div class="notion-text notion-block-90140106a3fc49c7b6f2dca5b4604c21"><b>Problem and approach:</b></div><div class="notion-text notion-block-fef24ed186b54d23a8f36f70ea408795">When the engine's strategies (also called rules) become too complex and are also changed frequently, is there a simpler way to handle them? That is what this article shares: use the audience-selection feature to select the target users and define them as an audience package; then, when a user enters the platform, check whether that user is in the package. Because this is an online service, the lookup that matches a user against the audience package must return within milliseconds.</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-976d624951ec4efca310147a60520147"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:703px;max-width:100%;flex-direction:column"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F93c66975-004d-4776-aed5-f6964fff9546%2F7ee6577e-ed7f-48df-b7e6-be47e9f8f605%2FUntitled.png?table=block&amp;id=976d6249-51ec-4efc-a310-147a60520147" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-e3ce1d6068734c80a7955072f5ba4d2c">With this approach, only the selection criteria need to be modified, which neatly solves the problem of overly complex and frequently adjusted strategy conditions.</div><div class="notion-text notion-block-efa5a678b8af43c5aedfb4daaf206601">Production use and the problems encountered:</div><div class="notion-text notion-block-206a6e83048e425ab392be3df5ff5861">Because matching an online user against the audience package must return within milliseconds, the audience-package data needs to be kept in a fast store, and Redis was chosen.</div><div class="notion-text notion-block-97a18d14910a4911b3c2e832bdb920ae">The Redis data structure is as follows:</div><div class="notion-text notion-block-baa1a2be3bb84317b125376e9c370901">key: audience-package id key_1, value: the list of selected users [userId_1, userId_2, userId_3, and so on]</div><div class="notion-text 
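notion-block-2803b1d47bd84d6b8bb96f708e765e05">But as the number of audience packages grows, so does the storage required. A single package typically contains anywhere from tens of thousands to tens of millions of users, or even hundreds of millions.</div><div class="notion-text notion-block-38b0ec72d7af40ca921e8b36d77ce23c"><b>Optimization:</b></div><div class="notion-text notion-block-01e5d050912942d1b16a34df88ec70f0">Use Redis's Bloom filter</div><div class="notion-text notion-block-46cce52e43c7499bae7f6d5345e5d99d">A Bloom filter has a small footprint: it does not store the data itself, only the bit flags at the positions given by the hash results modulo the bit-array size, so overall it saves a great deal of storage compared with a hash table. The trade-off is that a lookup can return a false positive, though never a false negative.</div></main></div>]]></content:encoded>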
        </item>
        <item>
            <title><![CDATA[User Profiling Series: Core Recommendation Tags (Preference Type)]]></title>
            <link>https://zgzf.online/article/3c97e4d9-3598-4616-b816-b964bd5ae9ad</link>
            <guid>https://zgzf.online/article/3c97e4d9-3598-4616-b816-b964bd5ae9ad</guid>
            <pubDate>Tue, 16 Jan 2024 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<div id="notion-article" class="mx-auto overflow-hidden "><main class="notion light-mode notion-page notion-block-3c97e4d935984616b816b964bd5ae9ad"><div class="notion-viewport"></div><div class="notion-collection-page-properties"></div><div class="notion-text notion-block-fcc48d560660407fa0b0df27b4b0a55d"><b>I. Background</b></div><div class="notion-text notion-block-11b7552319044024984f3b563162b22c">When browsing shopping sites, scrolling Douyin, or listening to NetEase Cloud Music, we often see features such as "Guess You Like" or "Recommended for You". These rely on the preference tags of the user profile, for example celebrity preference (liking the work of a particular star or singer) and category preference (for example, liking beauty or food content).</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-1bb702eb6d8042198a8f2b69dcb1193e"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:590px;max-width:100%;flex-direction:column"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F93c66975-004d-4776-aed5-f6964fff9546%2F4bb1baab-2c25-48a3-8de8-c8fcd143123f%2FUntitled.png?table=block&amp;id=1bb702eb-6d80-4219-8a8f-2b69dcb1193e" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-ba32cffcdbe34af2b18b62e40f99a8ce"></div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-d5a1129247f14d86b7753f1c3da04c26"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:610px;max-width:100%;flex-direction:column"><img style="object-fit:cover" 
src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F93c66975-004d-4776-aed5-f6964fff9546%2F3ccb224c-2e21-4778-8c71-422f1e3bb130%2FUntitled.png?table=block&amp;id=d5a11292-47f1-4d86-b775-3f1c3da04c26" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-53320b6697da45988f3840a45e5f5830"><b>II. Core logic for building preference tags</b></div><div class="notion-text notion-block-afd36061fcb34e7a9aa6c7580338f965">Preference tags are generally derived from users' behavior logs.</div><div class="notion-text notion-block-14dad6b14ecc4daca1b894dd110c1bd2">For example, video apps (viewing logs, comment logs, like logs, favorite logs) yield content preference, category preference, and celebrity preference; e-commerce apps (order logs, browsing logs, favorite logs, add-to-cart logs) yield product preference, price preference, and brand preference.</div><div class="notion-text notion-block-4c34168bec5b48b8ad783976dfb5ed40">Below we use a video app as the example to explain in detail how preference tags are built.</div><div class="notion-text notion-block-45f13fe0f780457a86e842d319df8f68"><b>1. Fact-based preference tags: computed mainly from viewing data, giving the user's total viewing time and last viewing time for a given piece of content or category</b></div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-be66688f6e6f48b8bf3839eb97c8c296"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F93c66975-004d-4776-aed5-f6964fff9546%2Fdf45c541-60ba-41bc-bee6-466eb6f7e80a%2FUntitled.png?table=block&amp;id=be66688f-6e6f-48b8-bf38-39eb97c8c296" alt="notion image" loading="lazy" decoding="async"/></div></figure><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-fe9babe4799742789023e0a1dbc7f1a0"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F93c66975-004d-4776-aed5-f6964fff9546%2Fbfde8827-2a22-4727-abf7-64a0d87c25eb%2FUntitled.png?table=block&amp;id=fe9babe4-7997-4278-9023-e0a1dbc7f1a0" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-fe3bd089a4544448a069e3507fdeaffd">Above are a playback behavior table, which covers the basic playback data, and a tag dimension table (a TV series or film is tagged with many labels; the table lists only 3).</div><div class="notion-text notion-block-7072190ae77542a2a994dfab2ccd1aac">From these two tables we can generate the user viewing tag table below, in which a single TV series or film clearly corresponds to multiple tags.</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-71f13bbf95544fc58fd45f3c009cbef6"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" 
src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F93c66975-004d-4776-aed5-f6964fff9546%2F5f625040-a2df-4df7-9985-0b777b16c032%2FUntitled.png?table=block&amp;id=71f13bbf-9554-4fc5-8fd4-5f3c009cbef6" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-c01e3f81f4494aa2b2f33f9aa3915074">From this table we compute, for each user under each tag, the total viewing time and the last viewing time.</div><div class="notion-text notion-block-459d9f81a0284fd9a3f015d39628c15f">This completes the computation of each user's content preference tags.</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-80ec588ab99843d29c5a32a2a2700397"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F93c66975-004d-4776-aed5-f6964fff9546%2F22dbea63-8cf8-46f2-90ed-ad64965a5957%2FUntitled.png?table=block&amp;id=80ec588a-b998-43d2-9c5a-32a2a2700397" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-213d968778c1456c8975b2bd4047a922">Note: to see a user's viewing preferences over the last six months, simply compute over the last six months of viewing data. This way the viewing preferences can be produced dynamically.</div><div class="notion-text notion-block-f4b9e37d22c7483b9ce1e9e7e7a95817">Attentive readers will notice that each tag carries both a total viewing time and a last viewing time. Total viewing time shows how strongly the user is interested in content with that tag, while last viewing time shows that the user has been watching this content recently.</div><div class="notion-text notion-block-9ed626bc5dcc439ab25f3281de79ab6b">In effect, total time represents a long-term interest, and last viewing time represents a short-term interest. For example, a male user often watches costume dramas or videos of attractive women, but recently a trending video became popular and he watched that too. This only reflects his interest over a recent period; once that period passes, the long-term interest tags should be used.</div><div class="notion-text notion-block-0a851011903c4b1d9bf36fe732afe925"><b>2. Weight-based preference tags: computed mainly from viewing data, giving the user's viewing weight for a given piece of content or category; the higher the weight, the more interested the user is in that content</b></div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-4da8a30008f8481fae5809eb54c67230"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F93c66975-004d-4776-aed5-f6964fff9546%2F832405ff-6546-4d2d-ae3f-1ea5d8be8623%2FUntitled.png?table=block&amp;id=4da8a300-08f8-481f-ae58-09eb54c67230" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-6c373ff8698b4019a34bfc05675520bf">User viewing weight table</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-fd4c7d46c76e48ccbb881beff364186c"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F93c66975-004d-4776-aed5-f6964fff9546%2F6ecf430e-651e-4301-9b11-c8d0eceab7ea%2FUntitled.png?table=block&amp;id=fd4c7d46-c76e-48cc-bb88-1beff364186c" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-12299aaf461c4a94aefa39e74770a392">Note: the coefficients in the weight formula need careful choice. It depends on whether you care more about last viewing time or about playback duration: if duration matters more, give the duration term the larger coefficient; if last viewing time matters more, give that term the larger coefficient.</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-e14b3fa3722840a7a201502cb93b68f7"><div 
style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F93c66975-004d-4776-aed5-f6964fff9546%2Fe9f64f27-e34c-4533-9b7e-101e29cc21db%2FUntitled.png?table=block&amp;id=e14b3fa3-7228-40a7-a201-502cb93b68f7" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-943355c6f76a467688a8b6b9954df7ed">Normalization: if tags are attached per video id and the weights are simply summed, the total will clearly exceed 1, so normalization is required.</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-3d2df7a35cea45c092cb0d306fdfdd59"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F93c66975-004d-4776-aed5-f6964fff9546%2F754dbd30-d1cd-4294-a106-50bf9463a70a%2FUntitled.png?table=block&amp;id=3d2df7a3-5cea-45c0-92cb-0d306fdfdd59" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-4a9ffc2590ae4e8c917faad8fd4115a2"><b>III. Summary</b></div><div class="notion-text notion-block-3b1c3c94e69f47eb93ee251fec2b06fc">This completes the processing of preference tags. They can be applied in advertising, recommendation, and many other scenarios. For example, a user who often watches funny videos can be recommended funny toys or videos.</div><div class="notion-text notion-block-838f5ca592414ecfa11fdd0352a5516a"></div></main></div>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[User Profile Series: Core Architecture Design of OneID (ID-Mapping) in the Data Middle Platform]]></title>
            <link>https://zgzf.online/article/bb68a83f-521a-4721-ab9e-5a7e55774c90</link>
            <guid>https://zgzf.online/article/bb68a83f-521a-4721-ab9e-5a7e55774c90</guid>
            <pubDate>Mon, 15 Jan 2024 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<div id="notion-article" class="mx-auto overflow-hidden "><main class="notion light-mode notion-page notion-block-bb68a83f521a4721ab9e5a7e55774c90"><div class="notion-viewport"></div><div class="notion-collection-page-properties"></div><div class="notion-text notion-block-1b469c23981740fca02d6da85d5b39d5"><b>I. Introduction</b>
Does this sound familiar? I search for a product on Baidu (or JD or Taobao), for example an iPhone that I look at but don't buy, and two days later I see it again on some video platform or web page.</div><div class="notion-text notion-block-147c6f9989974f91ab420b08e7d90114">Stranger still, I searched for the phone on my PC, yet it shows up while I am watching a movie on my phone.</div><div class="notion-text notion-block-a2bbd2620b55473a8bc9e990331bf05b">Are my computer, my phone, my iPad, and my other devices all being monitored?</div><div class="notion-text notion-block-16f3fbd9cb6e47cbac4090f758584bc6"><b>II. Background</b>
The book 《阿里巴巴大数据之路》 describes the OneData methodology for the data middle platform, which covers OneModel, OneService, and OneID. OneService was covered in an earlier article on the author's CSDN blog, "Building and Operating a Hundred-Million-Scale Big Data Service Platform at a Major Internet Company" (某互联网大厂亿级大数据服务平台的建设和实践).</div><div class="notion-text notion-block-d534ccc29a4241b899e4e44723fa1871">The story in the introduction is exactly the OneID part, which is the focus of this article.</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-4ddd41b689c6471289750296f919c33d"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F93c66975-004d-4776-aed5-f6964fff9546%2Fc5dce7b6-6a33-497e-a312-de9e576781d1%2FUntitled.png?table=block&amp;id=4ddd41b6-89c6-4712-8975-0296f919c33d" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-6f194d96e4a94d2caad0b42b0a22670a"><b>III. Concepts</b></div><div class="notion-text notion-block-8a846fcc1c0d496a97646e44ff7b5135">OneID is a single unique ID assigned to the same user and the same device.</div><div class="notion-text notion-block-db80916054944e5986cf05d17bf44baa">Example 1: I search for the Xiaomi 11 on my own computer, then browse another website on the same computer, and that website shows me a Xiaomi 11 ad. You might think this case is simple: it is the same computer, so matching on the IP address and similar information is enough.</div><div class="notion-text notion-block-e056aa8375ac45e9a7d6c82acb789e37">Example 2: I search for the Xiaomi 11 on my own computer, then browse other websites on my iPhone, and those websites also show me Xiaomi 11 ads. How was I identified?</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-3175bf8edae84b508195093ede07d878"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F93c66975-004d-4776-aed5-f6964fff9546%2Fe82efde2-f47d-4074-b983-c49508e7570f%2FUntitled.png?table=block&amp;id=3175bf8e-dae8-4b50-8195-093ede07d878" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text 
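notion-block-1ed084ec880c4f5bbc665887907dc470">We treat this computer, this phone, and the other devices as sharing one unique ID: the OneID.</div><div class="notion-text notion-block-1dd1aedf85ca4b22904e33729b497a6b">But you will surely ask: how are the devices linked, and how are wrong links avoided? For example, why is the Xiaomi 11 I searched for on my computer pushed to my phone and not to my girlfriend's phone?</div><div class="notion-text notion-block-6e74282fde3b4063bbc2a4a7b8207f39">This mapping process is called ID-Mapping.</div><div class="notion-text notion-block-a17732084e1845bfb7956da2e14a7aec"><b>IV. Core Architecture Design</b></div><div class="notion-text notion-block-d680a2d386ca4e5ebb131b6834a55227">1. Business logic</div><div class="notion-text notion-block-49c1b4bab6f74be684f777b3511f1743">When an app is installed on an Android phone, it reports device information such as IMEI, MAC address, AndroidID, and phone model at startup.</div><div class="notion-text notion-block-1942b9c72d044310bb47457d25f13695">Similarly, iOS apps report startup information such as the IDFA and device details, and PC browsers report cookies and browser information.</div><div class="notion-text notion-block-244ccd17261f49b2bb44b47695fa610d">However, the reported information is not necessarily identical every time, for example because of phone permission settings or cleared browser cookies.</div><div class="notion-text notion-block-26880e76ee764dd1b04e320631e0e436">LocalID: at startup the app generates a local ID from the AndroidID (no network connection required) and records it as the LocalID.</div><div class="notion-text notion-block-5ddc6acc15084e9eaa3a74d27ff1b9cf">OneID: the unique ID derived from the device information the app reports over the network, used to link multiple apps and multiple client platforms.</div><div class="notion-text notion-block-19dfc8bc494747a4bb2af0e2686e03d1">What OneID is for:</div><div class="notion-text notion-block-5d891a08e123493c99262b99a3e9a9bf">(1) Linking multiple apps on one phone, for example recognizing that Taobao, Alipay, and Amap are used by the same user (this applies when the user is not logged in; when logged in, the account ID is used).</div><div class="notion-text notion-block-6d62343c0fc642f293614282757ff64d">(2) Linking multiple client platforms, for example when you watch a show on your phone, the PC web client can tell it is the same user (this addresses the problem that shared accounts make it impossible to pin down a single user).</div><div class="notion-text notion-block-30f9d9ad33a647dfb219515f108183ff">2. Generation process</div><div class="notion-text notion-block-85b6fe90ae474065bc78ae426b5a3b1d">(1) HBase table design</div><div class="notion-text notion-block-5deb4513b54a47368b3c57127fc377b0">Mapping table structure for Android (android_id_mapping); other platforms are similar:</div><div class="notion-text notion-block-dd91746905ab4ca98a296bd53168748d">OneID   imei  mac_address    android_id      oaid</div><div class="notion-text notion-block-30c7d907c547457e8b744b5e753db843">LocalID and 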
OneID mapping table structure (local_id_mapping):</div><div class="notion-text notion-block-07ebdb10c75a48aa8e9e8f362c58346c">LocalID   OneID</div><div class="notion-text notion-block-c41a27d8c8e3444bb081e6017f32954c">(2) To handle high concurrency, the HBase data is preloaded into a Redis cache. Redis key design:</div><div class="notion-text notion-block-dd7334bf888d4a7cae577a0fc371f57d">imei_value                 oneid1,oneid2...</div><div class="notion-text notion-block-c798ed6cc28b4ec888b5c32ee16d2586">mac_address_value  oneid1,oneid2...</div><div class="notion-text notion-block-5b1efc91449744f497f3389cf975b5b6">(3) ID-Mapping process</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-bb1e2e8abc99444c9d75eaff8fe6970c"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F93c66975-004d-4776-aed5-f6964fff9546%2Fe8f06820-4397-4f9a-ba18-7ba9ea29436a%2FUntitled.png?table=block&amp;id=bb1e2e8a-bc99-444c-9d75-eaff8fe6970c" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-d30d224b8d2e44a0969f1bca5fe61857">Voting service:</div><div class="notion-text notion-block-f7a8ad237baa43738cff667e3d6a0a21">The service matches the parameters reported by the client against Redis to find candidate OneIDs. The core weight settings are:</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-3dff424456c348c89cfdf7358190c5a3"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:706px;max-width:100%;flex-direction:column"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F93c66975-004d-4776-aed5-f6964fff9546%2F97b32481-4f82-467e-8d55-f08edf7d3f89%2FUntitled.png?table=block&amp;id=3dff4244-56c3-48c8-9cfd-f7358190c5a3" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text 
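notion-block-fcb0840a455e4b6d8fdd314b97e0a1d5"><b>V. Summary</b></div><div class="notion-text notion-block-d5614f56c7a44f60882c1fa1b35f4cc6">Finally, let's answer the two questions raised at the beginning.</div><div class="notion-text notion-block-4bff578891604807b417cfa2b47bb372">Question 1: I search for a product on Baidu (or JD or Taobao), for example an iPhone that I look at but don't buy, and two days later I see it again on some video platform or web page. Why?</div><div class="notion-text notion-block-870c43d9dd9841bc9b17796b9f989206">When I watch a video on the video platform, it obtains the browser cookie and related information, calls Baidu's service to match the most recent search data for the same device, and then serves ads accordingly, so that each user sees personalized content.</div><div class="notion-text notion-block-d4a1603fb08840e9992a705fe75b53c8">Question 2: Stranger still, I searched for the phone on my PC, yet it shows up while I am watching a movie on my phone.</div><div class="notion-text notion-block-8fd4bbd3b9824bc0a022100a83141c82">The movie app is installed on my phone, and I also have viewing history on the same site's PC web client. Combining account information, device information, and viewing history produces one OneID, which links the data from all of the video site's clients.</div><div class="notion-text notion-block-7bac86c33eb940da852078c505a3edf1">Later, when I watch a movie on the PC web client, the site fetches the information from all clients under the current unique ID and then calls JD's or Taobao's ad service to serve the ad.</div></main></div>]]></content:encoded>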
        </item>
        <item>
            <title><![CDATA[User Profile Series: How to Build User Profiles from 0 to 1]]></title>
            <link>https://zgzf.online/article/18fb78ef-6e0f-4506-a5cc-b480b7d8e6fb</link>
            <guid>https://zgzf.online/article/18fb78ef-6e0f-4506-a5cc-b480b7d8e6fb</guid>
            <pubDate>Mon, 15 Jan 2024 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<div id="notion-article" class="mx-auto overflow-hidden "><main class="notion light-mode notion-page notion-block-18fb78ef6e0f4506a5ccb480b7d8e6fb"><div class="notion-viewport"></div><div class="notion-collection-page-properties"></div><div class="notion-text notion-block-b72b79df78264de49a067ff07d5a586a"><b>1. How should a user profile platform be built?</b></div><div class="notion-text notion-block-64928ec67c594e31a55218ef42fb06ab">As the previous article explained, a <a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://so.csdn.net/so/search?q=%E7%94%A8%E6%88%B7%E7%94%BB%E5%83%8F&amp;spm=1001.2101.3001.7020">user profile</a> is essentially a set of tags or features describing a user. The first goal is therefore to produce and process tags, which involves data ingestion, cleaning, and finally tag processing and loading into storage.</div><div class="notion-text notion-block-67958d741e364945832b0abc0cca1a47">The overall tag pipeline is as follows:</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-5b13f2b8a65c412abbe2c04a1db61134"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F93c66975-004d-4776-aed5-f6964fff9546%2F7a60adc3-2f56-4378-b9f5-8de2990bd42f%2FUntitled.png?table=block&amp;id=5b13f2b8-a65c-412a-bbe2-c04a1db61134" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-8c684e0ee8914d4abdf82145ce615484">(1) Online log data ingestion and processing</div><div class="notion-text notion-block-756c8431906041b283023b37e4ddd380">Data layering</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-5836a243aad1445099d3fc81699ba745"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:604px;max-width:100%;flex-direction:column"><img style="object-fit:cover" 
src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F93c66975-004d-4776-aed5-f6964fff9546%2Fef0a660f-81ce-4979-99de-e38369e0e7aa%2FUntitled.png?table=block&amp;id=5836a243-aad1-4450-99d3-fc81699ba745" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-92087b89da9d4c008bdbfe01a717dd4c">(2) Tag processing and mining</div><div class="notion-text notion-block-10e2c9bb604a4308aba114ae2b9044a6">a. By processing method, tags fall into three types: factual, statistical, and algorithmic.</div><div class="notion-text notion-block-e80108b5dd114e2d96b869983d4c41ca">Factual: synced directly from the raw data, for example the last login time.</div><div class="notion-text notion-block-fada13185fc24a16b3dd376f4154d222">Statistical: produced by applying simple statistical rules to the raw data, for example the number of active days in the last month.</div><div class="notion-text notion-block-d5fc66100e874d3681bef2cad981ba64">Algorithmic: mined by algorithms from the user's behavior and transaction data, for example work location and home location (mined from GPS data with a clustering algorithm).</div><div class="notion-text notion-block-49fdeddf7ca94a4bb447e04e7d0c5b9f">b. By timeliness, tags fall into three types: offline (T+1), near-real-time (T+H), and real-time.</div><div class="notion-text notion-block-75b6cf41f0f94d9c957d9b392ae52336">(3) Tag storage and application</div><div class="notion-text notion-block-527c73fcce184e42874758ff5c93dbd6">Different databases are used as the storage layer for different application scenarios.
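As a rough illustration of the factual and statistical tag types described in (2) above, here is a minimal Python sketch. The function names, the in-memory list of login dates, and the 30-day window are assumptions made for illustration only; a production pipeline would more likely compute these tags in Hive or Spark.

```python
from datetime import date, timedelta


def last_login_date(login_dates):
    """Factual tag: the most recent login date, or None if there is none."""
    return max(login_dates, default=None)


def active_days_last_month(login_dates, today, window_days=30):
    """Statistical tag: distinct days with a login in the trailing window."""
    # Every calendar day in the window, ending at (and including) `today`.
    window = {today - timedelta(days=i) for i in range(window_days)}
    # Set intersection drops duplicate logins and days outside the window.
    return len(window.intersection(login_dates))


# Hypothetical login log for one user (dates are made up for illustration).
logins = [date(2024, 1, 1), date(2024, 1, 1), date(2024, 1, 10), date(2023, 11, 5)]

print(last_login_date(logins))
print(active_days_last_month(logins, today=date(2024, 1, 15)))
```

The factual tag is a direct read of the raw data, while the statistical tag applies one simple rule (distinct days inside a window) on top of it.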
</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-7ea7f475bef0489292568f52637f8bd7"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F93c66975-004d-4776-aed5-f6964fff9546%2F32b42e23-11c7-4e34-a60b-e64890da159c%2FUntitled.png?table=block&amp;id=7ea7f475-bef0-4892-9256-8f52637f8bd7" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-941188125af44a1eab45e77192056eaa">(4) Tag access control, tag dictionary, and tag quality</div><div class="notion-text notion-block-c62fef964e5543b390480ae3e3c41029">Tag access control: a business team can only use the tags it has been granted access to; the permission configuration is stored in MySQL.</div><div class="notion-text notion-block-a8532a28260a4f2ba854f048553e9401">Tag dictionary: tag values are stored as dictionary enum codes rather than the actual content (for example, male and female in the gender tag are stored as 0 and 1).</div><div class="notion-text notion-block-1cfea7a35b2f4239bc4becb2d99656b7">Tag quality: tag data quality is monitored, with alerts on fluctuations, covering tag coverage and tag distribution.</div><div class="notion-blank notion-block-d4f76a3469a64d6bbd08c264117de11f"> </div><div class="notion-text notion-block-3073ae6a69c24ed6a5880f3ce8908fc8"><b>2. Which technologies are used when building user profiles?</b></div><div class="notion-text notion-block-0a606b0415f84635bbd3a5c4d5950afe">(1) Big data technologies</div><div class="notion-text notion-block-17e002f86ee3417281e151ae6f3f8bf5">Java, MySQL, Python, Hive, Spark, Flink, HBase</div><div class="notion-text notion-block-80ba4d761e5e495b93e84454e389a74d">(2) Service development</div><div class="notion-text notion-block-acf7934d1a5e4878bce4741c01b4a0e8">RPC services</div><div class="notion-text notion-block-fd792939e563438cb72f60a4a23be01c">(3) Tag mining algorithms</div><div class="notion-text notion-block-1858ddf4501142fc9563f31861f684a4">Clustering, logistic regression, and so on, using Python and Spark</div><div class="notion-text notion-block-c70aba96c9cd479b8c4c722a6bd95024">3. What problems come up when building user profiles?</div><div class="notion-text 
notion-block-5fe6a915cee1413c90837d3838f46f7f">(1) With the current pressure to cut costs and improve efficiency, how can the user profile side optimize storage and compute performance?</div><div class="notion-text notion-block-fdae7dbcdbf24954bd946a01e3daa69f">a. Store KV values as Protobuf. Profile data types are usually fairly fixed (single-valued or multi-valued) and place high demands on serialization and deserialization performance and on data compactness; Protobuf encodes quickly and produces compact output.</div><div class="notion-text notion-block-3cd4523d2a9849a0b9185b4e495c2085">b. Dictionary-encode tag values.</div><div class="notion-text notion-block-d1bd870dd8004211b4f455f6b52fe6ca">c. Use custom feature extraction for profile features, which keeps resource usage low.</div><div class="notion-text notion-block-ad261ab08f774074b00aaf361dd40f64">Feature extraction currently comes in two main forms: single-feature extraction and batch feature extraction.</div><div class="notion-text notion-block-94d629321add4292a647790acb9e3c4e">Single-feature extraction: the advantage is flexible control; the drawback is that every feature starts its own pull task, which is inefficient and resource-intensive.</div><div class="notion-text notion-block-ebef55ec81bb48e3be571a36402e7281">Batch feature extraction: cost is controllable, but it depends heavily on upstream Hive table data.</div><div class="notion-text notion-block-8d85457364b14f72bbb84b98bc7ecfc6">We therefore consider a custom feature extraction scheme that configures the extraction strategy according to tag priority, keeping cost under control while still meeting timeliness requirements.</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-04f0442e1d3d4a458ebfa61f555df6f4"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F93c66975-004d-4776-aed5-f6964fff9546%2F38c4c940-c718-46f6-a29e-25acd502ca7d%2FUntitled.png?table=block&amp;id=04f0442e-1d3d-4a45-8ebf-a61f555df6f4" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-af9801255cbc4637ad045d7d362b5be5">d. Tiered storage for hot and cold data</div><div class="notion-text notion-block-b96f758195ca472db2ac38ea3ab68b80">Hot data can be stored on better hardware (SSDs, dedicated clusters, and so on), and cold data on ordinary hardware (HDDs, shared clusters).</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-c87513b96e2044f6b09bc2473a18f8da"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:700px;max-width:100%;flex-direction:column"><img style="object-fit:cover" 
src="https://img-blog.csdnimg.cn/7025d3d665104c2e874c85e0132b4c39.png" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-blank notion-block-ec82484366de467f851cdf3c138b4d07"> </div></main></div>]]></content:encoded>
        </item>
    </channel>
</rss>