网站首页
JSP空间
动态资讯
开源项目
技术文档
资源下载
J2EE资源
客户论坛
在线支付
 
  技术文档>>数据库技术>>Oracle技术>>Oracle开发>查看文档  
  循序渐进讲解oracle数据库的hash join (1)     
  文章作者:未知  文章来源:赛迪网技术社区  
  查看:66次  录入:管理员--2008-04-30  
 

【赛迪网-it技术报道】在开发过程中,很多人经常会使用到hash map或者hash set这种数据结构,这种数据结构的特点就是插入和访问速度快。当向集合中加入一个对象时,会调用hash算法来获得hash code,然后根据hash code分配存放位置。访问的时,根据hashcode直接找到存放位置。

oracle hash join 是一种非常高效的join 算法,主要以cpu(hash计算)和内存空间(创建hash table)为代价获得最大的效率。hash join一般用于大表和小表之间的连接,我们将小表构建到内存中,称为hash cluster,大表称为probe表。

效率

hash join具有较高效率的两个原因:

1.hash 查询,根据映射关系来查询值,不需要遍历整个数据结构。

2.mem 访问速度是disk的万倍以上。

理想化的hash join的效率是接近对大表的单表选择扫描的。

首先我们来比较一下,几种join之间的效率,首先 optimizer会自动选择使用hash join。

注意到cost= 221

sql> select * from vendition t,customer b where t.customerid = b.customerid;

100000 rows selected.

execution plan

----------------------------------------------------------

plan hash value: 3402771356

--------------------------------------------------------------------------------

| id | operation | name | rows | bytes | cost (%cpu)| time |

--------------------------------------------------------------------------------

| 0 | select statement | | 106k| 22m| 221 (3)| 00:00:03 |

|* 1 | hash join | | 106k| 22m| 221 (3)| 00:00:03 |

| 2 | table access full| customer | 5000 | 424k| 9 (0)| 00:00:01 |

| 3 | table access full| vendition | 106k| 14m| 210 (2)| 00:00:03 |

--------------------------------------------------------------------------------

不使用hash,这时optimizer自动选择了merge join。。

注意到cost=3507大大的增加了。

sql> select /*+ use_merge (t b) */* from vendition t,customer b where t.customerid = b.customerid;

100000 rows selected.

execution plan

----------------------------------------------------------

plan hash value: 1076153206

-----------------------------------------------------------------------------------------

| id | operation | name | rows | bytes |tempspc| cost (%cpu)| time

-----------------------------------------------------------------------------------------

| 0 | select statement | | 106k| 22m| | 3507 (1)| 00:00:43 |

| 1 | merge join | | 106k| 22m| | 3507 (1)| 00:00:43 |

| 2 | sort join | | 5000 | 424k| | 10 (10)| 00:00:01 |

| 3 | table access full| customer | 5000 | 424k| | 9 (0)| 00:00:01 |

|* 4 | sort join | | 106k| 14m| 31m| 3496 (1)| 00:00:42 |

| 5 | table access full| vendition | 106k| 14m| | 210 (2)| 00:00:03 |

-----------------------------------------------------------------------------------------

那么nest loop呢,经过漫长的等待后,发现cost达到了惊人的828k,同时伴随3814337 consistent gets(由于没有建索引),可见在这个测试中,nest loop是最低效的。在给customerid建立唯一索引后,减低到106k,但仍然是内存join的上千倍。

sql> select /*+ use_nl(t b) */* from vendition t,customer b where t.customerid = b.customerid;

100000 rows selected.

execution plan

----------------------------------------------------------

plan hash value: 2015764663

--------------------------------------------------------------------------------

| id | operation | name | rows | bytes | cost (%cpu)| time |

--------------------------------------------------------------------------------

| 0 | select statement | | 106k| 22m| 828k (2)| 02:45:41 |

| 1 | nested loops | | 106k| 22m| 828k (2)| 02:45:41 |

| 2 | table access full| vendition | 106k| 14m| 210 (2)| 00:00:03 |

|* 3 | table access full| customer | 1 | 87 | 8 (0)| 00:00:01 |

hash的内部

hash_area_size在oracle 9i 和以前,都是影响hash join性能的一个重要的参数。但是在10g发生了一些变化。oracle不建议使用这个参数,除非你是在mts模式下。oracle建议采用自动pga管理(设置pga_aggregate_target和workarea_size_policy)来,替代使用这个参数。由于我的测试环境是mts环境,自动内存管理,所以我在这里只讨论mts下的hash join。

mts的pga中,只包含了一些栈空间信息,uga则包含在large pool中,那么实际类似hash,sort,merge等操作都是有large pool来分配空间,large pool同时也是auto管理的,它和sga_target有关。所以在这种条件下,内存的分配是很灵活。

hash连接根据内存分配的大小,可以有三种不同的效果:

1.optimal 内存完全足够

2.onepass 内存不能装载完小表

3.multipass workarea executions 内存严重不足

下面,分别测试小表为50行,500行和5000行,内存的分配情况(内存都能完全转载)。

vendition表 10w条记录

customer表 5000

customer_small 500,去customer表前500行建立

customer_pity 50,取customer表前50行建立

表的统计信息如下:

sql> select s.table_name,s.blocks,s.avg_space,s.num_rows,s.avg_row_len,s.empty_blocks from user_tables s where table_name in ('customer','vendition','customer_small','customer_pity') ;

table_name blocks avg_space num_rows avg_row_len empty_blocks

customer 35 1167 5000 38 5

customer_pity 4 6096 50 37 4

customer_small 6 1719 500 36 2

vendition 936 1021 100000 64 88打开10104事件追踪:(hash 连接追踪)

alter system set events ‘ 10104 trace name context,level 2’;

测试sql

select * from vendition a,customer b where a.customerid = b.customerid;

select * from vendition a,customer_small b where a.customerid = b.customerid;

select * from vendition a,customer_pity b where a.customerid = b.customerid;

小表50行时候的trace分析:

*** 2008-03-23 18:17:49.467

*** session id:(773.23969) 2008-03-23 18:17:49.467

kxhfinit(): enter

kxhfinit(): exit

*** rowsrcid: 1 hash join statistics (initialization) ***

join type: inner join

original hash-area size: 3883510

ps:hash area的大小,大约380k,本例中最大的表也不过250块左右,所以内存完全可以完全装载

memory for slot table: 2826240

calculated overhead for partitions and row/slot managers: 1057270

hash-join fanout: 8

number of partitions: 8

ps:hash 表数据连一个块都没装满,oracle仍然对数据进行了分区,这里和以前在一些文档上看到的,当内存不足时才会对数据分区的说法,发生了变化。

number of slots: 23

multiblock io: 15

block size(kb): 8

cluster (slot) size(kb): 120

ps:分区中全部行占有的cluster的size

minimum number of bytes per block: 8160

bit vector memory allocation(kb): 128

per partition bit vector length(kb): 16

maximum possible row length: 270

estimated build size (kb): 0

estimated build row length (includes overhead): 45

# immutable flags:

not buffer(execution) output of the join for pq

evaluate left input row vector

evaluate right input row vector

# mutable flags:

io sync

kxhfsetphase: phase=build

kxhfaddchunk: add chunk 0 (sz=32) to slot table

kxhfaddchunk: chunk 0 (lbs=0x2a97825c38, slottab=0x2a97825e00) successfuly added

kxhfsetphase: phase=probe_1

qerhjfetch: max build row length (mbl=44)

*** rowsrcid: 1 end of hash join build (phase 1) ***

revised row length: 45

revised build size: 2kb

kxhfresize(enter): resize to 12 slots (numalloc=8, max=23)

kxhfresize(exit): resized to 12 slots (numalloc=8, max=12)

slot table resized: old=23 wanted=12 got=12 unload=0

*** rowsrcid: 1 hash join build hash table (phase 1) ***

total number of partitions: 8

number of partitions which could fit in memory: 8

number of partitions left in memory: 8

total number of slots in in-memory partitions: 8

total number of rows in in-memory partitions: 50

(used as preliminary number of buckets in hash table)

estimated max # of build rows that can fit in avail memory: 66960

### partition distribution ###

partition:0 rows:5 clusters:1 slots:1 kept=1

partition:1 rows:6 clusters:1 slots:1 kept=1

partition:2 rows:4 clusters:1 slots:1 kept=1

partition:3 rows:9 clusters:1 slots:1 kept=1

partition:4 rows:5 clusters:1 slots:1 kept=1

partition:5 rows:9 clusters:1 slots:1 kept=1

partition:6 rows:4 clusters:1 slots:1 kept=1

partition:7 rows:8 clusters:1 slots:1 kept=1

ps:每个分区只有不到10行,这里有一个重要的参数kept,1在内存中,0在磁盘

*** (continued) hash join build hash table (phase 1) ***

ps:hash join的第一阶段,但是要观察更多的阶段,需提高trace的level,这里略过

revised number of hash buckets (after flushing): 50

allocating new hash table.

*** (continued) hash join build hash table (phase 1) ***

requested size of hash table: 16

actual size of hash table: 16

number of buckets: 128

match bit vector allocated: false

kxhfresize(enter): resize to 14 slots (numalloc=8, max=12)

kxhfresize(exit): resized to 14 slots (numalloc=8, max=14)

freeze work area size to: 2359k (14 slots)

*** (continued) hash join build hash table (phase 1) ***

total number of rows (may have changed): 50

number of in-memory partitions (may have changed): 8

final number of hash buckets: 128

size (in bytes) of hash table: 1024

kxhfiterate(end_iterate): numalloc=8, maxslots=14

*** (continued) hash join build hash table (phase 1) ***

### hash table ###

# note: the calculated number of rows in non-empty buckets may be smaller

# than the true number.

number of buckets with 0 rows: 86

number of buckets with 1 rows: 37

number of buckets with 2 rows: 5

number of buckets with 3 rows: 0

ps:桶里面的行数,最大的桶也只有2行,理论上,桶里面的行数越少,性能越佳。

number of buckets with 4 rows: 0

number of buckets with 5 rows: 0

number of buckets with 6 rows: 0

number of buckets with 7 rows: 0

number of buckets with 8 rows: 0

number of buckets with 9 rows: 0

number of buckets with between 10 and 19 rows: 0

number of buckets with between 20 and 29 rows: 0

number of buckets with between 30 and 39 rows: 0

number of buckets with between 40 and 49 rows: 0

number of buckets with between 50 and 59 rows: 0

number of buckets with between 60 and 69 rows: 0

number of buckets with between 70 and 79 rows: 0

nmber of buckets with between 80 and 89 rows: 0

number of buckets with between 90 and 99 rows: 0

number of buckets with 100 or more rows: 0

### hash table overall statistics ###

total buckets: 128 empty buckets: 86 non-empty buckets: 42

ps:创建了128个桶,oracle 7开始的计算公式

bucket数=0.8*hash_area_size/(hash_multiblock_io_count*db_block_size)

但是不准确,估计10g发生了变化。

total number of rows: 50

maximum number of rows in a bucket: 2

average number of rows in non-empty buckets: 1.190476

小表500行时候的trace分析

original hash-area size: 3925453

memory for slot table: 2826240

。。。

hash-join fanout: 8

number of partitions: 8

。。。

### partition distribution ###

partition:0 rows:52 clusters:1 slots:1 kept=1

partition:1 rows:63 clusters:1 slots:1 kept=1

partition:2 rows:55 clusters:1 slots:1 kept=1

partition:3 rows:74 clusters:1 slots:1 kept=1

partition:4 rows:66 clusters:1 slots:1 kept=1

partition:5 rows:66 clusters:1 slots:1 kept=1

partition:6 rows:54 clusters:1 slots:1 kept=1

partition:7 rows:70 clusters:1 slots:1 kept=1

ps:每个partition的行数增加

。。。

number of buckets with 0 rows: 622

number of buckets with 1 rows: 319

number of buckets with 2 rows: 71

number of buckets with 3 rows: 10

number of buckets with 4 rows: 2

number of buckets with 5 rows: 0

。。。

### hash table overall statistics ###

total buckets: 1024 empty buckets: 622 non-empty buckets: 402

total number of rows: 500

maximum number of rows in a bucket: 4

average number of rows in non-empty buckets: 1.243781

小表5000行时候的trace分析

original hash-area size: 3809692

memory for slot table: 2826240

。。。

hash-join fanout: 8

number of partitions: 8

nuber of slots: 23

multiblock io: 15

block size(kb): 8

cluster (slot) size(kb): 120

minimum number of bytes per block: 8160

bit vector memory allocation(kb): 128

per partition bit vector length(kb): 16

maximum possible row length: 270

estimated build size (kb): 0

。。。

### partition distribution ###

partition:0 rows:588 clusters:1 slots:1 kept=1

partition:1 rows:638 clusters:1 slots:1 kept=1

partition:2 rows:621 clusters:1 slots:1 kept=1

partiton:3 rows:651 clusters:1 slots:1 kept=1

partition:4 rows:645 clusters:1 slots:1 kept=1

partition:5 rows:611 clusters:1 slots:1 kept=1

partitio:6 rows:590 clusters:1 slots:1 kept=1

partition:7 rows:656 clusters:1 slots:1 kept=1

。。。

# than the true number.

number of buckets with 0 rows: 4429

number of buckets with 1 rows: 2762

number of buckets with 2 rows: 794

number of buckets with 3 rows: 182

number of buckets with 4 rows: 23

number of buckets with 5 rows: 2

number of buckets with 6 rows: 0

。。。

### hash table overall statistics ###

total buckets: 8192 empty buckets: 4429 non-empty buckets: 3763

total number of rows: 5000

maximum number of rows in a bucket: 5

ps:当小表上升到5000行的时候,bucket的rows最大也不过5行。注意,如果bucket行数过多,遍历带来的开销会带来性能的严重下降。

average number of rows in non-empty buckets: 1.328727

结论:

oracle数据库10g中,内存问题并不是干扰hash join的首要问题,现今硬件价格越来越便宜,内存2g,8g,64g的环境也很常见。大家在针对hash join调优的过程,更要偏重于partition和bucket的数据分配诊断。

 
 
上一篇: 详细讲解oracle i/o子系统的配置和设计 (1)    下一篇: oracle 10g分区表维护中的两个注意事项
  相关文档
全面解析oracle数据库的系统和对象权限 05-07
深入讲解memory_target与自动内存管理 03-24
实例讲解分区表的可用性及相关错误案例 (1) 03-31
花最少的时间完成最多的工作──字典表 02-01
轻松解决oracle xdb的8080端口冲突问题 03-03
轻松掌握jdbc操纵Oracle数据库lob字段 09-29
Oracle导出数据库结构到PowerDesigner 09-01
教你怎样在oracle数据库中高速导出/导入 (1) 10-27
Oracle中如何实现某一字段自动增加1 01-15
oracle安装后配置和启动企业管理器的过程 08-15
全面剖析Oracle数据库中的分区功能 04-11
教你轻松掌握如何用toad或oem管理job 02-28
在Oracle9i中Oracle DATA计算时间差 04-23
讲解oracle数据库自定义异常的使用方法 04-15
oracle与data guard环境中重建控制文件 (1) 03-27
在Oracle 8x中实现自动断开后再连接 01-15
讲解oracle数据库ora-00257故障的解决过程 (1) 07-15
详细讲解"oracle"服务器的常用命令行 03-14
实例讲解oracle到sql server主键的迁移 05-14
指定一个where条件来有条件地导出记录 04-03
返回首页 | 关于我们 | J网章程 | JSP空间合租 | 客服中心 | 免责声明 | 常见问题 | 参观机房
本站主机空间代理至厦门市华众网络科技有限公司
《中华人民共和国增值电信业务经营许可证》
编号:闽B2-20050079
@2005-2008福建JSP技术网 版权所有 闽ICP备05000928号
厦门(总部):13616026886 福州:0591-87655121
邮箱:admin@fjjsp.com 站长QQ,点击这里给我发消息