We've been trying to implement a text search component using Oracle 11.2.0.2, featuring an 11g enhancement called SDATA (structured data) sections. It helps solve one of Oracle Text's biggest issues -- that of "mixed" queries (a query that contains a basic text search with structured data qualifiers -like date limits- that are supposed to make the query faster).
In past versions, this type of query would never perform as fast as either a straight text query or a structured-only query; the optimizer would pick one way or the other to generate the list of matching rowids for one part of the query, then run the other part to generate the "final" list of matching rows. So the query takes longer, when the user intuitively feels that the additional information provided should help make the query faster.
In this case, our testing was giving us some unexpected results when we were filtering SDATA negation-- chiefly, the resultset count of matches was lower than it should have been. So, for example, a query for something like 'Ben Franklin and SDATA(language != "FRENCH")' did not return the right number of rows based on the tester's description of the test data.
Investigation revealed that the core issue was that the language field was null in some documents (even though the XML document described the element as mandatory).
Here's an example:
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production With the Partitioning, Oracle Label Security and OLAP options SQL> create table test_tab 2 (text varchar2 (30), 3 language varchar2 (10)); Table created. SQL> insert all 2 into test_tab values ('Bon Homme Richard', 'ENGLISH') 3 into test_tab values ('Bon Homme Richard', 'FRENCH') 4 into test_tab values ('Bon Homme Richard', 'CANADIAN') 5 into test_tab values ('Bon Homme Richard', '') 6 select * from dual; 4 rows created. SQL> insert into test_tab select object_name, null from all_objects; 54562 rows created. SQL> create index test_idx on test_tab (text) 2 indextype is ctxsys.context filter by language; Index created. SQL> select score (1), text, language from test_tab 2 where contains(text,'(Bon Homme Richard)', 1) > 0; SCORE(1) TEXT LANGUAGE ---------- ------------------------------ ---------- 15 Bon Homme Richard ENGLISH 15 Bon Homme Richard FRENCH 15 Bon Homme Richard CANADIAN 15 Bon Homme Richard SQL> select score (1), text, language from test_tab 2 where contains(text,'(Bon Homme Richard) 3 AND sdata (language != "FRENCH")', 1) > 0; SCORE(1) TEXT LANGUAGE ---------- ------------------------------ ---------- 15 Bon Homme Richard ENGLISH 15 Bon Homme Richard CANADIANIn this case, the fix was to correct the source data to set the language field properly. In a case where null values are permitted, the searcher [or developer] needs to be aware of these comparison rules:
select score (1), text, language from test_tab where contains(text,'(Bon Homme Richard) AND (sdata (language != "FRENCH") OR sdata(language is null))', 1) > 0; SCORE(1) TEXT LANGUAGE ---------- ------------------------------ ---------- 15 Bon Homme Richard ENGLISH 15 Bon Homme Richard CANADIAN 15 Bon Homme Richard
No comments:
Post a Comment