The Fatal Weakness of NL2SQL: The Data Relationship Problem — and How Arisyn Breaks the Deadlock
- Arisyn

- Nov 27, 2025
- 7 min read
Updated: Dec 15, 2025
NL2SQL (Natural Language to SQL) automatically converts natural language questions into Structured Query Language (SQL). Its development has always centered on one goal: lowering the threshold for querying data. Evolving from rule-driven approaches to AI-based generalization, it has gradually become an important bridge between natural language and structured data. No technology develops in isolation: starting in the 1970s with rule- and template-based methods, NL2SQL has passed through stages of statistical methods, deep learning, and pre-trained models, and current research centers on large language models (LLMs) and few-shot learning. Every technology emerges to solve a problem, so let's analyze the core issues of NL2SQL from first principles.
Two Essential Elements for NL2SQL Implementation
First, here are two essential points in my view:
- Understanding the intent of natural language
- Understanding the meaning of data resources
1. Understanding the Intent of Natural Language
Let's first provide two examples to illustrate potential application scenarios of NL2SQL:
Example 1: Please group by customer and summarize the order quantity and total order amount of each customer this month.
Example 2: Please help me find out who the top customer of this month is.
For the question in Example 1, NL2SQL converts a clearly stated requirement into SQL, following the query logic the user has already implied. For Example 2, the system must first understand what the question means, work out how to answer it, translate that approach into a data problem, and only then generate the SQL. As LLM capabilities in comprehension, analysis, and logical reasoning have grown stronger, we can reasonably conclude that an LLM can reduce a question like Example 2 to the level of Example 1.
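As a concrete illustration of the kind of SQL Example 1 maps to, here is a minimal sketch using sqlite3; the table name, columns, and sample rows are assumptions made for this demo, not part of any real schema:

```python
import sqlite3

# Hypothetical orders table; names and data are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("Alice", 120.0, "2025-11-03"),
     ("Bob",    80.0, "2025-11-10"),
     ("Alice",  50.0, "2025-11-21")],
)

# The SQL Example 1's request would translate to: group by customer,
# count this month's orders and sum their amounts.
rows = conn.execute("""
    SELECT customer, COUNT(*) AS order_qty, SUM(amount) AS total_amount
    FROM orders
    WHERE order_date LIKE '2025-11%'
    GROUP BY customer
    ORDER BY customer
""").fetchall()
print(rows)  # [('Alice', 2, 170.0), ('Bob', 1, 80.0)]
```

The requirement is explicit enough that the mapping from words to clauses (group by → GROUP BY, "this month" → a date filter) is nearly mechanical, which is exactly what makes Example 1 the easy case.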
This article focuses on NL2SQL. Even if LLMs have shortcomings in this aspect, they will not be included in the scope of this discussion for the time being. However, a conclusion can be drawn: in the technical implementation of NL2SQL, the understanding of natural language must currently be achieved through LLMs.
2. Understanding the Meaning of Data Resources
Let's first look at the following set of data:
| C1 | C2  | C3  | C4  | C5 |
|----|-----|-----|-----|----|
| AA | 5.2 | 3.8 | Yes | 1  |
| AB | 3.3 | 2   | Yes | 0  |
| AC | 5   | 5   | Yes | 1  |
| BA | 6   | 0.8 | No  | 1  |
| BB | 12  | 3   | Yes | 0  |
In the given sample data, the data items are named C1, C2, C3, C4, and C5 respectively, and the data values also have no obvious meaning. Such data cannot be understood or used. We can assign certain meanings to these columns. For example: C1 (Product Category), C2 (Product Quantity), C3 (Unit Price), C4 (Warranty Status), C5 (Settlement Flag). Alternatively, another set of definitions can be given: C1 (Customer ID), C2 (Current Consumption), C3 (Balance), C4 (VIP Customer Flag), C5 (Discount Flag). This example clearly illustrates that without understanding the meaning of data, no matter how well we understand natural language or how accurate our reasoning logic is, it is impossible to convert the natural language query into data operations.
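The point can be made executable: the very same rows support two incompatible queries depending on which interpretation of the columns we adopt. A sketch, with both interpretations assumed for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (C1 TEXT, C2 REAL, C3 REAL, C4 TEXT, C5 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", [
    ("AA", 5.2, 3.8, "Yes", 1),
    ("AB", 3.3, 2.0, "Yes", 0),
    ("AC", 5.0, 5.0, "Yes", 1),
    ("BA", 6.0, 0.8, "No",  1),
    ("BB", 12.0, 3.0, "Yes", 0),
])

# Interpretation 1: C2 = quantity, C3 = unit price -> total revenue.
revenue = conn.execute("SELECT SUM(C2 * C3) FROM t").fetchone()[0]

# Interpretation 2: C2 = current consumption, C5 = discount flag
# -> total consumption by discounted customers.
discounted = conn.execute("SELECT SUM(C2) FROM t WHERE C5 = 1").fetchone()[0]

print(revenue, discounted)
```

Both queries are syntactically flawless SQL over the same table; only knowledge of what the columns mean tells us which one answers the user's question.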
In the exploration of the NL2SQL field, published papers or open-source products have adopted similar approaches to address this issue. There are mainly two forms: one is to provide database structure descriptions in a certain way through the Retrieval-Augmented Generation (RAG) method; the other is to enable LLMs to infer the meaning of data based on the corresponding relationship between questions and answers through samples.
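The first form, providing schema descriptions via RAG, can be sketched minimally as follows. The retrieval here is a naive keyword match standing in for a real embedding search, and the catalog, column names, and prompt format are all assumptions for the sketch:

```python
# Toy schema catalog; in practice this would be a vector store of DDL + docs.
SCHEMA_CATALOG = {
    "orders":    "orders(order_id, customer_id, amount, order_date)",
    "customers": "customers(customer_id, customer_name, region)",
    "products":  "products(product_id, product_name, unit_price)",
}

def retrieve_schemas(question: str) -> list[str]:
    """Return DDL snippets whose (singularized) table name appears in the question."""
    q = question.lower()
    return [ddl for name, ddl in SCHEMA_CATALOG.items() if name.rstrip("s") in q]

def build_prompt(question: str) -> str:
    """Prepend the retrieved schema context to the NL2SQL request."""
    schemas = "\n".join(retrieve_schemas(question))
    return f"Given these tables:\n{schemas}\n\nWrite SQL for: {question}"

prompt = build_prompt("Total order amount per customer this month?")
print(prompt)
```

Only the tables the question mentions reach the model, which keeps the prompt small; the second form (few-shot samples) instead relies on the model inferring schema semantics from question/answer pairs.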
A Core Issue in Current NL2SQL Technical Implementation
Against the backdrop of existing technologies, we can assume that the two essential issues discussed above have been resolved. However, why haven't there been any "star" applications so far? Some software products have implemented functions such as data querying and analysis through natural language in certain application scenarios, but there is still no universally recognized successful case in the industry. Most of these efforts are exploratory in nature, with greater symbolic significance than practical value.
The author has personally run several experiments. The conclusions first, followed by discussion:
- LLMs can receive and understand various kinds of data-definition information;
- when questions and data definitions match common-sense expectations, NL2SQL achieves very high accuracy;
- when naming and coding deviate from common-sense conventions, the model cannot generate accurate SQL.
The following is a detailed explanation: The author designed 7 data tables, namely Course Information, Teaching Material Information, College Information, Class Information, Student Information, Exam Score Evaluation Criteria, and Exam Scores. The naming and coding of the data tables and data items were relatively standardized. For example, the student ID was represented as "Student ID (STUDENT_NO)" in both the Student Information and Exam Scores tables. After inputting this information into the LLM and raising relevant questions, the model was able to understand and answer correctly (Note: No Few-shot learning was used). This indicates that the capabilities of natural language understanding and reasoning play a dominant role.
For the above set of data tables, the table structure remained unchanged, but the names and codes of the data tables and data items were modified to be inconsistent with common sense logic. For instance, "Course" was changed to "Muggle", "Exam" to "Trial", and "Class" to "Cave". Additionally, data items with the same meaning were given different names and codes in different data tables. After inputting this modified information into the LLM and rephrasing the same questions in line with the new naming conventions, we found that the model could no longer provide correct answers. It can be seen that the model's understanding and reasoning capabilities are limited by the input of data structure information and the logic of the questions. (The author has retained the test datasets and question sets; those interested can contact the author to obtain them.) The key issue is that based on common sense, it is impossible to clarify the logical relationships between data from the given data definitions.
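The renaming step of this experiment can be reproduced mechanically. The substitution map below mirrors the article's examples ("Course" → "Muggle", etc.), while the DDL string is a hypothetical stand-in for the actual test schema:

```python
# Substitutions from the experiment: meaningful names -> common-sense-breaking ones.
RENAMES = {"Course": "Muggle", "Exam": "Trial", "Class": "Cave"}

def scramble_schema(schema_text: str) -> str:
    """Apply the renames so the structure survives but the semantics do not."""
    for old, new in RENAMES.items():
        schema_text = schema_text.replace(old, new)
    return schema_text

# Hypothetical DDL; the real test schema had 7 tables.
ddl = "CREATE TABLE Exam_Scores (Student_NO TEXT, Course TEXT, Score REAL);"
scrambled = scramble_schema(ddl)
print(scrambled)
```

The table structure, key columns, and data values are untouched; only the surface names change, which is precisely what separates structural understanding from common-sense pattern matching.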
Methods for Judging Data Relationships in NL2SQL
Based on the above discussion, we now introduce a core issue addressed in this article: how to analyze the logical relationships between data tables, which can also be simplified to how to establish the association relationships between given data tables.
First, let's explain why the current mainstream technical routes are not effective solutions to this problem.
First approach: Relying on the inherent capabilities of LLMs
This approach has two major limitations, making it unable to cope with complex on-site application scenarios. The first limitation is that general-purpose models have difficulty adapting to professional or private scenarios, where business content and logic form independent systems. For the given data resources, it is impossible to understand the internal relationships of the data based on common sense cognition. The second limitation is that data integration applications are an inevitable path to realizing data value, and the inconsistency of data standards caused by multiple systems and multiple data sources has become an unavoidable problem. Tasks such as data source verification and inter-table relationship confirmation are already challenging in specialized data governance work; without relevant input, LLMs cannot establish the logical relationships between data tables.
Second approach: Adopting pre-training
If we provide sufficient samples, LLMs can learn and reason to form correct association relationships between data tables. We can briefly understand the essence of this technical implementation:
Question 1: *****? Answer 1: SELECT a.c1, b.c2 FROM a JOIN b ON a.c3 = b.c4;
Question 2: ******? Answer 2: SELECT a.c1, c.c3 FROM a JOIN c ON a.c3 = c.c2;
……
By analogy, when we need to access data from tables b and c, the possible association relationship is b.c4 = c.c2.
In other words, the samples need not come from the exact target scenario, but they must contain enough information to support this inference. In production, the approach demands a large volume of sample data before an LLM can analyze the data relationships correctly; whenever new data resources are added or new integration possibilities emerge, the work must be redone. Implementation costs are therefore high, and the whole process may have to be repeated each time requirements change.
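What the model must implicitly learn from such samples can be made explicit: extract the join equalities from the sample SQL and close them transitively. A sketch, using a tiny union-find over column names and the two sample answers from the text (the regex "parser" is deliberately naive):

```python
import re
from itertools import combinations

# Sample answers from the Q/A pairs above.
SAMPLES = [
    "select a.c1,b.c2 from a join b on a.c3=b.c4;",
    "select a.c1,c.c3 from a join c on a.c3=c.c2;",
]

def extract_equalities(sql: str) -> list[tuple[str, str]]:
    """Pull column equalities (table.col = table.col) out of the SQL text."""
    return re.findall(r"(\w+\.\w+)\s*=\s*(\w+\.\w+)", sql)

# Union-find: columns joined directly or transitively share a root.
parent: dict[str, str] = {}
def find(x: str) -> str:
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x
def union(x: str, y: str) -> None:
    parent[find(x)] = find(y)

for sql in SAMPLES:
    for lhs, rhs in extract_equalities(sql):
        union(lhs, rhs)

# Every pair within an equivalence class is a candidate join key,
# including the never-seen b.c4 = c.c2.
classes: dict[str, set[str]] = {}
for col in parent:
    classes.setdefault(find(col), set()).add(col)
inferred = {frozenset(p) for cols in classes.values()
            for p in combinations(sorted(cols), 2)}
print(frozenset({"b.c4", "c.c2"}) in inferred)  # True
```

The transitive step is exactly the fragile part: it only works if the samples happen to cover a connecting path, which is why the sample volume must grow with every new table or integration requirement.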
The Potential of Arisyn Technology to Improve NL2SQL Accuracy
Arisyn is an open-source intelligent data link generation technology, whose core components include: intelligent inter-table relationship analysis, data link generation, data link scoring, and data link optimization.
The application results of this technology can be used as input for NL2SQL, replacing the pre-training process. We will now analyze the problems it solves and its complementarity to NL2SQL.
Let's first assume that NL2SQL has correctly understood the natural language question and clarified the logic for answering it. However, when converting this logic into data operations, it will face the following problems:
In practical application scenarios, the required data may come from multiple systems whose naming conventions and coding standards are not unified. A product identifier, for example, may appear in different systems and tables as Product Name, Product Number, Product Code, Product Encoding, or Product CODE; a single table may even contain both a Product Name and a Product Code column. We have already discussed the shortcomings of using sample data to learn data relationships. Without sufficient sample input, an LLM cannot decide on its own how to associate the tables: should the join be A.Product Name = B.Product Encoding, or A.Product Code = B.Product Number? The problem becomes even less tractable when the actual meaning of table and column names cannot be recovered through common sense.
Currently, methods such as combining RAG with SQL-execution feedback (e.g., DAIL-SQL) can improve SQL accuracy, but they achieve only partial improvement and do not fundamentally solve the problem. In the data governance projects the author has participated in, there have been scenarios involving dozens of systems, thousands of data tables, and hundreds of thousands of data items.

Let's simulate such an example: there are three data tables A, B, and C. Table A records daily product production, with products represented by "Product Name (CPMC)"; Table B records product sales, with products represented by "Product Code (CPBM)"; and Table C is an intermediate table describing the correspondence between product names and product codes, with two fields (CPMC, CPBM). If this information is provided to an LLM, the model can establish the association automatically. But give the LLM 8,000 data tables, with the mapping table's columns carrying neutral names such as CODE1 and CODE2 instead of Product Name and Product Code, and the model has no basis for analysis and cannot establish effective logical relationships. Furthermore, when integrating multi-source data there may be several plausible join paths, so which one should carry the connection? These questions depend entirely on the specific condition of the data resources.
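The three-table simulation is small enough to run directly; the sketch below reproduces it in sqlite3 (the sample products and figures are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table A: daily production, keyed by product name (CPMC).
conn.execute("CREATE TABLE A (CPMC TEXT, produced INTEGER)")
# Table B: sales, keyed by product code (CPBM).
conn.execute("CREATE TABLE B (CPBM TEXT, sold INTEGER)")
# Table C: the intermediate mapping between name and code.
conn.execute("CREATE TABLE C (CPMC TEXT, CPBM TEXT)")

conn.executemany("INSERT INTO A VALUES (?, ?)", [("Widget", 100), ("Gadget", 50)])
conn.executemany("INSERT INTO B VALUES (?, ?)", [("W-01", 80), ("G-01", 40)])
conn.executemany("INSERT INTO C VALUES (?, ?)", [("Widget", "W-01"), ("Gadget", "G-01")])

# The only valid link from A to B runs through the mapping table C.
rows = conn.execute("""
    SELECT A.CPMC, A.produced, B.sold
    FROM A JOIN C ON A.CPMC = C.CPMC
           JOIN B ON C.CPBM = B.CPBM
    ORDER BY A.CPMC
""").fetchall()
print(rows)  # [('Gadget', 50, 40), ('Widget', 100, 80)]
```

With three well-named tables the join path is obvious; rename C's fields to CODE1/CODE2 and bury it among thousands of tables, and nothing in the schema text tells the model that this table is the bridge.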
Starting from data resources, Arisyn intelligently analyzes the relationships between data tables. When a data integration requirement is given, it can quickly generate feasible links and provide reasonable links based on scoring results. When we use this result as a constraint for LLMs to generate SQL, the LLMs no longer need to infer (or guess) uncertain data association relationships.
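One plausible way to apply such results is to inject the verified, scored links as hard constraints into the SQL-generation prompt. The link format below is an assumption made for this sketch, not Arisyn's actual output schema or API:

```python
# Hypothetical output of an inter-table relationship analysis step:
# verified join links with confidence scores (format is an assumption).
VERIFIED_LINKS = [
    {"left": "A.CPMC", "right": "C.CPMC", "score": 0.97},
    {"left": "C.CPBM", "right": "B.CPBM", "score": 0.95},
]

def join_constraints(links, min_score: float = 0.9) -> str:
    """Render high-confidence links as hard constraints for the SQL generator."""
    kept = [l for l in links if l["score"] >= min_score]
    lines = [f"- join only on {l['left']} = {l['right']}" for l in kept]
    return "When joining tables, use exactly these relationships:\n" + "\n".join(lines)

prompt = (
    "Write SQL to compare production and sales per product.\n"
    + join_constraints(VERIFIED_LINKS)
)
print(prompt)
```

Constrained this way, the generator's search space collapses from "every plausible column pairing" to a handful of pre-verified paths, which is where the accuracy gain would come from.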
Arisyn complements NL2SQL rather than replacing it, and combining the two can significantly improve NL2SQL accuracy. It also reduces reliance on pre-training, shifting the focus to building out the RAG layer. Because RAG scales well, it can continuously deepen NL2SQL's understanding of the business and adapt to new data resources.