Ai Dreams Forum

Member's Experiments & Projects => AI Programming => Topic started by: Zero on September 19, 2021, 11:49:45 am

Title: Help needed: glob pattern generator
Post by: Zero on September 19, 2021, 11:49:45 am
Hi,

I need help! I'm trying to implement a glob pattern generator, that would take a list of strings as input, and that would output the longest glob pattern that matches those strings.

Code: text
input:
    so I have a car
    so I have a bicycle
    so you have a car

output:
    so * have a *

Is there such a tool already somewhere?
Title: Re: Help needed: glob pattern generator
Post by: infurl on September 19, 2021, 12:55:35 pm
https://en.wikipedia.org/wiki/Longest_common_subsequence_problem (https://en.wikipedia.org/wiki/Longest_common_subsequence_problem)

Here's a detailed description of the algorithm that you need to use. It contains lots of pseudocode which you should be able to convert into the language that you are using. I searched for ready-made libraries too but didn't find any. Try searching directly on github, you might have better luck than I did.
Title: Re: Help needed: glob pattern generator
Post by: infurl on September 19, 2021, 12:59:47 pm
http://www.rosettacode.org/wiki/Longest_common_substring (http://www.rosettacode.org/wiki/Longest_common_substring)

I just thought of looking on rosettacode.org and sure enough, they have the algorithm implemented there, in just about every language you could think of no less. Good luck!
Title: Re: Help needed: glob pattern generator
Post by: Zero on September 19, 2021, 01:03:15 pm
Thanks, I already know the LCS algo, but it's not what I want to achieve. I actually used it as part of my glob generator and successfully got globs from 2 strings, but not from N strings... yet!
:)
Thank you!
Title: Re: Help needed: glob pattern generator
Post by: infurl on September 19, 2021, 01:06:37 pm
https://en.wikipedia.org/wiki/Longest_common_subsequence_problem#Complexity (https://en.wikipedia.org/wiki/Longest_common_subsequence_problem#Complexity)

Quote
For the general case of an arbitrary number of input sequences, the problem is NP-hard.[1] When the number of sequences is constant, the problem is solvable in polynomial time by dynamic programming.

Oh dear, it does like a tricky one doesn't it.
Title: Re: Help needed: glob pattern generator
Post by: ivan.moony on September 19, 2021, 01:24:36 pm
I'm not completely sure, but shouldn't it work like this: let's say we have to compare strings A, B, C, and D. First we compare A and B, then A and C, then A and D. Along the comparing, we accumulate differences into an array.
Title: Re: Help needed: glob pattern generator
Post by: Zero on September 19, 2021, 01:43:50 pm
Quote
Oh dear, it does like a tricky one doesn't it.

NP-hard? ouch... I need NP-tough help  ;D

I'm not completely sure, but shouldn't it work like this: let's say we have to compare strings A, B, C, and D. First we compare A and B, then A and C, then A and D. Along the comparing, we accumulate differences into an array.

Yes, or even cross-comparing them all, but it doesn't yield an obvious answer yet.
Title: Re: Help needed: glob pattern generator
Post by: Zero on September 19, 2021, 05:21:44 pm
Code: javascript


function lcs(S, T) {
   
var D = [];
var LCS_len = 0;
var LCS = [];

for (var i = 0; i < S.length; i++) {
D[i] = [];
for (var j = 0; j < T.length; j++) {
if (S[i] == T[j]) {
if (i == 0 || j == 0) {
D[i][j] = 1;
} else {
D[i][j] = D[i-1][j-1] + 1;
}

if (D[i][j] > LCS_len) {
LCS_len = D[i][j];
LCS = [S.substring(i-LCS_len+1, i+1)];
} else if (D[i][j] == LCS_len) {
LCS.push(S.substring(i-LCS_len+1, i+1));
}
} else {
D[i][j] = 0;
}
}
}
return LCS[0];
}



function multiLcs(strings) {
   
    let candidate = strings[0];
   
    for (let str1 of strings)
        for (let str2 of strings) {
            let subs = lcs(str1, str2);
            if (!subs) return '';
            if (subs.length < candidate.length)
                candidate = subs;
        }
    return candidate;
}



function recurGenGlob(strings) {

    let center = multiLcs(strings);
   
    if (center.length == '') return '';
   
    let pre = strings.map(str => str.slice(0, str.indexOf(center)));
    let post = strings.map(str => str.slice(str.indexOf(center) + center.length));
   
    center = recurGenGlob(pre) + '*' + center + '*' + recurGenGlob(post);
   
    return center;
}



function genGlob(strings) {
   
    let center = recurGenGlob(strings.map(s => '  '+s+'  '));
    let result = '';
    let star = false;
    for (let i = 0; i < center.length; i++) {
        if (center[i] != '*' || star)
            result += center[i];
        star = center[i] == '*';
    }
    return result.trim();
}



Title: Re: Help needed: glob pattern generator
Post by: ivan.moony on September 19, 2021, 05:41:38 pm
Does that work?
Title: Re: Help needed: glob pattern generator
Post by: Zero on September 19, 2021, 07:10:50 pm
I don't know. It does spit something out, but I believe it's not optimal.

Edit: I mean, I believe the returned value of genGlob() is not always the longest possible pattern. It needs testing. But since it's Js, you can directly copy-paste it in your browser's console ang give it a shot: at least for the example I gave in the original post, it works (better than me, since I forgot the common 'c' in 'car' and 'bicycle', my own hand-made output was wrong). The trim() trick is a bit hacky though.
Title: Re: Help needed: glob pattern generator
Post by: Zero on September 19, 2021, 11:23:09 pm
doesn't work with
Code
    "and so I have a car",
    "so I have a bicycle",
    "so you have a car",
Title: Re: Help needed: glob pattern generator
Post by: Zero on September 19, 2021, 11:56:01 pm
Here is the new corrected genGlob().

Code: javascript
genGlob = (strings) => {
   
    let center = recurGenGlob(strings);

    while (center.indexOf('**') > -1)
        center = center.replace(/\*\*/g, '*');

    let startStar = false,
        endStar = false,
        startLetter = strings[0][0],
        endLetter = strings[0][strings[0].length-1];
       
    for (let s of strings) {
        if (s[0] != startLetter) startStar = true;
        if (s[s.length-1] != endLetter) endStar = true;
    }
   
    if (!startStar) center = center.slice(1);
    if (!endStar) center = center.slice(0, -1);

    if (center.length == 0) center = '*';

    return center;
}

Title: Re: Help needed: glob pattern generator
Post by: 8pla.net on September 29, 2021, 11:20:33 pm
I may I know how. Are you OK with PHP?

Title: Re: Help needed: glob pattern generator
Post by: Zero on October 03, 2021, 11:25:33 am
I believe my JS is correct, but yes I'm OK with PHP.
Title: Re: Help needed: glob pattern generator
Post by: frankinstien on October 03, 2021, 07:49:26 pm
This will work with any N number of words and sentences:

Code
using System;
using System.Collections.Generic;
using System.Linq;

public class Program
{
public static void Main()
{
List<string> alreadyCounted = new List<string>();
var counts = new Dictionary<string, int>();
string[] sets = new string[] { "so I have a car, but this car is used", "so I have a bicycle", "so you have a car" };
foreach (String aSet in sets)
{
var sentenceSplit = aSet.Split(new char[] { ' ', ',' });
alreadyCounted.Clear();
foreach (String aWord in sentenceSplit)
{
var trimmed = aWord.Trim();
if (counts.ContainsKey(trimmed) && !alreadyCounted.Contains(trimmed))
counts[trimmed] = counts[trimmed] + 1;
else if(!alreadyCounted.Contains(trimmed))
counts.Add(trimmed, 1);
alreadyCounted.Add(trimmed);
}
}

var results = counts.Where(x => x.Value == sets.Length);
if (results != null)
{
foreach (KeyValuePair<string, int> entry in results)
Console.WriteLine(entry.Key);
}
}
}

Try it on Fiddle: https://dotnetfiddle.net/

Its a small mod to get it to work with a file where you just load the file into the string array.

Title: Re: Help needed: glob pattern generator
Post by: frankinstien on October 03, 2021, 08:18:29 pm
I made modifications to force the code to only count a word once when checking within the same sentence.
Title: Re: Help needed: glob pattern generator
Post by: frankinstien on October 03, 2021, 11:03:12 pm
I placed a sequence check so if not all sentences are in a somewhat similar sequence it will print no matches.

Code
using System;
using System.Collections.Generic;
using System.Linq;

public class Program
{
public static void Main()
{
List<string> alreadyCounted = new List<string>();
var counts = new Dictionary<string, Tally>();
string[] sets = new string[] { "so I have a car, but this car is used", "so I have a bicycle", "so you have a car", "so, I have a knife too" };
foreach (String aSet in sets)
{
var sentenceSplit = aSet.Split(new char[] { ' ', ',' });
alreadyCounted.Clear();
int pos = 0;
foreach (String aWord in sentenceSplit)
{
pos++;
var trimmed = aWord.Trim();
if (counts.ContainsKey(trimmed) && !alreadyCounted.Contains(trimmed))
{
counts[trimmed].Count = counts[trimmed].Count + 1;
counts[trimmed].Positions.Add(pos);
}
else if(!alreadyCounted.Contains(trimmed))
counts.Add(trimmed, new Tally(1, pos));
alreadyCounted.Add(trimmed);
}
}

var results = counts.Where(x => x.Value.Count == sets.Length);
if (results != null && CheckSequence(results.ToList()))
{
foreach (KeyValuePair<string, Tally> entry in results)
{
Console.Write(entry.Key + ": ");
foreach(int pos in entry.Value.Positions)
{
Console.Write(pos.ToString() + ",");
}
Console.WriteLine("");
}
}
else
Console.WriteLine("No matches that are sequentially similar.");
}

private static bool CheckSequence(List<KeyValuePair<string, Tally>> results)
        {
foreach (KeyValuePair<string, Tally> anEntry in results)
{
int temp = anEntry.Value.Positions[0];
for (int idx = 1; idx <anEntry.Value.Positions.Count; idx++)
                {
if (temp > anEntry.Value.Positions[idx])
{
return false;
}

temp = anEntry.Value.Positions[idx];

}
}

return true;
        }
}

sealed class Tally
{
public int Count {set; get;}
public List<int> Positions {set; get;}

public Tally(int count, int pos)
{
Count = count;
Positions = new List<int>();
Positions.Add(pos);
}
}
Title: Re: Help needed: glob pattern generator
Post by: 8pla.net on October 05, 2021, 07:15:49 am
I was thinking word frequency analysis...

Code
<?php
$inputs = <<<EOD
so I have a car
so I have a bicycle
so you have a car
EOD;
$table=array_count_values(str_word_count($inputs, 1));
foreach($table as $word=>$freq){
if($freq==3){
   $words[]=$word;
}
}
echo var_export($words,TRUE);
?>

array (
  0 => 'so',
  1 => 'have',
  2 => 'a',
)
Title: Re: Help needed: glob pattern generator
Post by: Zero on October 05, 2021, 10:17:08 pm
Check this one:
Code: text
    "car I have a car, but this car is used",
    "so I have a car",
    "I so car have you"
Mine is wrong...
Getting:
Code: text
"* I* have *"
Expected:
Code: text
"*I * have *"
Title: Re: Help needed: glob pattern generator
Post by: frankinstien on October 06, 2021, 08:22:19 pm
Check this one:
Code: text
    "car I have a car, but this car is used",
    "so I have a car",
    "I so car have you"
Mine is wrong...
Getting:
Code: text
"* I* have *"
Expected:
Code: text
"*I * have *"

Trim the text for each word parsed, that would remove the space. If you look at my code it does that and you could also modify it, easily, to sense the number of sentences with a pattern and list those with it and without it.