Search  
Friday, November 15, 2019 ..:: Forum ::.. Register  Login
 Forum Minimize
Pentru a putea posta mesaje trebuie să vă înregistraţi.
Notă: Mesajele cu conţinut jignitor sau ilegal (inclusiv cereri de soft piratat) nu sunt acceptate şi vor fi şterse imediat .

Pentru a primi raspunsuri rapide si corecte, scrieti in mesaj ce intentionati sa faceti, ce mesaj de eroare primiti, in ce context si in urma caror actiuni. De asemenea, mentionati versiunea de FoxPro in care lucrati!
Dacă nu specificați versiunea, se consideră VFP 9.0 SP2.

SearchForum Home
  Visual FoxPro  Baze de date, tabele, view-uri si indecsi  Găsitor de dupl...
 Găsitor de duplicate :)
 
 2/17/2012 3:02:09 AM
User is offlineardeleand
2 posts


Găsitor de duplicate :)
 (N/A)
Am căutat de curând ceva care să-mi caute duplicatele dintr-o tabelă şi nu am găsit nimic util, aşa că am făcut unul.
Dacă e folositor şi altcuiva...

Nu e rachetă, dar merge; pe volume mari cel mai bine e noaptea sau de sărbători! ;)
(nu ştiu cum se vede în post, .txt-ul are acelaşi lucru - văd că .prg-uri nu se pot...!)



PARAMETERS cRefTbl, cDupTbl, aExtRec, cFldAct, uF1, uF2, uF3, uF4, uF5, uF6, uF7, uF8, uF9, uF10, uF11, uF12, uF13, uF14, uF15, uF16, uF17, uF18, uF19, uF20, uF21, uF22, uF23
*
* Find duplicates in the reference table ,
* copy them in duplicates table ,
* searching for all records or one external "record",
* searching all fields or, as specified in , Only / Except - fields.
*
* cRefTbl: reference table.
* Empty -> table in current area; no table opened -> return -1
*
* cDupTbl: duplicate table; same structure as plus 2 fields: _RefRec and _DupRec,
* containing record number (in ) of reference and duplicate record;
* Empty -> a new one is created, with name + '_dup'.
* Remain open in lowest unused area.
*
* aExtRec: array of external values(record) to searched; must be 1D and have same structure as (lenght and type);
* in this case _RefRec in is allways 0;
* Empty -> internal search.
*
* cFldAct: action to apply to following passed fields:
* O - Only passed fields are included in search (default)
* E - Exclude passed fields from search
*
* uF1 - uF22: fields to be processed in search, in any order or combination. Can be refered by:
* [n] - order in structure
* or
* [name] - name of field
* Empty -> all fields are processed (i.e. two records must have all fields identical, to be considered duplicate).
*
* Return: 0 - nr. of duplicates found (respecting applied filter)
* -1 if: no passed and no table open in current area
* is not an array
* or current table is not exclusive or can't be locked
*
* Ex.:
* FindDup() - find all duplicates in current table and copy in current table name + '_dup'
* FindDup('test') - find all duplicates in Test.dbf and copy in Test_dup.dbf
* FindDup('test', 'dupl') - find all duplicates in Test.dbf and copy in Dupl.dbf
* FindDup('test',,@aTestRec) - find duplicates of array aTestRec in Test.dbf and copy in Test_dup.dbf
* FindDup('test',,@aTestRec,'E',5,'obs')- find duplicates of array aTestRec in Test.dbf, searching all fields except
* FIELD(5) and 'obs', and copy in Test_dup.dbf
* FindDup('test', 'dupl',,'O',5,'obs') - find all duplicates in Test.dbf, searching only fields FIELD(5) and 'obs',
* and copy in Dupl.dbf
*
* Time (C2D 2GHz / 2GB / 640GB-64MB)
* 15.500 rec. - 620s (all fields, 0 found)
* 15.500 rec. - 825s (except 3 fields, 7 found)
* - above times are relative to data structure
*
* CAUTIONS! Slow!
*
*---------------------------------------------------------------
* Any suggestions, complaints or congratulations can be sent to
* Dan ARDELEAN - soft@damont.ro (subject: FindDup)
* in English or Romanian. Will be grateful for any of them! :)
*===============================================================



PRIVATE nSchFlds, nRefFlds, f, nRefRec
LOCAL nParam, lIsExtRec, nTotRec, cSetExact, nIniArea, nIniRec, lIsUsed, lIsExclusive, lIsFLocked, nParamName, cFnType, ;
lIsFldValid, nFldNr, cFldName, cSetComp, nDupCnt
DECLARE aSchFld[22,2]
LOCAL ARRAY aRefFld[1,18]

nParam = PCOUNT()
IF nParam < 1 OR EMPTY(cRefTbl)
cRefTbl = ALIAS()
ENDIF
IF EMPTY(cRefTbl)
RETURN -1
ENDIF
IF nParam < 2 OR EMPTY(cDupTbl)
cDupTbl = cRefTbl + '_dup'
ENDIF
IF nParam < 4 OR EMPTY(cFldAct)
cFldAct = 'O'
ENDIF
lIsExtRec = TYPE('aExtRec',1) == 'A'
IF lIsExtRec AND ALEN(aExtRec) != FCOUNT(cRefTbl)
RETURN -1
ENDIF

* Check if exclusive or lock. Only to avoid modifying table during search!
lIsUsed = USED(cRefTbl)
lIsExclusive = .F.
lIsFLocked = .F.
IF lIsUsed
lIsExclusive = ISEXCLUSIVE(cRefTbl)
lIsFLocked = ISFLOCKED(cRefTbl)
IF !(lIsExclusive OR lIsFLocked)
IF !FLOCK(cRefTbl)
* message
RETURN -1
ENDIF
ENDIF
ELSE
USE (cRefTbl) IN 0 EXCLUSIVE
ENDIF
nTotRec = RECCOUNT(cRefTbl)
IF nTotRec == 0
RETURN 0
ENDIF

IF USED(cDupTbl)
USE IN (cDupTbl)
ENDIF
COPY STRUCTURE TO (cDupTbl)
USE (cDupTbl) IN 0 EXCLUSIVE
ALTER TABLE (cDupTbl) ADD COLUMN _RefRec I
ALTER TABLE (cDupTbl) ADD COLUMN _DupRec I

* Build environment
cSetExact = SET("Exact")
SET EXACT ON
nIniArea = SELECT()
nIniRec = RECNO()
SELECT (cRefTbl)

* Build array with passed fields
nSchFlds = 0
nRefFlds = AFIELDS(aRefFld, cRefTbl)
FOR f = 1 TO 22
nParamName = 'uF'+LTRIM(STR(f,2))
cFnType = TYPE(nParamName)
lIsFldValid = .F.
DO CASE
CASE cFnType == 'N'
nFldNr = EVALUATE(nParamName)
cFldName = FIELD(nFldNr)
lIsFldValid = BETWEEN(nFldNr, 1, nRefFlds)
CASE cFnType == 'C'
cFldName = EVALUATE(nParamName)
nFldNr = ASCAN(aRefFld, cFldName, 1,nRefFlds,1,15)
lIsFldValid = (nFldNr > 0)
ENDCASE
IF lIsFldValid
nSchFlds = nSchFlds + 1
aSchFld[f,1] = nFldNr
aSchFld[f,2] = cFldName
ENDIF
ENDFOR
IF nSchFlds > 0
cSetComp = SET("Compatible")
SET COMPATIBLE OFF
DIMENSION aSchFld[nSchFlds,2]
SET COMPATIBLE &cSetComp
ENDIF

* Search
IF lIsExtRec
nRefRec = 0
GO TOP
= SearchRest(@aExtRec)
ELSE
SCAN
nRefRec = RECNO()
* progbar (nRefRec / nTotRec) if you wish!
SCATTER MEMO TO aRefRec
SKIP
= SearchRest(@aRefRec)
GO nRefRec
ENDSCAN
ENDIF
* progbar (100) if you wish!
nDupCnt = RECCOUNT(cDupTbl)

* Restore environment
IF nDupCnt == 0
USE IN (cDupTbl)
ENDIF
IF lIsUsed
IF!lIsFLocked
UNLOCK IN (cRefTbl)
ENDIF
ELSE
USE IN (cRefTbl)
ENDIF

SET EXACT &cSetExact
SELECT (nIniArea)
GO nIniRec

RETURN nDupCnt



PROCEDURE SearchRest
LPARAMETERS aSrchVal
LOCAL lIsDup

SCAN REST
lIsDup = .T.
DO CASE
CASE nSchFlds == 0 && all fields
FOR f = 1 TO nRefFlds
IF EVALUATE(FIELD(f)) != aSrchVal[f]
lIsDup = .F.
EXIT
ENDIF
ENDFOR

CASE cFldAct == 'O' && Only passed fields are included in search
FOR f = 1 TO nSchFlds
IF EVALUATE(aSchFld[f,2]) != aSrchVal[aSchFld[f,1]]
lIsDup = .F.
EXIT
ENDIF
ENDFOR

CASE cFldAct == 'E' && Exclude passed fields from search
FOR f = 1 TO nRefFlds
IF ASCAN(aSchFld, f, 1,nSchFlds,1,15) == 0
IF EVALUATE(FIELD(f)) != aSrchVal[f]
lIsDup = .F.
EXIT
ENDIF
ENDIF
ENDFOR
ENDCASE

IF lIsDup
SCATTER MEMO TO aDupRec
SELECT (cDupTbl)
APPEND FROM ARRAY aDupRec
REPLACE _RefRec WITH nRefRec, _DupRec WITH RECNO()
SELECT (cRefTbl)
ENDIF
ENDSCAN
RETURN
FindDup.txt 
 2/17/2012 11:04:58 AM
User is offlineDaniel Buduru
3508 posts
1st




Re: Găsitor de duplicate :)
 (N/A) Modified By Daniel Buduru  on 2/17/2012 11:07:11 AM)
Incearca si varianta asta, si spune-ne ce timp obtii pe tabela pe care ai facut testele.
In testele mele, pe o tabela de 380.000 inregistrari, cu 20 campuri, dureaza cam 9 sec.

LPARAMETERS tctable, tcdest, tcignorefields
LOCAL lctable
IF EMPTY(tctable)
    lctable=GETFILE('dbf')
    IF EMPTY(lctable)
        MESSAGEBOX('Nu s-a selectat nici o sursa')
        RETURN
    ENDIF
ELSE
   lctable=tctable
ENDIF

SELECT * FROM (lctable) WHERE .f. INTO CURSOR crstemp

LOCAL lcsql, lcfields, lcfield, laf[1], lni, lcSetExact, lcignorefields
lcSetExact=SET("Exact")
lcsql=''      
lcfields=''
lcignorefields=','+CHRTRAN(EVL(tcignorefields,''),["' ],'')
SET EXACT on
AFIELDS(laf,'crstemp')
FOR lni=1 TO ALEN(laf,1)
  lcfield= laf[lni,1]
  IF ATC(','+lcField+',',lcignorefields)>0 OR INLIST(laf(lni,2),'M','W')
     LOOP
  ENDIF
  lcFields=lcFields+IIF(EMPTY(lcfields),'',', ')+lcfield
NEXT
IF lcSetExact='OFF'
    SET EXACT OFF
endif
lcSql='Select COUNT(*) as cnt, '+lcFields+' From '+lcTable+' group by '+lcfields+' having cnt>1 into cursor crsdup'
&lcSql

BROWSE last

Daniel Buduru
 2/18/2012 3:04:28 AM
User is offlineardeleand
2 posts


Re: Găsitor de duplicate :)
 (N/A)
Corect! 0.28s!
N-am un răspuns logic la de ce am ignorat SQL-ul din aborddare! :)
Singura observaţie ar fi să mai pui ',' şi in coada lui lcignorefields, ca să găsească şi ultimul câmp.
O să folosesc "motorul" tau.

Mulţam!
 2/18/2012 11:02:56 AM
User is offlineDaniel Buduru
3508 posts
1st




Re: Găsitor de duplicate :)
 (N/A)
 ardeleand wrote
Corect! 0.28s!
N-am un răspuns logic la de ce am ignorat SQL-ul din aborddare! :)
Singura observaţie ar fi să mai pui ',' şi in coada lui lcignorefields, ca să găsească şi ultimul câmp.
O să folosesc "motorul" tau.

Mulţam!


Sql este mai rapid decat comenzile xbase (scan, skip, replace ...), fiind optimizat intern. De altfel, MS a inglobat in Sql server algoritmii eficienti din VFP.
Comanda Select count(*) as cnt, lista campuri from tabela group by lista campuri having cnt>1 e o comanda "standard" pentru gasirea duplicatelor. Ce am adaugat e codul pentru generearea dinamica a comenzii select, dar si asta e o procedura "standard" pemtru generarea unui select dinamic.
Codul l-am facut ieri, dupa ce am vazut postul, si nu l-am testat complet.
Am fost curios sa vad ce timp se obtine cu o comanda sql, pentru ca, altfel, nu folosesc asa ceva pentru ca nu s-a ivit necesitatea. Cheile primare si candidate isi fac treaba singure, si nu apar astfel de situatii.

Daca vrei sa folosesti acest cod, tine cont de faptul ca elimina campurile memo si blob - astfel de campuri nu sunt admise in group by. Daca vrei sa compari si astfel de campuri, se poate lucra cu checksum - Sys(2007,camp) - in loc de camp.

Dar, daca ceea ce iti trebuie e o tabela curata, fara duplicate, cel mai simplu mode de a obtine este acesta:

Select Distinct * from tabela into cursor crsunic
*sau, daca nu te intereseaza toate campurile:
*select distinct camp1, camp2, camp3 ... campN from tabela into cursor crsunic

Use tabela in (select('tabela')) exclusive
Select tabela
Zap
Append from dbf('crsunic')
use



Daniel Buduru
 7/25/2012 3:35:38 PM
User is offlineEMRE.RESIT
194 posts
5th


Re: Găsitor de duplicate :)
 (Romania)
 Daniel Buduru wrote
Incearca si varianta asta, si spune-ne ce timp obtii pe tabela pe care ai facut testele.
In testele mele, pe o tabela de 380.000 inregistrari, cu 20 campuri, dureaza cam 9 sec.

LPARAMETERS tctable, tcdest, tcignorefields
LOCAL lctable
IF EMPTY(tctable)
    lctable=GETFILE('dbf')
    IF EMPTY(lctable)
        MESSAGEBOX('Nu s-a selectat nici o sursa')
        RETURN
    ENDIF
ELSE
   lctable=tctable
ENDIF

SELECT * FROM (lctable) WHERE .f. INTO CURSOR crstemp

LOCAL lcsql, lcfields, lcfield, laf[1], lni, lcSetExact, lcignorefields
lcSetExact=SET("Exact")
lcsql=''      
lcfields=''
lcignorefields=','+CHRTRAN(EVL(tcignorefields,''),["' ],'')
SET EXACT on
AFIELDS(laf,'crstemp')
FOR lni=1 TO ALEN(laf,1)
  lcfield= laf[lni,1]
  IF ATC(','+lcField+',',lcignorefields)>0 OR INLIST(laf(lni,2),'M','W')
     LOOP
  ENDIF
  lcFields=lcFields+IIF(EMPTY(lcfields),'',', ')+lcfield
NEXT
IF lcSetExact='OFF'
    SET EXACT OFF
endif
lcSql='Select COUNT(*) as cnt, '+lcFields+' From '+lcTable+' group by '+lcfields+' having cnt>1 into cursor crsdup'
&lcSql

BROWSE last


si daca vreau sa verific daca numai anumite campuri sunt duplicate ???
 7/25/2012 3:56:58 PM
User is offlineDaniel Buduru
3508 posts
1st




Re: Găsitor de duplicate :)
 (Romania) Modified By Daniel Buduru  on 7/25/2012 2:57:24 PM)
Ai incercat sa treci toate celelalte campuri in lista de campuri ignorate?
Daniel Buduru
 7/25/2012 4:10:48 PM
User is offlineEMRE.RESIT
194 posts
5th


Re: Găsitor de duplicate :)
 (Romania)
unde trb mai exact? ori la tcignorefields sau lcignorefields?? scuzama nu prea ma pricep
 7/25/2012 4:57:34 PM
User is offlineDaniel Buduru
3508 posts
1st




Re: Găsitor de duplicate :)
 (Romania)
Mai direct:

Select count(*) as dup, camp1, camp2, campn from tabela group by camp1, camp2, campn



Daniel Buduru
 8/21/2012 10:15:58 AM
User is offlineEMRE.RESIT
194 posts
5th


Re: Găsitor de duplicate :)
 (Romania)
dar in codul de mai sus unde se pun acele ignorefields. se face vreo lista la tcignorefields?
 8/21/2012 10:37:59 AM
User is offlineDaniel Buduru
3508 posts
1st




Re: Găsitor de duplicate :)
 (Romania)
 EMRE.RESIT wrote
dar in codul de mai sus unde se pun acele ignorefields. se face vreo lista la tcignorefields?

Nu ai nevoie sa folosesti procedura pe care am postat-o in acest thread pentru a gasi cateva campuri duplicate.
Tot ce-ti trebuie este comanda din postul anterior:

Select Count(*) as cnt, camp_1, camp_2, camp_3 From tabela Group By camp_1, camp_2, camp_3 Having cnt>1 Into Cursor crsDup.

Pui in select, dupa count, campurile pe care vrei sa la verifici - numai acela campuri, nu si altele-, aceleasi campuri le pui si in Group By. 
Clauza Having cnt>1 filtreaza rezultatul, eliminand inregistrarile neduplicate.
Rezultatul este returnat in crsDup.

Daniel Buduru
 8/21/2012 10:45:48 AM
User is offlineEMRE.RESIT
194 posts
5th


Re: Găsitor de duplicate :)
 (Romania) Modified By EMRE.RESIT  on 8/21/2012 9:49:09 AM)
acel crsdup imi da inregistrarile care sunt duble? daca da dupa ce le vad unul din doua trebuie sal sterg, dar trebuie sal vad ca sa stiu pe care sal sterg. aa si trebuie sa vad acel camp care nu trebuie verificat ce valoare are ca sa imi dau seama pe care sa-l sterg.
 8/21/2012 12:40:38 PM
User is offlineDaniel Buduru
3508 posts
1st




Re: Găsitor de duplicate :)
 (Romania) Modified By Daniel Buduru  on 8/21/2012 11:41:50 AM)
Threadul asta are titlul "Gasitor de duplicate". Daca iti trebuie o procedura care sa-ti elimine duplicatele dintr-o tabela, pune intreabarea asa, in alt thread, poate cineva are ceva gata facut.
Daca ai un pune index candidate avand in cheie campurile pe care nu vrei sa le ai duplicate, si ai pune si cheie primara in tabela, nu te-ai intalni niciodata cu inregistrari multiple. Cel mai ieftin e sa previi, nu sa repari.

Daniel Buduru
 8/21/2012 3:22:33 PM
User is offlineDaniel Buduru
3508 posts
1st




Re: Găsitor de duplicate :)
 (Romania) Modified By Daniel Buduru  on 8/21/2012 2:31:56 PM)
Ai aici un algoritm care face cam ce vrei tu. Trebuie insa sa-l adaptezi la tabela ta, sau sa il parametrizezi si sa-l faci o procedura universala:
Daca nu ai cheie primara in tabela, modifici tabela si o adaugi.

CREATE CURSOR c1 (pk i autoinc, f1 i, f2 i, f3 i, f4 i)
FOR lni=1 TO 10000
 INSERT INTO c1 (f1, f2, f3, f4) VALUES (RAND()*100, RAND()*10, RAND()*20, RAND()*40)
next

SELECT .f. as selected, cd.grup, cd.cnt, ss.* ;
FROM c1 ss ;
INNER JOIN ;
(select RECNO() as grup, * from ;
(SELECT COUNT(*) as cnt, f1, f2, f3 FROM c1 GROUP BY f1, f2, f3 HAVING cnt>1) sd) cd ;
ON ss.f1=cd.f1 AND ss.f2=cd.f2 AND ss.f3=cd.f3 ;
ORDER BY grup, pk ;
INTO CURSOR crsDuplicate readwrite

BROWSE LAST nowait

*!* Urmatorul cod pastreaza doar prima pozitie din duplicate
GO top
lnGrup=.null.
SCAN all
 lngrup=grup
 SKIP
 Replace REST selected WITH .t. WHILE grup=lnGrup
 SKIP -1
endscan

SELECT * FROM c1 WHERE pk NOT in (SELECT pk FROM crsDuplicate WHERE selected) INTO CURSOR c1_faraduplicate
BROWSE LAST nowait



Daniel Buduru
  Visual FoxPro  Baze de date, tabele, view-uri si indecsi  Găsitor de dupl...

Search  Forum Home         

 Google Ads Minimize

    

Copyright 2002-2013 Profox   Terms Of Use  Privacy Statement